[FFmpeg-devel] Possible crasher bug when decoding unreliable H264 data

Fri Jun 21 15:36:14 CEST 2013

On 6/21/2013 6:20 AM, Michael Niedermayer wrote:
> On Fri, Jun 21, 2013 at 04:45:40AM -0700, Mark Stevans wrote:
>> On 6/21/2013 3:24 AM, Michael Niedermayer wrote:
>>> On Fri, Jun 21, 2013 at 02:21:24AM -0700, Mark Stevans wrote:
>
> [...]
>
>> Am I interpreting the actions of this function correctly?  I am not
>> the greatest C programmer in the world, so I could be reading the
>> code wrong.
>
> why don't you add av_log/printf and actually look at what actual
> values the variables have when the crash happens.
> Also you can try -threads 1 to see if it's threading related
>
> and dumping the stream to disk and then cutting the problematic part
> out may give a testcase that's easier to work with than waiting hours
> for it to occur

I already have additional logging turned on, which is why I'm fairly 
certain of what is going wrong.

Are you satisfied with the state of "ff_h264_check_intra_pred_mode"? 
With the missing array elements defaulting to zero, which is interpreted 
as "DC_PRED8x8"?  It seems unusual to me, but then again, I am new here....
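
To make the concern concrete, here is a minimal sketch of the C behavior in
question.  It is not the actual FFmpeg table: the names (EX_DC_PRED,
EX_LEFT_DC_PRED, example_top, example_lookup) and the values are hypothetical
stand-ins.  The only assumption carried over from the real code is that the
DC prediction mode is numbered 0, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode numbers; only "DC mode == 0" mirrors the text above. */
enum { EX_DC_PRED = 0, EX_LEFT_DC_PRED = 4 };

/* Seven slots but only four initializers: C sets the remaining
 * three elements (indices 4..6) to zero. */
static const int8_t example_top[7] = { EX_LEFT_DC_PRED, 1, -1, -1 };

/* Look up the replacement mode for a given input mode. */
int example_lookup(int mode)
{
    assert(mode >= 0 && mode < 7);
    return example_top[mode];
}
```

With a table like this, example_lookup(4), example_lookup(5) and
example_lookup(6) all return 0, which a caller cannot tell apart from an
explicit EX_DC_PRED entry.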

>> And I now have a stack trace (only took two hours to repro the bug):
>>
>> ffplay_g!ff_pred8x8_dc_8_mmxext+0x6
>> ffplay_g!hl_decode_mb_simple_8(struct H264Context * h =
>> 0x00000000`02803cc0)+0xd12
>> ffplay_g!ff_h264_hl_decode_mb(struct H264Context * h =
>> 0x00000000`02803cc0)+0xfe
>> ffplay_g!decode_slice(struct AVCodecContext * avctx =
>> 0x00000000`01fede40, void * arg = 0x00000000`03cdfa10)+0x69b
>> ffplay_g!execute_decode_slices(struct H264Context * h =
>> 0x00000000`02803cc0, int context_count = 0n1)+0x76
>> ffplay_g!decode_nal_units(struct H264Context * h =
>> 0x00000000`02803cc0, unsigned char * buf = 0x00000000`02e09460 "",
>> int buf_size = 0n2740, int parse_extradata = 0n0)+0x1219
>> ffplay_g!decode_frame(struct AVCodecContext * avctx =
>> 0x00000000`01fede40, void * data = 0x00000000`023f0a50, int *
>> got_frame = 0x00000000`023f0cb8, struct AVPacket * avpkt =
>> 0x00000000`023f09e0)+0x4bb
>> ffplay_g!frame_worker_thread(void * arg = 0x00000000`023f0940)+0x158
>> ffplay_g!win32thread_worker(void * arg = 0x00000000`023f0948)+0x39
>> MSVCR100!endthreadex+0x43
>> MSVCR100!endthreadex+0xdf
>> kernel32!BaseThreadInitThunk+0xd
>> ntdll!RtlUserThreadStart+0x1d

Yes, WinDbg isn't outputting line numbers, sorry -- I should have added 
those manually.  But then again, the line numbers wouldn't help much, 
because I've modified most of the relevant files.

Line numbers or not, the only useful part of the stack trace is the top
line, which shows that the access violation occurred while dereferencing
the (non-existent) previous data row.  I did look at the
registers/variables in WinDbg, and the first frame data row happened to
be just after a memory page boundary (a multiple of something like
0x1000); the faulting memory reference was just under that boundary
(something like 0x...fea8).  Do you disagree with my theory?
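
The arithmetic behind that theory can be sketched as follows.  This is only
an illustration under stated assumptions: the page size, the addresses and
the stride are made-up values, and the helper names (ex_row_above,
ex_crosses_page) are hypothetical, not anything in FFmpeg.

```c
#include <stdint.h>

/* Assumed page size, matching the "something like 0x1000" guess above. */
#define EX_PAGE_SIZE 0x1000u

/* Address of the row directly above `row`, i.e. what a predictor that
 * reads the top neighbours would dereference. */
uintptr_t ex_row_above(uintptr_t row, uintptr_t stride)
{
    return row - stride;
}

/* Nonzero when the two addresses lie in different memory pages. */
int ex_crosses_page(uintptr_t a, uintptr_t b)
{
    return (a / EX_PAGE_SIZE) != (b / EX_PAGE_SIZE);
}
```

For example, with a first row at the hypothetical address 0x30000 and a
hypothetical stride of 0x158, ex_row_above() gives 0x2fea8, which lies in the
page before the buffer.  If that page is not mapped, the read faults; if it
happens to be mapped, the read silently succeeds, which would explain why the
crash takes so long to reproduce.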

There is one thing I am doing that I didn't mention: I am opening and 
closing the stream every 30 seconds for hours at a time, which does tend 
to mess up FFPlay a lot more than usual.  This is just stress testing, 
very helpful of course in revealing unknown bugs like this one, assuming 
it does exist....

MLS