[FFmpeg-devel] [PATCH] fix parsing of broken mp3 streams

Wed Apr 22 10:50:10 CEST 2009

2009/4/22 Michael Niedermayer <michaelni at gmx.at>:
> On Tue, Apr 21, 2009 at 02:31:59PM +0200, Zdenek Kabelac wrote:
>> 2009/4/21 Michael Niedermayer <michaelni at gmx.at>:
>> > On Tue, Apr 21, 2009 at 11:14:16AM +0200, Zdenek Kabelac wrote:
>> >> 2009/4/21 Michael Niedermayer <michaelni at gmx.at>:
>> >> > On Tue, Apr 21, 2009 at 01:01:04AM +0200, Zdenek Kabelac wrote:
>> >> >> 2009/4/20 Michael Niedermayer <michaelni at gmx.at>:
>> >> >> > On Mon, Apr 20, 2009 at 09:37:25PM +0200, Zdenek Kabelac wrote:
>> >> >> >> 2009/4/19 Michael Niedermayer <michaelni at gmx.at>:
>> >> >> >> > On Sun, Apr 19, 2009 at 11:18:06PM +0200, Zdenek Kabelac wrote:
>> >> >> >> >> Hi
>> >> >> >> >>
>> >> >> >> >> Here is a small patch that fixes of running out-of-buffer in parsing
>> >> >> >> >> broken mp3 data stream.
>> >> >> >> >> This solution is rather a hotfix - better solution would be to check
>> >> >> >> >> at least one or two next mp3
>> >> >> >> >> frames in sequence whether they are part of the same audio stream or
>> >> >> >> >> some random junk
>> >> >> >> >> which has 0xfffx header inside. With this patch ugly noise could be
>> >> >> >> >> sometimes noticed.
>> >> >> >> >>
>> >> >> >> >> Also questionable is whether it should return -1 if no header is found
>> >> >> >> >> or rather return skipped
>> >> >> >> >> bytes and out_size = 0 - as then usually such packet is rescaned
>> >> >> >> >> multiple times with
>> >> >> >> >> one-byte step forward...
>> >> >> >> >>
>> >> >> >> >> Zdenek
>> >> >> >> >>
>> >> >> >> >> - Fix buffer overrun
>> >> >> >> >> - Properly return parsed bytes together with skipped bytes
>> >> >> >> >
>> >> >> >> > please provide a sample so we can confirm the bugfix, the patch
>> >> >> >> > looks mostly correct though
>> >> >> >> >
>> >> >> >>
>> >> >> >> I've upload just one mp3 dumped stream upload.ffmpeg.org as
>> >> >> >> junk_at_mp3stream directory - together with short text and two patch
>> >> >> >
>> >> >> >> - I'm attaching patch for api-example.c to easily compare results.
>> >> >> >
>> >> >> > i dont care what a modified tool does
>> >> >> > is there a problem that is reproduceable with ffmpeg or ffplay that
>> >> >> > your patch fixes?
>> >> >>
>> >> >> Patch is fixing mp3 decoder to skip only broken junk inside passed
>> >> >> data while decoding as much mp3 frames as possible and avoid buffer
>> >> >> over reading - don't ask me which tools are muxing avi streams with
>> >> >> junk in packets - obviously it some kind of re-synchronization from
>> >> >> splinting huge avi streams into small chunks....
>> >> >>
>> >> >> You could check for your self is to compare the result of extracted
>> >> >> wav size via api-example and then do
>> >> >> the same with ffmpeg -i junk.mp3 o.wav - you might observe small
>> >> >> difference 4027436 != 4018220
>> >> >> To do my homework and complete the list: mplayer -ao pcm:file=wav
>> >> >> junk.mp3 - creates 4022830 - but IMHO it decodes some broken packets
>> >> >> at the begining)
>> >> >>
>> >> >> (btw the patch for api-example should be probably commited into svn as well...)
>> >> >> Usually such badly muxed sample streams are way to small to notice
>> >> >> significant desynchronization.
>> >> >
>> >> > your original patch looked fine but after that you just talk nonsense
>> >> > apiexample is a example for codecs, not containers, mp3 must be passed
>> >> > through a demuxer and parser.
>> >>
>> >> I knew it would be hard - anyway I'll try once again - please check my
>> >> original patch
>> >> and see the mpegaudiodec.c code then please answer me following question:
>> >>
>> >> - What will stop parser from checking given buffer for mp3 header tag
>> >> after the buffer size
>> >> i.e. pass there zero memory area - I think decoder shouldn't run
>> >> behind the given buffer
>> >> even in the case it contains obviously wrong data - i.e. non-mp3 in this case.
>> >> (user would have to put false mp3 header after the passed buffer to
>> >> stop the parser)
>> >>
>> >> - If the mp3 packet is found within some offset from the beginning why
>> >> it should return
>> >> the size of parsed packed without the skipped bytes from the start of buffer.
>> >> (so next parsing will again start in the middle of previous mp3 packet)
>> >>
>> >> - Explain how the libavformat/mp3.c:mp3_read_packet() solve the problem?
>> >> (speaking of MP3_PACKET_SIZE - theoretical mythical max size of mp3
>> >> chunk is 1440)
>> >
>> > Iam not disputing that the original patch possibly fixes a issue, i
>> > am asking if you have a test case so we can test it.
>> >
>> > either
>> > A. the patch has no effect at all on ffmpeg & ffplay
>>
>> I think I've already shown that we could get a different amount of WAV
>> samples from particular mp3 audio stream - we might have a discussion
>> which number is correct - but IMHO ffmpeg tool should always try to
>> get as much as possible original samples from data stream - but I
>> could be alone...
>

First I want to say that I have always admired your work, and I'm
really sorry it is taking me so long to explain the problem here. But
since I think I'm right, I'd prefer not to give up until we properly
understand each other. And of course, if I'm wrong, I want to
understand why...

> so does the stream you posted decode differently with unmodified ffmpeg
> vs. with the original patch?
> IIRC you only spoke of what the hacked apiexample does

I think I've said a few times already that ffplay skips the full audio
chunk (ffplay.c line ~1593) when it sees a broken chunk, i.e. an mp3
chunk crossing a frame boundary. My change to api-example.c shows what
I see as the proper way to decode an mp3 byte stream stored in an .avi
chunk: when it finds an error it rolls forward in the stream, and so it
can find an mp3 frame that is currently lost by the plain full-chunk
skip. That is where the 'small' difference comes from.
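
To make the roll-forward idea concrete, here is a minimal sketch in C.
It is not the api-example.c patch and it does not call any FFmpeg API:
decode_fn, toy_decode() and decode_chunk_resync() are names made up for
this illustration, and the "frame" format (0xAA plus three payload
bytes) is a toy stand-in for real mp3 frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder callback: returns bytes consumed (> 0) and sets
 * *out_size, or a negative value when buf does not start with a frame. */
typedef int (*decode_fn)(const uint8_t *buf, int size, int *out_size);

/* Toy stand-in for a real decoder: a "frame" is the byte 0xAA followed
 * by three payload bytes, and each frame yields 4 bytes of output. */
static int toy_decode(const uint8_t *buf, int size, int *out_size)
{
    *out_size = 0;
    if (size < 4 || buf[0] != 0xAA)
        return -1;
    *out_size = 4;
    return 4;
}

/* Walk one container chunk. On error, step one byte forward and retry
 * instead of dropping the rest of the chunk. Returns the total number
 * of output bytes produced. */
static int decode_chunk_resync(decode_fn decode, const uint8_t *buf, int size)
{
    int total = 0;

    assert(decode != NULL && size >= 0);
    while (size > 0) {
        int out_size = 0;
        int used = decode(buf, size, &out_size);

        if (used <= 0) {
            used = 1;          /* skip one junk byte and retry */
            out_size = 0;
        } else if (used > size) {
            used = size;       /* never advance past the chunk end */
        }
        total += out_size;
        buf   += used;
        size  -= used;
    }
    return total;
}
```

For a chunk holding one frame, two junk bytes, a second frame and a
truncated third one, decode_chunk_resync(toy_decode, ...) returns 8,
where dropping the chunk at the first error would give only 4.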

api-example.c should show users of the FFmpeg library how to use the
API, and if avcodec_decode_audio3() is supposed to return the number
of bytes consumed from the buffer, the api-example code should work
that way. Somehow I do not fully understand what makes you so crazy
about this, as IMHO my small patch just does it the way the API is
supposed to work?

With the current logic the mp3 decoder is actually also doing the
parser's work of finding the beginning of an mp3 chunk, and I think
there is no mp3 parser in use for avi chunks (as the container is an
.avi stream, not an .mp3 stream).

So what I'm saying is that mp3 is muxed into .avi as a byte stream,
thus we should not so easily give up the full chunk when an error is
found. For example, someone could put all mp3 chunks into one .avi
frame, which is possible and perfectly valid and actually saves some
space in the .avi tables, although for some players the stream is then
hardly seekable. In that case the first error detected in the mp3
stream will cause the loss of the full audio track if frame_size gets
-1.

As I do not want to extend my theories further, I'll wait until we
agree on some conclusion here.

There are a number of solutions to this problem. I just wanted to go
the way of the smallest change, and I also said that the fix in ffplay
& ffmpeg is not a one-line change, but it's not so complicated either.

And one note on the VBR I mentioned previously: you probably
misunderstood me as meaning some problem with reading Xing headers
from music mp3 files. I only meant playing VBR audio muxed in AVI
streams (which I probably didn't emphasize enough), which is currently
also a bit buggy - see e.g. 13fantavsync.avi on the mplayer samples
site. There are other issues, but let's fix them one by one.

>
>
>>
>> > B. there is a file for which behavior changes
>>
>> The fact that it's not running out of memory bounds when the mp3
>> header could not be found in the given buffer is probably because
>> usually lots of other mp3 frames are lying nearby in memory so it will
>> effective stop - and there are not too many heavily broken stream.
>
> maybe, maybe not
>
>
>>
>> So to answer your questions
>> A - currently my patch does not influence those tools as they discard
>> whole data chunk if the error is found.
>> B - artificial file could be probably created which will show problem
>> from scanning data past the buffer - and generate coredump - though
>> it's not probably so simple to ensure memory layout that no mp3 header
>> will not be found past the allocated header.
>
> so you claim 2 mutually exlusive theorems are true?
> sorry
> either ffmpeg does or does not change with some input

FFmpeg with my proposed patch doesn't change in that sense: it
produces the same amount of audio samples.
The only change is that it will not crash due to scanning past the
input buffer.

I think I cannot say it more clearly?
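
As an illustration of what "not scanning past the input buffer" means,
here is a minimal sketch in C. It is not the code from my patch nor
from mpegaudiodec.c; find_sync_bounded() is a hypothetical helper, and
it only tests the 11 sync bits of a 4-byte header, not whether the rest
of the header is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFmpeg function: returns the offset of
 * the first candidate MPEG audio sync word (0xFF followed by a byte
 * whose top three bits are set) in buf[0..size), or -1 if there is
 * none. It never reads at or beyond buf + size. */
static int find_sync_bounded(const uint8_t *buf, size_t size)
{
    size_t i;

    assert(buf != NULL || size == 0);
    /* A header is 4 bytes long, so stop while a whole header still
     * fits inside the buffer. */
    for (i = 0; i + 4 <= size; i++) {
        if (buf[i] == 0xFF && (buf[i + 1] & 0xE0) == 0xE0)
            return (int)i;
    }
    return -1;
}
```

The loop bound is the whole point: the scan ends with the buffer, and
it does not depend on some other mp3 header happening to lie nearby in
memory to stop it.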

>>
>> I assume ffmpeg is not leaving simple buffer scanning bugs inside just
>> because there is no real file in the world that shows generate a
>> segfault?
>
> if someone claims a patch fixes a bug but then fails to produce any hint
> of a bug without requireing some other patches that happen to contain numerous
> bugs themselfs well ...
>
> The question is not if i leave a bug in ffmpeg (you assert here there is a
> bug and your patch is correct)
>
> But rather if there is a bug and if the patch is the correct fix.

What the correct fix is remains questionable: who should skip the
broken bytes in the mp3 byte stream, and how? Should the decoding
routine return -1 when it has actually scanned nearly the full buffer
and found the beginning of an mp3 chunk just before the end of the
buffer, only because the chunk did not fit into the buffer?

I could propose that it returns the number of bytes consumed before
the mp3 header and zero output bytes, so that on the next round the
decoder gets a new buffer properly moved forward. Or maybe the decoder
should immediately return -1 if the first word is not an mp3 header,
and the caller is then supposed to move forward in the buffer.
Currently IMHO the mp3 parser doesn't do this work either, and it
would probably not help the .avi container anyway. AFAIK FFmpeg
doesn't support container-in-container here, am I correct? (I'm not
really following the code changes all the time, so things could have
changed; also the timing of mp3 muxed in .avi is different from native
.mp3.)
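
The first proposal could look roughly like the following sketch in C.
It uses a toy format (0xFF, then one byte giving the total frame
length, then payload) instead of real mp3 headers, and locate_frame()
is a made-up name, so take it only as an illustration of the return
convention, not as decoder code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame locator over a toy format. Returns the number of
 * bytes the caller should drop from the front of buf:
 *  - complete frame found: skipped junk + frame length, *frame_len > 0
 *  - header found but frame truncated: skipped junk only, *frame_len
 *    is 0, so the frame start is kept for the next round
 *  - no header at all: the whole buffer, except a trailing 0xFF that
 *    may be the first byte of a header, *frame_len is 0
 * A return of 0 with *frame_len == 0 means more data is needed. */
static int locate_frame(const uint8_t *buf, int size,
                        int *frame_off, int *frame_len)
{
    int i;

    assert(buf != NULL && size >= 0);
    *frame_off = 0;
    *frame_len = 0;
    for (i = 0; i + 2 <= size; i++) {
        int len;

        if (buf[i] != 0xFF || buf[i + 1] < 2)
            continue;            /* not a plausible header */
        len = buf[i + 1];
        if (i + len > size)
            return i;            /* truncated: skip the junk only */
        *frame_off = i;
        *frame_len = len;
        return i + len;          /* junk and frame both consumed */
    }
    if (size > 0 && buf[size - 1] == 0xFF)
        return size - 1;         /* keep a possible header start */
    return size;                 /* everything was junk */
}
```

With this convention the skipped bytes are always part of the return
value, so the next call never starts in the middle of the frame that
was just found, and a frame cut by the end of the buffer costs no
error at all.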

>
> If a patch author provides a testcase theres no doubt about the bug and if
> the patch author provides a consistent explanation of the issue and the
> fix then the patch is likely correct and doesnt require too much review
> if like this one it also looks correct

I think it actually requires you to look into the problem, so that you
understand what I'm trying to explain. It's probably impossible to
make the judgment just from a look at the patch.

If I'm completely wrong, explain why in a few sentences, and please
try to avoid using the word crap, thanks.

>
> If OTOH the patch author talks confused crap and submits other patches that
> are broken then id tend to want to take a deeper look at the patches that
> look correct on first sight from him. And that takes more time or a little
> help from the patch author (that is a reproduceable testcase)

I really hope that at some point we will understand each other's view
on this issue.

Zdenek