[FFmpeg-devel] [PATCH 2/2] avformat/mp3dec: Check for occurances of headers within frames during probing

Michael Niedermayer michael at niedermayer.cc
Sat Nov 9 00:27:50 EET 2019


On Thu, Nov 07, 2019 at 10:37:05PM +0100, Hendrik Leppkes wrote:
> On Thu, Nov 7, 2019 at 10:34 PM Michael Niedermayer
> <michael at niedermayer.cc> wrote:
> >
> > From: Limin Wang <lance.lmwang at gmail.com>
> >
> > Fixes misdetection of zYLx.wav
> >
> > Co-Author: Michael Niedermayer <michael at niedermayer.cc>
> > Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> > ---
> >  libavformat/mp3dec.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavformat/mp3dec.c b/libavformat/mp3dec.c
> > index 6848415657..eb40362548 100644
> > --- a/libavformat/mp3dec.c
> > +++ b/libavformat/mp3dec.c
> > @@ -73,7 +73,7 @@ static int mp3_read_probe(const AVProbeData *p)
> >      int frames, ret;
> >      int framesizes, max_framesizes;
> >      uint32_t header;
> > -    const uint8_t *buf, *buf0, *buf2, *end;
> > +    const uint8_t *buf, *buf0, *buf2, *buf3, *end;
> >
> >      buf0 = p->buf;
> >      end = p->buf + p->buf_size - sizeof(uint32_t);
> > @@ -88,11 +88,19 @@ static int mp3_read_probe(const AVProbeData *p)
> >          buf2 = buf;
> >          for(framesizes = frames = 0; buf2 < end; frames++) {
> >              MPADecodeHeader h;
> > +            int header_emu = 0;
> >
> >              header = AV_RB32(buf2);
> >              ret = avpriv_mpegaudio_decode_header(&h, header);
> >              if (ret != 0 || end - buf2 < h.frame_size)
> >                  break;
> > +
> > +            for (buf3 = buf2 + 4; buf3 < buf2 + h.frame_size; buf3++) {
> > +                uint32_t next_sync = AV_RB32(buf3);
> > +                header_emu += (next_sync & MP3_MASK) == (header & MP3_MASK);
> > +            }
> > +            if (header_emu > 2)
> > +                break;
> >              buf2 += h.frame_size;
> >              framesizes += h.frame_size;
> >          }
> 
> I still have the same question - how possible is it that the actual
> audio data actually has these same bits? Is it actually impossible? Or
> are we just trading detection flaws here?

mp3 does not prevent header emulation unless my memory of the spec is
flawed. So it is possible that valid looking mp3 headers occur inside
valid mp3 frames.
This is thus a probabilistic test. 
Consider A1
    A stream that has few valid mp3 headers that in addition are spaced 
    correctly for mp3 will likely be mp3 or some mp3 in some othetr container
    but its very unlikely to be PCM
Consider A2
    A stream that has few valid mp3 headers that are not spaced 
    correctly for mp3 is unlikely to be raw mp3, whatever that may be
Consider B
    A stream that has many valid mp3 headers, so that there are maybe
    100 headers within each frame.
    
For the last (B) case having headers match at the location of frame sizes
is a significantly weaker hint toward valid mp3 because there are many
its not unlikely to match something, so the previous seperation we
had between A1 and A2 is no longer possibly at B and thus something
should reduce the score for mp3 in the B case

The patch does that by considering frames with 3 or more headers
inside to be not matching the frame size of the previous.

We use an == check instead of a avpriv_mpegaudio_decode_header() call
because its simpler and faster and works for the case we have

There are many possible ways to achieve similar effects, for example
we could reduce the score based on the "density" of headers within
frames, this was just the simplest i saw and was already mostly implemented
by Limin. But surely it can be done differently

Also last thought, probing is fundamentally a probabilistic thing

A more radically different approuch would be to test for more of the mp3
syntax after the header but i suspect that would be much more complex
especially considering that we need to do this for mp1 and mp2 too if
its supposed to help for such cases

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didnt complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20191108/cc85a5e3/attachment.sig>


More information about the ffmpeg-devel mailing list