[FFmpeg-devel] [PATCH 8/8] Make mime-type award a bonus probe score

Tomas Härdin git at haerdin.se
Thu Feb 13 23:29:33 EET 2025


On Thu, 2025-02-13 at 13:03 +0100, Michael Niedermayer wrote:
> On Thu, Feb 13, 2025 at 12:40:24PM +0100, Tomas Härdin wrote:
> > On Wed, 2025-02-12 at 23:03 +0100, Michael Niedermayer wrote:
> > > On Wed, Feb 12, 2025 at 12:03:37PM +0100, Tomas Härdin wrote:
> > > > On Thu, 2025-02-06 at 15:58 +0100, Michael Niedermayer wrote:
> > > > > Hi Tomas
> > > > > 
> > > > > On Wed, Feb 05, 2025 at 03:24:24PM +0100, Tomas Härdin wrote:
> > > > > > Seems reasonable to me and passes FATE
> > > > > > 
> > > > > > /Tomas
> > > > > 
> > > > > >  avformat.h   |    2 +-
> > > > > >  format.c     |    8 ++++----
> > > > > >  libopenmpt.c |    2 +-
> > > > > >  3 files changed, 6 insertions(+), 6 deletions(-)
> > > > > > 01f04f79202640330d6be91b0215f92f14d1845a  0008-Make-mime-type-award-a-bonus-probe-score.patch
> > > > > > From ecc3459990f2871fd907f96fe66362b8fea41bd8 Mon Sep 17 00:00:00 2001
> > > > > > From: =?UTF-8?q?Peter=20Zeb=C3=BChr?= <peterz at spotify.com>
> > > > > > Date: Tue, 21 Nov 2023 14:16:49 +0100
> > > > > > Subject: [PATCH 8/8] Make mime-type award a bonus probe score
> > > > > > 
> > > > > > This changes the default behaviour of ffmpeg, where content-type
> > > > > > headers on an input give an absolute probe score (of 75), to
> > > > > > instead give a bonus score (of 30). This gives the probe a better
> > > > > > chance to arrive at the correct format by (hopefully) giving a
> > > > > > large enough bonus to push edge cases in the right direction
> > > > > > (MPEG-PS vs MP3, I am looking at you) while also not adversely
> > > > > > punishing clearer cases (raw ADTS marked as "audio/mpeg" for
> > > > > > example).
> > > > > > 
> > > > > > This patch was regression tested against 20 million recent podcast
> > > > > > submissions (after content-type propagation was added to
> > > > > > original-storage), and 50k Juno vodcast submissions (ditto). No
> > > > > > adverse effects observed (but the bonus may still need tweaking if
> > > > > > other edge cases are detected in production).
> > > > > > ---
> > > > > >  libavformat/avformat.h   | 2 +-
> > > > > >  libavformat/format.c     | 8 ++++----
> > > > > >  libavformat/libopenmpt.c | 2 +-
> > > > > >  3 files changed, 6 insertions(+), 6 deletions(-)
> > > > > 
> > > > > what is the score?
> > > > > a higher score means more likely, but how much more?
> > > > > maybe we should come up with a more formal definition,
> > > > > like that the score is the number of bits of entropy that were
> > > > > checked, or something like that.
> > > > > in such a framework, adding 30 for a mime type match would
> > > > > probably make sense
> > > > > 
> > > > > without such a framework, adding 30 to an abstract score is hard
> > > > > to review
> > > > > beyond that, I don't see anything breaking from this, but then I
> > > > > don't think we have real tests for mime types
> > > > 
> > > > We don't really have tests for the probe scores at all, which is a
> > > > problem. Perhaps if we collected some tricky samples we could
> > > > construct a test that demands a certain ordering of probe scores
> > > > for them? For now scores are tested indirectly by the fact that
> > > > most tests rely on correct probing
> > > 
> > > we have
> > > tools/probetest
> > > 
> > > probetest [-f <input format>] [<retry_count> [<max_size>]]
> > 
> > Yeah, but that only tests with random data, not say an ordering of
> > probe scores for actual test files.
> 
> yes, it could/should be extended
> 
> probetest as is is still quite useful though, as it catches probe
> functions which give high scores on random trash

Might be better to leverage afl-fuzz since it is more wily in its
tricks to provoke different program behavior. Then exit(1) whenever the
test program probes something incorrectly. For example you could start
with a small, valid MPEG-PS file and have afl-fuzz generate slightly
different versions of it that don't probe as such
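
To make the idea a bit more concrete, here is a minimal, self-contained
sketch of the kind of oracle such a test program could use. Everything in it
is hypothetical: the two toy probe functions, their scores (50 and 51) and
the rule "a buffer that still opens with an MPEG-PS pack start code must not
be out-scored by MP3" are assumptions made up for illustration. A real
harness would call the actual libavformat probing code instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an MPEG-PS probe (hypothetical, not FFmpeg code):
 * gives a score if the buffer opens with the pack start code 00 00 01 BA. */
static int toy_probe_mpegps(const uint8_t *buf, size_t size)
{
    if (size >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0x01 && buf[3] == 0xBA)
        return 50;
    return 0;
}

/* Toy stand-in for an MP3 probe (hypothetical): counts byte pairs that look
 * like an MPEG audio frame sync (11 set bits) anywhere in the buffer. */
static int toy_probe_mp3(const uint8_t *buf, size_t size)
{
    int syncs = 0;
    for (size_t i = 0; i + 1 < size; i++)
        if (buf[i] == 0xFF && (buf[i + 1] & 0xE0) == 0xE0)
            syncs++;
    return syncs >= 3 ? 51 : 0;
}

/* Oracle for the fuzzer (assumed rule, see above): returns 1 if the input
 * is handled acceptably, 0 if a buffer that still opens with a pack header
 * is out-scored by the MP3 probe. */
static int probe_ok(const uint8_t *buf, size_t size)
{
    int ps  = toy_probe_mpegps(buf, size);
    int mp3 = toy_probe_mp3(buf, size);
    return ps == 0 || ps >= mp3;
}
```

A small main() would read the file that afl-fuzz hands it, call probe_ok()
and bail out when it returns 0. As far as I know afl-fuzz only records runs
that die from a signal as crashes, so abort() is probably the safer way to
flag a misprobe than a plain exit(1).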

/Tomas
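
PS: since the question of what the score means came up, here is a rough
sketch of how I read the commit message: absolute (a mime match lifts the
score to 75) versus bonus (a mime match adds 30). The constant and function
names, the cap at 100 and the example content scores are assumptions for
illustration only, not the code from the patch.

```c
#include <assert.h>

/* 75 and 30 come from the commit message; the names and the cap at 100
 * are assumptions made for this sketch. */
#define SKETCH_SCORE_MAX     100
#define SKETCH_MIME_ABSOLUTE  75
#define SKETCH_MIME_BONUS     30

/* Old behaviour as described: a mime match lifts a lower content-based
 * score to a fixed 75. */
static int score_absolute(int content_score, int mime_match)
{
    assert(content_score >= 0 && content_score <= SKETCH_SCORE_MAX);
    if (mime_match && content_score < SKETCH_MIME_ABSOLUTE)
        return SKETCH_MIME_ABSOLUTE;
    return content_score;
}

/* New behaviour as described: a mime match adds 30 on top of the
 * content-based score, capped at the assumed maximum. */
static int score_bonus(int content_score, int mime_match)
{
    assert(content_score >= 0 && content_score <= SKETCH_SCORE_MAX);
    int score = content_score + (mime_match ? SKETCH_MIME_BONUS : 0);
    return score > SKETCH_SCORE_MAX ? SKETCH_SCORE_MAX : score;
}
```

Take raw ADTS served as "audio/mpeg", and suppose (made-up numbers) the MP3
probe gives the data 1 while the AAC probe gives it 51. With the absolute
scheme MP3 ends up at 75 and wins over AAC at 51. With the bonus scheme MP3
only reaches 31, so AAC at 51 still wins.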


