[FFmpeg-devel] [PATCH 8/8] Make mime-type award a bonus probe score

Tomas Härdin git at haerdin.se
Wed Feb 12 13:03:37 EET 2025


On Thu, 2025-02-06 at 15:58 +0100, Michael Niedermayer wrote:
> Hi Tomas
> 
> On Wed, Feb 05, 2025 at 03:24:24PM +0100, Tomas Härdin wrote:
> > Seems reasonable to me and passes FATE
> > 
> > /Tomas
> 
> >  avformat.h   |    2 +-
> >  format.c     |    8 ++++----
> >  libopenmpt.c |    2 +-
> >  3 files changed, 6 insertions(+), 6 deletions(-)
> > 01f04f79202640330d6be91b0215f92f14d1845a  0008-Make-mime-type-award-a-bonus-probe-score.patch
> > From ecc3459990f2871fd907f96fe66362b8fea41bd8 Mon Sep 17 00:00:00 2001
> > From: =?UTF-8?q?Peter=20Zeb=C3=BChr?= <peterz at spotify.com>
> > Date: Tue, 21 Nov 2023 14:16:49 +0100
> > Subject: [PATCH 8/8] Make mime-type award a bonus probe score
> > 
> > This changes the default behaviour of ffmpeg, where content-type headers
> > on an input give an absolute probe score (of 75), to instead give a bonus
> > score (of 30). This gives the probe a better chance to arrive at the
> > correct format by (hopefully) giving a large enough bonus to push edge
> > cases in the right direction (MPEG-PS vs MP3, I am looking at you) while
> > also not adversely punishing clearer cases (raw ADTS marked as
> > "audio/mpeg" for example).
> > 
> > This patch was regression tested against 20 million recent podcast
> > submissions (after content-type propagation was added to
> > original-storage), and 50k Juno vodcast submissions (ditto). No adverse
> > effects observed (but the bonus may still need tweaking if other edge
> > cases are detected in production).
> > ---
> >  libavformat/avformat.h   | 2 +-
> >  libavformat/format.c     | 8 ++++----
> >  libavformat/libopenmpt.c | 2 +-
> >  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> what is the score?
> a higher score means more likely, but how much more?
> maybe we should come up with a more formal definition,
> like that score is the number of bits of entropy that were checked or
> something like that.
> in such a framework, adding 30 for a mime type match would probably
> make sense
> 
> without such a framework, adding 30 to an abstract score is hard to
> review
> beyond that, i don't see anything breaking from this, but then i
> don't think we have real tests for mime types
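
To make the change under review concrete, here is a minimal sketch of the
two rules. It is not the code from the patch: the function and macro names
are made up, the 75 and 30 come from the commit message, and both the cap
at 100 and the reading of "absolute" as "raise to at least 75" are
assumptions.

```c
/* Hypothetical names; only the values 75 and 30 come from the commit message. */
#define SKETCH_SCORE_MAX      100  /* assumption: scores are capped at 100 */
#define SKETCH_MIME_ABSOLUTE   75  /* old rule: score forced up to this value */
#define SKETCH_MIME_BONUS      30  /* new rule: amount added on a MIME match */

/* Old rule (as assumed here): a MIME match raises the score to at least 75. */
static int score_absolute(int content_score, int mime_matches)
{
    if (mime_matches && content_score < SKETCH_MIME_ABSOLUTE)
        return SKETCH_MIME_ABSOLUTE;
    return content_score;
}

/* New rule: a MIME match adds 30 to whatever the content probe found. */
static int score_bonus(int content_score, int mime_matches)
{
    int score = content_score;

    if (mime_matches)
        score += SKETCH_MIME_BONUS;
    /* Assumed clamp so the bonus cannot push a score past the maximum. */
    return score > SKETCH_SCORE_MAX ? SKETCH_SCORE_MAX : score;
}
```

With made-up content scores: a weak match of 1 plus a matching MIME type
becomes 75 under the old rule and 31 under the new one. A competing format
that scored 51 on content alone would therefore lose under the old rule
and win under the new one.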

We don't really have tests for the probe scores at all, which is a
problem. Perhaps if we collected some tricky samples we could construct
a test that demands a certain ordering of probe scores for them? For
now scores are tested indirectly by the fact that most tests rely on
correct probing.
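
The ordering test suggested above could look roughly like this. A sketch
only: the struct and function names are hypothetical, and in a real test
the scores would come from probing a sample rather than being filled in by
hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one candidate format and the score it got for a sample. */
struct probe_result {
    const char *format;
    int score;
};

/* Look up a format's score; returns -1 if the format is not in the results. */
static int score_of(const struct probe_result *r, size_t n, const char *format)
{
    for (size_t i = 0; i < n; i++)
        if (!strcmp(r[i].format, format))
            return r[i].score;
    return -1;
}

/*
 * Returns 1 if the formats listed in `expected` have strictly decreasing
 * scores in `r`, 0 otherwise (including when a listed format is missing).
 */
static int ordering_holds(const struct probe_result *r, size_t n,
                          const char *const *expected, size_t n_expected)
{
    for (size_t i = 0; i + 1 < n_expected; i++) {
        int higher = score_of(r, n, expected[i]);
        int lower  = score_of(r, n, expected[i + 1]);

        if (higher < 0 || lower < 0 || higher <= lower)
            return 0;
    }
    return 1;
}
```

Such a test would only pin down the relative order for each tricky sample,
not the exact numbers, so the bonus could still be tweaked without
rewriting the expectations.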

Also you can't really "formalize" social relations. The reason why
certain files probe as one thing and not another is down to certain
workflows that demand such behavior, which also entails some workflows
being rejected, or at least requiring explicit -f. 

/Tomas

