[FFmpeg-devel] [PATCH 4/4] avformat/libopenmpt: Probe file format from file data if possible

Carl Eugen Hoyos ceffmpeg at gmail.com
Tue Jan 9 03:30:30 EET 2018

2018-01-08 20:25 GMT+01:00 Jörn Heusipp <osmanx at problemloesungsmaschine.de>:
> On 01/08/2018 12:57 AM, Carl Eugen Hoyos wrote:
>> 2018-01-07 15:40 GMT+01:00 Jörn Heusipp
>> <osmanx at problemloesungsmaschine.de>:
>>> On 01/06/2018 04:10 PM, Carl Eugen Hoyos wrote:
>>>> 2018-01-06 11:07 GMT+01:00 Jörn Heusipp
>>>> <osmanx at problemloesungsmaschine.de>:
>>>>> libopenmpt file header probing is tested regularly against the FATE
>>>>> suite and other diverse file collections by libopenmpt upstream in
>>>>> order to avoid false positives.
>>>> You could also test tools/probetest
>>> I was not aware of that tool. Thanks for the suggestion.
>>> It currently lists a failure related to libopenmpt:
>>> Failure of libopenmpt probing code with score=76 type=0 p=FDC size=1024
>> I did not look at this closely but I suspect libopenmpt should return a
>> smaller score in this case.
> We (libopenmpt developers) are currently considering making the heuristic
> for M15 files even stricter. The changes will land in libopenmpt 0.3.5.
> libopenmpt 0.3.5 versus 0.3.4 or earlier can be distinguished at runtime via
> openmpt_get_library_version(). I would be fine with only doing probing via
> libopenmpt in FFmpeg starting with libopenmpt 0.3.5 and relying on file
> extensions for earlier versions.
> However, the data that tools/probetest.c generates here fundamentally does
> have a somewhat high probability of looking like a completely legit M15
> file. False positives are not really avoidable completely no matter what
> libopenmpt does here. The failing data is synthetic, and I am not seeing any
> M15 false positives at all on real-world file collections (media and
> non-media files (tested on 1.2 million data and system files)).
>> A solution could be to never return a high value for the FFmpeg
>> probe function.
> Maybe, but given what tools/probetest generates here, I somewhat doubt these
> examples have any real-world implication at all.
> Anyway, in case a lower score is deemed to be useful, do you have any
> suggestions which score I should pick? AVPROBE_SCORE_EXTENSION or less would
> probably not be very useful, so what comes to mind for me is

No real suggestion here, above was just an idea.
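
The runtime version gate proposed in the quoted text above (probe via
libopenmpt only from 0.3.5 on, otherwise rely on the file extension) could
be sketched roughly as follows. This is only an illustration, not code from
the patch: MAKE_VERSION and use_libopenmpt_probe are hypothetical names, and
the bit layout of the version number is an assumption that should be checked
against the libopenmpt headers actually in use.

```c
#include <stdint.h>

/* Assumption: the value returned by openmpt_get_library_version() is packed
 * as (major << 24) | (minor << 16) | patch. Verify this against the
 * libopenmpt release you build with before relying on it. */
#define MAKE_VERSION(major, minor, patch) \
    (((uint32_t)(major) << 24) | ((uint32_t)(minor) << 16) | (uint32_t)(patch))

/* Hypothetical helper: returns 1 if header probing should be delegated to
 * libopenmpt, 0 if only the file extension should be trusted. */
static int use_libopenmpt_probe(uint32_t runtime_version)
{
    return runtime_version >= MAKE_VERSION(0, 3, 5);
}
```

In a real build the argument would be the result of
openmpt_get_library_version(), so the decision is made against the library
that is loaded at runtime rather than the headers seen at compile time.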

>>> Looking at tools/probetest.c, that would be a file with very few bits
>>> set.
>>> libopenmpt detects the random data in question as M15 .MOD files
>>> (original Amiga 15 samples .mod files), and there is sadly not much that
>>> can be done about that. There are perfectly valid real-world M15 .MOD
>>> files with only 73 bits set in the first 600 bytes
>>> (untouchables-station.mod,
>>> <https://modarchive.org/index.php?request=view_by_moduleid&query=104280>).
>>> The following up to 64*4*4*63 bytes could even be completely 0 (empty
>>> pattern data) for valid files (even without the file being totally
>>> silent).
>>> The generated random data that tools/probetest complains about here
>>> contains 46 bits set to 1 in the first 600 bytes. What follows are
>>> various other examples with up to 96 bits set to 1. Completely loading a
>>> file like that would very likely reject it (in particular, libopenmpt can
>>> deduce the expected file size from the sample headers and, with some
>>> slack to account for corrupted real-world examples, reject files with
>>> non-matching size), however, that is not possible when only probing the
>>> file header.
>>> The libopenmpt API allows for the file header probing functions to return
>>> false-positives, however false-negatives are not allowed.
>>> Performance numbers shown by tools/probetest are what I had expected
>>> (measured on an AMD A4-5000, during normal Xorg session (i.e. not 100%
>>> idle)):
> [...]
>>> 109589637233 cycles,   libopenmpt
>> This sadly may not be acceptable, others may want to comment.
>>>    2672917981           libopenmpt (per module format)
>>> At first glance, libopenmpt looks huge here in comparison. However one
>>> should consider that libopenmpt internally has to probe for (currently)
>>> 41 different module file formats, going through 41 separate probing
>>> functions internally.
>>> Dividing 109589637233 by 41 gives 2672917981, which is in the ballpark of
>>> all other probing functions in ffmpeg.
> What are your expectations for probing speed of 41 completely distinct file
> formats?

My only expectation is that other FFmpeg developers comment, a
(imo strong) argument in your favour is that this will only apply if
an optional external library is activated at compile-time.

> Even only h261,h263,h264,hevc,aac,ac3 (raw streams) combined take more time
> than libopenmpt takes for its 41 formats together.

It is otoh imo not a useful argument to compare four of the most
common formats (we have to auto-detect them for mpeg-ts
recordings) to libopenmpt;-)

> All other FFmpeg probing functions combined (234 formats) take 1201426924609
> cycles. libopenmpt adds 109589637233 cycles for 41 different file formats to
> that, which is about 10%. I do not think probing is in general so
> performance-critical that 10% would be a problem, especially considering
> that for real-world use cases when probing a whole media library, the data
> also has to be read from storage in the first place.

It is 10% for probetest, not sure if this compares well to real-world use.

But if nobody else comments, I support your patch!
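
For anyone who wants to recheck the cycle figures quoted in this thread, the
two calculations are small enough to write down. The sketch below only
repeats the arithmetic on the numbers posted above; the constant and
function names are made up for illustration and do not come from
tools/probetest.

```c
#include <stdint.h>

/* Cycle counts as quoted in the thread (tools/probetest on an AMD A4-5000). */
static const uint64_t LIBOPENMPT_CYCLES = 109589637233ULL;  /* 41 formats  */
static const uint64_t OTHER_CYCLES      = 1201426924609ULL; /* 234 formats */

/* Average probing cost per format, using integer division. */
static uint64_t cycles_per_format(uint64_t total_cycles, unsigned formats)
{
    return total_cycles / formats;
}

/* Extra probing cost relative to an existing baseline, in percent. */
static double overhead_percent(uint64_t added_cycles, uint64_t baseline_cycles)
{
    return 100.0 * (double)added_cycles / (double)baseline_cycles;
}
```

cycles_per_format(LIBOPENMPT_CYCLES, 41) gives 2672917981, matching the
per-format figure above, and overhead_percent(LIBOPENMPT_CYCLES,
OTHER_CYCLES) comes out at roughly 9.1, which is the "about 10%" under
discussion. Both numbers describe probetest's synthetic input only.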

Carl Eugen
