[FFmpeg-devel] [PATCH 4/4] avformat/libopenmpt: Probe file format from file data if possible

Jörn Heusipp osmanx at problemloesungsmaschine.de
Mon Jan 8 21:25:00 EET 2018

On 01/08/2018 12:57 AM, Carl Eugen Hoyos wrote:
> 2018-01-07 15:40 GMT+01:00 Jörn Heusipp <osmanx at problemloesungsmaschine.de>:
>> On 01/06/2018 04:10 PM, Carl Eugen Hoyos wrote:
>>> 2018-01-06 11:07 GMT+01:00 Jörn Heusipp
>>> <osmanx at problemloesungsmaschine.de>:

>>>> libopenmpt file header probing is tested regularly against the FATE
>>>> suite and other diverse file collections by libopenmpt upstream in
>>>> order to avoid false positives.
>>> You could also test tools/probetest
>> I was not aware of that tool. Thanks for the suggestion.
>> It currently lists a failure related to libopenmpt:
>> Failure of libopenmpt probing code with score=76 type=0 p=FDC size=1024
> I did not look at this closely but I suspect libopenmpt should return a
> smaller score in this case.

We (libopenmpt developers) are currently considering making the 
heuristic for M15 files even stricter. The changes will land in 
libopenmpt 0.3.5.
libopenmpt 0.3.5 versus 0.3.4 or earlier can be distinguished at runtime
via openmpt_get_library_version(). I would be fine with only doing
probing via libopenmpt in FFmpeg starting with libopenmpt 0.3.5 and
relying on file extensions for earlier versions.
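
A minimal sketch of such a runtime gate, not the actual patch. It
assumes that the value returned by openmpt_get_library_version() is
packed as (major << 24) | (minor << 16) | patch; the helper names
make_version() and use_header_probing() are hypothetical and only
illustrate the comparison:

```c
#include <stdint.h>

/* Assumption: libopenmpt packs its version number as
 * (major << 24) | (minor << 16) | patch. */
uint32_t make_version(uint32_t major, uint32_t minor, uint32_t patch)
{
    return (major << 24) | (minor << 16) | patch;
}

/* Hypothetical helper: returns 1 if header probing via libopenmpt
 * should be used, 0 if only the file extension should be trusted.
 * runtime_version would come from openmpt_get_library_version(). */
int use_header_probing(uint32_t runtime_version)
{
    return runtime_version >= make_version(0, 3, 5);
}
```

In the demuxer, the probe function could call
use_header_probing(openmpt_get_library_version()) once and fall back to
extension matching when it returns 0.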

However, the data that tools/probetest.c generates here fundamentally
has a fairly high probability of looking like a completely legitimate
M15 file. False positives cannot be avoided completely, no matter
what libopenmpt does here. The failing data is synthetic, and I am not
seeing any M15 false positives at all on real-world file collections
(media and non-media files, tested on 1.2 million data and system files).

> A solution could be to never return a high value for the FFmpeg
> probe function.

Maybe, but given what tools/probetest generates here, I somewhat doubt 
these examples have any real-world implication at all.
Anyway, in case a lower score is deemed to be useful, do you have any 
suggestions which score I should pick? AVPROBE_SCORE_EXTENSION or less 
would probably not be very useful, so what comes to mind for me is 

>> Looking at tools/probetest.c, that would be a file with very few bits set.
>> libopenmpt detects the random data in question as M15 .MOD files (original
>> Amiga 15 samples .mod files), and there is sadly not much that can be done
>> about that. There are perfectly valid real-world M15 .MOD files with only 73
>> bits set in the first 600 bytes (untouchables-station.mod,
>> <https://modarchive.org/index.php?request=view_by_moduleid&query=104280>).
>> The following up to 64*4*4*63 bytes could even be completely 0 (empty
>> pattern data) for valid files (even without the file being totally silent).
>> The generated random data that tools/probetest complains about here contains
>> 46 bits set to 1 in the first 600 bytes. What follows are various other
>> examples with up to 96 bits set to 1. Completely loading a file like that
>> would very likely reject it (in particular, libopenmpt can deduce the
>> expected file size from the sample headers and, with some slack to account
>> for corrupted real-world examples, reject files with non-matching size),
>> however, that is not possible when only probing the file header.
>> The libopenmpt API allows for the file header probing functions to return
>> false-positives, however false-negatives are not allowed.
>> Performance numbers shown by tools/probetest are what I had expected
>> (measured on an AMD A4-5000, during normal Xorg session (i.e. not 100%
>> idle)):


>> 109589637233 cycles,   libopenmpt
> This sadly may not be acceptable, others may want to comment.
>>    2672917981           libopenmpt (per module format)
>> At first glance, libopenmpt looks huge here in comparison. However one
>> should consider that libopenmpt internally has to probe for (currently) 41
>> different module file formats, going through 41 separate probing functions
>> internally.
>> Dividing 109589637233 by 41 gives 2672917981, which is in the ballpark of
>> all other probing functions in ffmpeg.

What are your expectations for probing speed of 41 completely distinct 
file formats?
Even h261, h263, h264, hevc, aac and ac3 (raw streams) alone take more
time combined than libopenmpt takes for all of its 41 formats.
All other FFmpeg probing functions combined (234 formats) take
1201426924609 cycles. libopenmpt adds 109589637233 cycles for 41
different file formats to that, which is about 10%. I do not think
probing is in general so performance critical that 10% would be a
problem, especially considering that for real-world use cases,
when probing a whole media library, the data also has to be read from
storage in the first place.
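
The arithmetic behind these figures can be checked with a short sketch.
The cycle counts are the ones quoted in this message; the function and
macro names are made up for illustration:

```c
#include <stdint.h>

/* Cycle counts as quoted from the tools/probetest run in this message. */
#define LIBOPENMPT_CYCLES   109589637233ULL   /* libopenmpt, all formats */
#define OTHER_PROBES_CYCLES 1201426924609ULL  /* all other FFmpeg probes */
#define LIBOPENMPT_FORMATS  41ULL             /* formats probed internally */

/* Average cycles per internally probed format (integer division). */
uint64_t cycles_per_format(uint64_t total, uint64_t formats)
{
    return total / formats;
}

/* Added cost relative to the existing probing functions, in percent. */
double overhead_percent(uint64_t added, uint64_t baseline)
{
    return 100.0 * (double)added / (double)baseline;
}
```

cycles_per_format(LIBOPENMPT_CYCLES, LIBOPENMPT_FORMATS) gives
2672917981, and overhead_percent(LIBOPENMPT_CYCLES, OTHER_PROBES_CYCLES)
gives roughly 9.1, which is the "about 10%" mentioned above.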

