[FFmpeg-devel] [PATCH 4/4] avformat/libopenmpt: Probe file format from file data if possible

Jörn Heusipp osmanx at problemloesungsmaschine.de
Sun Jan 7 16:40:45 EET 2018


On 01/06/2018 04:10 PM, Carl Eugen Hoyos wrote:
> 2018-01-06 11:07 GMT+01:00 Jörn Heusipp <osmanx at problemloesungsmaschine.de>:

>> libopenmpt file header probing is tested regularly against the FATE
>> suite and other diverse file collections by libopenmpt upstream in
>> order to avoid false positives.
> 
> You could also test tools/probetest

I was not aware of that tool. Thanks for the suggestion.

It currently lists a failure related to libopenmpt:
Failure of libopenmpt probing code with score=76 type=0 p=FDC size=1024

Looking at tools/probetest.c, that would be a file with very few bits 
set. libopenmpt detects the random data in question as an M15 .MOD file 
(an original Amiga 15-sample .MOD file), and there is sadly not much that 
can be done about that. There are perfectly valid real-world M15 .MOD 
files with only 73 bits set in the first 600 bytes 
(untouchables-station.mod, 
<https://modarchive.org/index.php?request=view_by_moduleid&query=104280>). 
The pattern data that follows, up to 64*4*4*63 bytes, could even be 
entirely zero in a valid file (without the file being totally silent). 
The generated random data that tools/probetest complains about here has 
46 bits set to 1 in the first 600 bytes. It is followed by various other 
examples with up to 96 bits set to 1. Completely loading such a file 
would very likely reject it: in particular, libopenmpt can deduce the 
expected file size from the sample headers and, with some slack to 
account for corrupted real-world files, reject files whose size does not 
match. However, that is not possible when only probing the file header.
The libopenmpt API allows the file header probing functions to return 
false positives; false negatives, however, are not allowed.
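
To illustrate why a full load can reject what header probing cannot, 
here is a minimal sketch of the size check described above. It assumes 
the classic 15-sample Soundtracker layout (600-byte header, 1024-byte 
patterns, sample lengths stored as big-endian word counts). The function 
names and the slack parameter are hypothetical; this is not libopenmpt's 
actual loader code.

```c
#include <stdint.h>

/* Assumed classic Soundtracker (M15) layout, for illustration only:
 *   20 bytes   song title
 *   15 x 30    sample headers: 22 name, 2 length (big-endian, in words),
 *              1 finetune, 1 volume, 2 loop start, 2 loop length
 *   1          song length
 *   1          restart/tempo byte
 *   128        order list
 * = 600 header bytes, then patterns (64 rows x 4 channels x 4 bytes each),
 * then the sample data. */
#define M15_HEADER_SIZE  600
#define M15_NUM_SAMPLES  15
#define M15_ORDERS_OFS   (20 + M15_NUM_SAMPLES * 30 + 2)
#define M15_PATTERN_SIZE (64 * 4 * 4)

static unsigned read_be16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* File size implied by a 600-byte M15 header (hypothetical helper). */
static uint64_t m15_expected_size(const uint8_t *hdr)
{
    uint64_t sample_bytes = 0;
    unsigned max_pattern = 0;

    for (int i = 0; i < M15_NUM_SAMPLES; i++)
        sample_bytes += 2u * read_be16(hdr + 20 + i * 30 + 22);  /* words -> bytes */

    /* Scan all 128 order entries for the highest pattern index (assumption). */
    for (int i = 0; i < 128; i++)
        if (hdr[M15_ORDERS_OFS + i] > max_pattern)
            max_pattern = hdr[M15_ORDERS_OFS + i];

    return M15_HEADER_SIZE + (uint64_t)(max_pattern + 1) * M15_PATTERN_SIZE + sample_bytes;
}

/* Accept a completely loaded file only if its real size is within `slack`
 * bytes of the implied size; the slack value is up to the caller. */
static int m15_size_plausible(const uint8_t *hdr, uint64_t actual_size, uint64_t slack)
{
    uint64_t expected = m15_expected_size(hdr);
    return actual_size + slack >= expected && actual_size <= expected + slack;
}
```

An all-zero 600-byte header, for example, implies at least one 1024-byte 
pattern and thus a 1624-byte file, so a 1024-byte input with such a 
header would fail this check once the real file size is known. A header 
probe only sees the first bytes and cannot make that comparison.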

Performance numbers shown by tools/probetest are what I had expected 
(measured on an AMD A4-5000 during a normal Xorg session, i.e. not 100% 
idle):

   1110194971 cycles,           aa
  24986722468 cycles,          aac
  26418545168 cycles,          ac3
   1484717267 cycles,          acm
   1627888281 cycles,          act
   2109884646 cycles,          adp
   2316235992 cycles,          ads
   1244706028 cycles,          adx
   1132390431 cycles,          aea
   1729241354 cycles,         aiff
   1728288238 cycles,          aix
   2662531158 cycles,          amr
  16189546067 cycles,        amrnb
  10342883200 cycles,        amrwb
   1487752343 cycles,          anm
   2268900502 cycles,          apc
   1140814303 cycles,          ape
   2181170710 cycles,         apng
  18698762054 cycles,      aqtitle
   2656908730 cycles,          asf
   2402762967 cycles,        asf_o
  18148196647 cycles,          ass
   1392503829 cycles,          ast
   1774264703 cycles,           au
   1807159562 cycles,          avi
   1745391230 cycles,          avr
   1370939762 cycles,          avs
   1555620708 cycles,  bethsoftvid
   1459171160 cycles,          bfi
   2640635742 cycles,         bink
   2022320986 cycles,          bit
   1664933324 cycles,        bfstm
   1588023172 cycles,        brstm
   1769430536 cycles,          boa
   2294286860 cycles,          c93
   1022646071 cycles,          caf
   9063207678 cycles,    cavsvideo
   1898790300 cycles,         cdxl
   1037718383 cycles,         cine
   3358938768 cycles,       concat
   2367399953 cycles,        dcstr
   1795803759 cycles,          dfa
   1454750468 cycles,        dirac
   1167905836 cycles,        dnxhd
   2076678208 cycles,          dsf
   1226761232 cycles,       dsicin
   1157816261 cycles,          dss
  31466350564 cycles,          dts
   1357475606 cycles,        dtshd
  15626181281 cycles,           dv
  12227021709 cycles,       dvbsub
   1747998309 cycles,       dvbtxt
   1941371107 cycles,          dxa
   1988122049 cycles,           ea
   1395161162 cycles,     ea_cdata
  21013119067 cycles,         eac3
   1282697126 cycles,         epaf
   1658521102 cycles,          ffm
   2919787300 cycles,   ffmetadata
   3786264941 cycles,         fits
   2700385826 cycles,         flac
   1840776863 cycles,         flic
   1317695853 cycles,          flv
   1511756606 cycles,     live_flv
   1135064427 cycles,          4xm
   1830871233 cycles,          frm
   3011403748 cycles,          fsb
   1462985803 cycles,          gdv
   1708440935 cycles,         genh
   3480430744 cycles,          gif
   2533542048 cycles,          gsm
   2412598563 cycles,          gxf
  21637989787 cycles,         h261
  22268834035 cycles,         h263
  22135718754 cycles,         h264
  13939886275 cycles,         hevc
   1979375582 cycles, hls,applehttp
   1658646375 cycles,          hnm
   1507634977 cycles,          ico
   2534774499 cycles,        idcin
   1684324336 cycles,          idf
   1353664382 cycles,          iff
   2978779893 cycles,         ilbc
   1892353081 cycles,    alias_pix
   2456259645 cycles,  brender_pix
   2077466815 cycles,    ingenient
  11281657144 cycles,      ipmovie
   1840789384 cycles,        ircam
   2455541614 cycles,          iss
   1114518907 cycles,          iv8
   1750327098 cycles,          ivf
   3803895407 cycles,          ivr
  30510491919 cycles,      jacosub
   1271391143 cycles,           jv
   1504674165 cycles,        lmlm4
  28284647311 cycles,         loas
   2746771768 cycles,          lrc
   1630546444 cycles,          lvf
   2198871369 cycles,          lxf
  15210250791 cycles,          m4v
   2074024051 cycles, matroska,webm
   1756348463 cycles,        mgsts
  13894318111 cycles,     microdvd
  15146276963 cycles,        mjpeg
  13215378411 cycles,   mjpeg_2000
  21505153187 cycles,          mlp
   1623684275 cycles,          mlv
   2009009898 cycles,           mm
   1401453493 cycles,          mmf
   3614852044 cycles, mov,mp4,m4a,3gp,3g2,mj2
  37065167696 cycles,          mp3
   2003306237 cycles,          mpc
   1695842377 cycles,         mpc8
  20922947044 cycles,         mpeg
  26950626806 cycles,       mpegts
  12903395151 cycles,    mpegvideo
   1861191163 cycles,       mpjpeg
  11292546869 cycles,         mpl2
  10904909514 cycles,        mpsub
   2556705558 cycles,          msf
  14520727615 cycles,     msnwctcp
   1513345014 cycles,         mtaf
   1498181103 cycles,          mtv
   2100567692 cycles,         musx
   1398481833 cycles,           mv
   3839928046 cycles,          mxf
   1084340183 cycles,           nc
   2260039804 cycles,   nistsphere
   1557302811 cycles,          nsp
  14077588650 cycles,          nsv
  12804865958 cycles,          nut
   3498085105 cycles,          nuv
   2785399093 cycles,          ogg
   2800628120 cycles,          oma
   2241873172 cycles,          paf
  11630567717 cycles,          pjs
   1538360044 cycles,          pmp
   1966776985 cycles,          pva
   2051297210 cycles,          pvf
   1464824135 cycles,          qcp
   1395151376 cycles,          r3d
  13872717447 cycles,     realtext
   1648061451 cycles,     redspark
   1881530375 cycles,          rl2
   1865198787 cycles,           rm
   1848791502 cycles,          roq
   3141932957 cycles,          rpl
   2379252069 cycles,          rsd
  31146518791 cycles,        s337m
   7497815228 cycles,         sami
  24830800138 cycles,          sbg
  15351196732 cycles,          scc
   9758760073 cycles,          sdp
   2159674057 cycles,         sdr2
   1555316250 cycles,          sds
   1533405328 cycles,          sdx
   1681270049 cycles,     film_cpk
   2303851902 cycles,          shn
   1761647489 cycles,         siff
   1510520120 cycles,          smk
   2859907925 cycles,       smjpeg
   1643498999 cycles,        smush
   1545689291 cycles,          sol
   1912740702 cycles,          sox
  17486361594 cycles,        spdif
  20080502425 cycles,          srt
   2659637846 cycles,       psxstr
  17633213722 cycles,          stl
   8032855323 cycles,   subviewer1
   8572013351 cycles,    subviewer
   2043897951 cycles,          sup
   2980746200 cycles,         svag
   1617398584 cycles,          swf
   2842115745 cycles,          tak
   5320163051 cycles,  tedcaptions
   1884107745 cycles,          thp
   4320119922 cycles,       3dostr
   2018755118 cycles,   tiertexseq
   1714617022 cycles,          tmv
  21456317423 cycles,       truehd
   1050826275 cycles,          tta
   2065773077 cycles,          txd
   1577829281 cycles,           ty
   3450802460 cycles,          vag
  19179500628 cycles,          vc1
   1860036853 cycles,      vc1test
   2035593194 cycles,         vivo
   1518758455 cycles,          vmd
   2696860615 cycles,       vobsub
   2762235280 cycles,          voc
   1957794567 cycles,          vpk
  15280000639 cycles,      vplayer
   1763355055 cycles,          vqf
   1879310121 cycles,          w64
   1717961542 cycles,          wav
   2095837026 cycles,     wc3movie
   2960188092 cycles,       webvtt
   1922356839 cycles,        wsaud
   1978715237 cycles,          wsd
   1468438585 cycles,        wsvqa
   2668937770 cycles,          wtv
   3193222838 cycles,          wve
   1744694735 cycles,           wv
   1677278541 cycles,           xa
   1759862474 cycles,         xbin
   2077217647 cycles,          xmv
   2161496331 cycles,         xvag
   2330794326 cycles,         xwma
   1103137131 cycles,          yop
   2154690280 cycles, yuv4mpegpipe
   1842301899 cycles,     bmp_pipe
   2039875920 cycles,     dds_pipe
   1627504710 cycles,     dpx_pipe
   1463019740 cycles,     exr_pipe
   1539585051 cycles,     j2k_pipe
   1187861714 cycles,    jpeg_pipe
   1682815484 cycles,  jpegls_pipe
   1840465166 cycles,     pam_pipe
   1755858395 cycles,     pbm_pipe
   1211589601 cycles,     pcx_pipe
   2002446954 cycles,  pgmyuv_pipe
   1818965412 cycles,     pgm_pipe
   1654095834 cycles,  pictor_pipe
   1404252441 cycles,     png_pipe
   1211120882 cycles,     ppm_pipe
   1205883539 cycles,     psd_pipe
   1764091290 cycles,   qdraw_pipe
   1091809273 cycles,     sgi_pipe
   2994663150 cycles,     svg_pipe
   1348938514 cycles, sunrast_pipe
   1464347337 cycles,    tiff_pipe
   1142572756 cycles,    webp_pipe
   1412715104 cycles,     xpm_pipe
   3550700989 cycles,   libmodplug
109589637233 cycles,   libopenmpt

   2672917981           libopenmpt (per module format)

At first glance, libopenmpt looks huge here in comparison. However, one 
should consider that libopenmpt (currently) has to probe for 41 
different module file formats, going through 41 separate probing 
functions internally.

Dividing 109589637233 by 41 gives roughly 2672917981 cycles per format, 
which is in the same ballpark as the other probing functions in ffmpeg.
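
The per-format figure is a plain integer division of the libopenmpt 
total by the number of internally probed formats; as a trivial sketch 
(the helper name is hypothetical):

```c
#include <stdint.h>

/* Figures from the tools/probetest run above. */
#define LIBOPENMPT_TOTAL_CYCLES 109589637233ULL
#define LIBOPENMPT_NUM_FORMATS  41u

/* Hypothetical helper: average probing cost per internally probed format. */
static uint64_t cycles_per_format(uint64_t total_cycles, unsigned num_formats)
{
    return total_cycles / num_formats;  /* integer division truncates */
}
```

The division leaves a remainder of 12 cycles, which is irrelevant at 
this scale.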

>> +#if OPENMPT_API_VERSION_AT_LEAST(0,3,0)
>> +    if (p->buf && p->buf_size > 0) {
>> +        probe_result = openmpt_probe_file_header_without_filesize(
>> +                           OPENMPT_PROBE_FILE_HEADER_FLAGS_DEFAULT,
>> +                           p->buf, p->buf_size,
>> +                           &openmpt_logfunc, NULL, NULL, NULL, NULL, NULL);
>> +        if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_FAILURE) {
>> +            score = score_fail;
> 
> What's wrong with return 0;?

Nothing. If preferred, I can get rid of all score_* constants and use 0 
or AVPROBE_SCORE_* directly.

>> +        } else if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS) {
>> +            score = FFMAX(score, score_data);
> 
> What does OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS mean?

It is documented as "OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS: The file 
will most likely be supported by libopenmpt." (see 
<https://lib.openmpt.org/doc/group__libopenmpt__c.html#ga92cdc66eb529a8a4a67987b659ed3c5e>).
A perfectly precise answer is never possible, as in some cases that 
would require actually loading the complete file:
  * Not all module file formats store feature flags in the file header.
  * Some module file formats have very few magic bytes, and/or place 
them at unusual offsets (like offset 1080 for M.K. .MOD).
  * Some formats store header-like information in the file footer, which 
is not accessible during probing.
  * The extreme case of M15 (original 15-sample Amiga .MOD files) 
provides no true file header or magic numbers at all. libopenmpt 
implements heuristics to reliably identify and probe even those, but 
there is only so much it can do.
  * Some container formats (such as Unreal Music .UMX, which can contain 
module music files) may, in theory, require seeking to arbitrary 
locations in the file in order to determine the format.
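
The offset-1080 case can be sketched as a simplified three-way probe. It 
only looks for the "M.K." signature of 31-sample ProTracker .MOD files 
(a real prober also has to accept other signatures such as "M!K!" or 
"FLT4", otherwise it would produce false negatives), and the enum is a 
hypothetical stand-in for libopenmpt's 
OPENMPT_PROBE_FILE_HEADER_RESULT_* codes, not their actual values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for libopenmpt's three-way probe outcome. */
enum probe_outcome { PROBE_FAILURE, PROBE_SUCCESS, PROBE_WANTMOREDATA };

/* 31-sample .MOD: 20-byte title + 31 x 30-byte sample headers
 * + 1 song length + 1 restart byte + 128 orders = 1080,
 * so the 4-byte signature starts at offset 1080. */
#define MK_SIGNATURE_OFFSET 1080

static enum probe_outcome probe_mk_mod(const unsigned char *buf, size_t size)
{
    if (size < MK_SIGNATURE_OFFSET + 4)
        return PROBE_WANTMOREDATA;  /* signature not visible yet */
    if (memcmp(buf + MK_SIGNATURE_OFFSET, "M.K.", 4) == 0)
        return PROBE_SUCCESS;       /* "most likely", not certain */
    return PROBE_FAILURE;           /* only for this one signature */
}
```

It also shows why SUCCESS can only mean "most likely": any buffer that 
happens to contain those four bytes at offset 1080 is accepted.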

> Why not return MAX?

For all the reasons listed above, false positives fundamentally cannot 
be avoided completely, even though libopenmpt tries to be as pessimistic 
as possible. As the libopenmpt probing logic is code outside of ffmpeg, 
a false positive could cause mis-detection of other formats supported by 
ffmpeg, which ffmpeg itself could not fix immediately or easily. I used 
the lowest score that makes sense in order to reduce the risk of such an 
impact.
In this case the probing result is deduced from the actual file data, 
as opposed to trusting a MIME type, which is external to the file and 
could be inconsistent or wrong; that is why I used a score higher than 
AVPROBE_SCORE_MIME.
I opted for AVPROBE_SCORE_MIME+1, which seemed reasonable to me.
Should I add a comment explaining the reasoning to the code?
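
A minimal sketch of such a comment, attached to the conclusive branches 
of the probe. The AVPROBE_SCORE_MIME value (75) is copied from 
libavformat/avformat.h so the snippet compiles on its own, which makes 
AVPROBE_SCORE_MIME+1 equal to the score=76 reported by tools/probetest 
above. The enum and function name are hypothetical stand-ins, not the 
patch's actual code.

```c
/* Copied from libavformat/avformat.h so this sketch compiles on its own. */
#define AVPROBE_SCORE_MIME 75

/* Hypothetical stand-in for libopenmpt's OPENMPT_PROBE_FILE_HEADER_RESULT_* codes. */
enum probe_outcome { PROBE_FAILURE, PROBE_SUCCESS, PROBE_WANTMOREDATA };

/* ext_score: score from the file extension check, 0 if the extension is unknown. */
static int header_probe_score(enum probe_outcome res, int ext_score)
{
    if (res == PROBE_FAILURE)
        return 0;  /* the file data rules the format out */

    if (res == PROBE_SUCCESS) {
        /* libopenmpt judged the actual file data, which is more reliable than
         * an external MIME type, hence AVPROBE_SCORE_MIME + 1. Header probing
         * may still yield false positives, so stay well below
         * AVPROBE_SCORE_MAX to limit the impact of a mis-detection on other
         * ffmpeg demuxers. */
        const int score_data = AVPROBE_SCORE_MIME + 1;
        return ext_score > score_data ? ext_score : score_data;
    }

    /* PROBE_WANTMOREDATA: inconclusive, decided separately. */
    return ext_score;
}
```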

>> +        } else if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA) {
> I believe this should return 0 but maybe you found that this is bad?

Would 0 be semantically right here? 
OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA means that libopenmpt 
needs more data to reach any usable conclusion, which is what I 
understood AVPROBE_SCORE_RETRY to mean.
I do not see any particular problem with returning 0 in this case 
either, given the probing logic in av_probe_input_format() (and it would 
reduce the whole probe_result == 
OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA block to a single line). 
However, if client code calls .read_probe() on AVInputFormat 
ff_libopenmpt_demuxer directly, I think returning AVPROBE_SCORE_RETRY 
(or similar) makes more sense.
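
A sketch of the WANTMOREDATA branch as a standalone function, following 
the patch hunk quoted in this thread. The values for the extension-retry 
score (AVPROBE_SCORE_RETRY) and the "rather low" score 
(AVPROBE_SCORE_RETRY/2) are assumptions, since the quoted hunk does not 
show how score_ext_retry and score_retry are defined; the function name 
is hypothetical.

```c
#include <stddef.h>

/* Copied from libavformat/avformat.h so this sketch compiles on its own. */
#define AVPROBE_SCORE_MAX   100
#define AVPROBE_SCORE_RETRY (AVPROBE_SCORE_MAX / 4)

/* Score when libopenmpt reports that it needs more data (hypothetical helper).
 * ext_score:        score from the file extension check, 0 if unknown
 * buf_size:         bytes available to the probe
 * recommended_size: value of openmpt_probe_file_header_get_recommended_size() */
static int wantmoredata_score(int ext_score, size_t buf_size, size_t recommended_size)
{
    if (ext_score > 0)  /* known extension: keep it, but at least RETRY (assumed) */
        return ext_score > AVPROBE_SCORE_RETRY ? ext_score : AVPROBE_SCORE_RETRY;

    if (buf_size >= recommended_size)  /* enough data and still undecided */
        return AVPROBE_SCORE_RETRY / 2;  /* assumed "rather low" score */

    /* Unknown extension and very little data: any score > 0 would let
     * random data probe successfully. */
    return 0;
}
```

Returning 0 unconditionally, as suggested in the review, would collapse 
this function to a single line; a nonzero score mainly matters for code 
that calls .read_probe() directly.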

>> +            if (score > score_fail) {
>> +                /* known file extension */
>> +                score = FFMAX(score, score_ext_retry);
>> +            } else {
>> +                /* unknown file extension */
>> +                if (p->buf_size >= openmpt_probe_file_header_get_recommended_size()) {
>> +                    /* We have already received the recommended amount of data
>> +                     * and still cannot decide. Return a rather low score.
>> +                     */
>> +                    score = FFMAX(score, score_retry);
>> +                } else {
>> +                    /* The file extension is unknown and we have very few data
>> +                     * bytes available. libopenmpt cannot decide anything here,
>> +                     * and returning any score > 0 would result in successful
>> +                     * probing of random data.
>> +                     */
>> +                    score = score_fail;
> 
> This patch indicates that it may be a good idea to require libopenmpt 0.3,

The amount of #ifdef needed to support 0.2 and 0.3 is rather small, I think.

I understand that the current way of relying solely on the file 
extension (which would remain the way for libopenmpt 0.2) is far from 
optimal, but I do not see any reason to drop libopenmpt 0.2 support 
right now; in particular, keeping 0.2 support as is would not be a 
regression. Additionally, libopenmpt 0.2 can be built with C++03 
compilers while libopenmpt 0.3 requires a C++11 compiler, so libopenmpt 
0.3 cannot easily be built on older platforms.

libopenmpt 0.2 also allows for file probing, but its API and code path 
are very heavyweight (it goes through the normal file loader and 
discards unneeded data), which I fear would be far too costly 
performance-wise for ffmpeg.

> when was it released, which distributions do not include it?

The first version of libopenmpt 0.3 was released 2017-09-28.

I am not aware of any stable, non-rolling distribution that ships 
libopenmpt 0.3 as of now.

Debian 9 has libopenmpt 0.2.7386~beta20.3-3+deb9u2
Ubuntu 17.10 has libopenmpt 0.2.8760~beta27-1
Ubuntu 16.04 LTS has no libopenmpt at all
even openSUSE Tumbleweed only has libopenmpt 0.2.8461~beta26
Debian Testing and Ubuntu Bionic both have libopenmpt 0.3.4.

I do not think ffmpeg should drop libopenmpt 0.2 support at the moment.


Regards,
Jörn

