[FFmpeg-devel] [PATCH 4/4] avformat/libopenmpt: Probe file format from file data if possible

Carl Eugen Hoyos ceffmpeg at gmail.com
Mon Jan 8 01:57:56 EET 2018


2018-01-07 15:40 GMT+01:00 Jörn Heusipp <osmanx at problemloesungsmaschine.de>:
>
> On 01/06/2018 04:10 PM, Carl Eugen Hoyos wrote:
>>
>> 2018-01-06 11:07 GMT+01:00 Jörn Heusipp
>> <osmanx at problemloesungsmaschine.de>:
>
>
>>> libopenmpt file header probing is tested regularly against the FATE
>>> suite and other diverse file collections by libopenmpt upstream in
>>> order to avoid false positives.
>>
>>
>> You could also test tools/probetest
>
>
> I was not aware of that tool. Thanks for the suggestion.
>
> It currently lists a failure related to libopenmpt:
> Failure of libopenmpt probing code with score=76 type=0 p=FDC size=1024

I did not look at this closely but I suspect libopenmpt should return a
smaller score in this case.

A solution could be to never return a high value for the FFmpeg
probe function.

> Looking at tools/probetest.c, that would be a file with very few bits set.
> libopenmpt detects the random data in question as M15 .MOD files (original
> Amiga 15 samples .mod files), and there is sadly not much that can be done
> about that. There are perfectly valid real-world M15 .MOD files with only 73
> bits set in the first 600 bytes (untouchables-station.mod,
> <https://modarchive.org/index.php?request=view_by_moduleid&query=104280>).
> The following up to 64*4*4*63 bytes could even be completely 0 (empty
> pattern data) for valid files (even without the file being totally silent).
> The generated random data that tools/probetest complains about here contains
> 46 bits set to 1 in the first 600 bytes. What follows are various other
> examples with up to 96 bits set to 1. Completely loading a file like that
> would very likely reject it (in particular, libopenmpt can deduce the
> expected file size from the sample headers and, with some slack to account
> for corrupted real-world examples, reject files with non-matching size),
> however, that is not possible when only probing the file header.
> The libopenmpt API allows for the file header probing functions to return
> false-positives, however false-negatives are not allowed.
>
> Performance numbers shown by tools/probetest are what I had expected
> (measured on an AMD A4-5000, during normal Xorg session (i.e. not 100%
> idle)):
>
>   1110194971 cycles,           aa
>  24986722468 cycles,          aac
>  26418545168 cycles,          ac3
>   1484717267 cycles,          acm
>   1627888281 cycles,          act
>   2109884646 cycles,          adp
>   2316235992 cycles,          ads
>   1244706028 cycles,          adx
>   1132390431 cycles,          aea
>   1729241354 cycles,         aiff
>   1728288238 cycles,          aix
>   2662531158 cycles,          amr
>  16189546067 cycles,        amrnb
>  10342883200 cycles,        amrwb
>   1487752343 cycles,          anm
>   2268900502 cycles,          apc
>   1140814303 cycles,          ape
>   2181170710 cycles,         apng
>  18698762054 cycles,      aqtitle
>   2656908730 cycles,          asf
>   2402762967 cycles,        asf_o
>  18148196647 cycles,          ass
>   1392503829 cycles,          ast
>   1774264703 cycles,           au
>   1807159562 cycles,          avi
>   1745391230 cycles,          avr
>   1370939762 cycles,          avs
>   1555620708 cycles,  bethsoftvid
>   1459171160 cycles,          bfi
>   2640635742 cycles,         bink
>   2022320986 cycles,          bit
>   1664933324 cycles,        bfstm
>   1588023172 cycles,        brstm
>   1769430536 cycles,          boa
>   2294286860 cycles,          c93
>   1022646071 cycles,          caf
>   9063207678 cycles,    cavsvideo
>   1898790300 cycles,         cdxl
>   1037718383 cycles,         cine
>   3358938768 cycles,       concat
>   2367399953 cycles,        dcstr
>   1795803759 cycles,          dfa
>   1454750468 cycles,        dirac
>   1167905836 cycles,        dnxhd
>   2076678208 cycles,          dsf
>   1226761232 cycles,       dsicin
>   1157816261 cycles,          dss
>  31466350564 cycles,          dts
>   1357475606 cycles,        dtshd
>  15626181281 cycles,           dv
>  12227021709 cycles,       dvbsub
>   1747998309 cycles,       dvbtxt
>   1941371107 cycles,          dxa
>   1988122049 cycles,           ea
>   1395161162 cycles,     ea_cdata
>  21013119067 cycles,         eac3
>   1282697126 cycles,         epaf
>   1658521102 cycles,          ffm
>   2919787300 cycles,   ffmetadata
>   3786264941 cycles,         fits
>   2700385826 cycles,         flac
>   1840776863 cycles,         flic
>   1317695853 cycles,          flv
>   1511756606 cycles,     live_flv
>   1135064427 cycles,          4xm
>   1830871233 cycles,          frm
>   3011403748 cycles,          fsb
>   1462985803 cycles,          gdv
>   1708440935 cycles,         genh
>   3480430744 cycles,          gif
>   2533542048 cycles,          gsm
>   2412598563 cycles,          gxf
>  21637989787 cycles,         h261
>  22268834035 cycles,         h263
>  22135718754 cycles,         h264
>  13939886275 cycles,         hevc
>   1979375582 cycles, hls,applehttp
>   1658646375 cycles,          hnm
>   1507634977 cycles,          ico
>   2534774499 cycles,        idcin
>   1684324336 cycles,          idf
>   1353664382 cycles,          iff
>   2978779893 cycles,         ilbc
>   1892353081 cycles,    alias_pix
>   2456259645 cycles,  brender_pix
>   2077466815 cycles,    ingenient
>  11281657144 cycles,      ipmovie
>   1840789384 cycles,        ircam
>   2455541614 cycles,          iss
>   1114518907 cycles,          iv8
>   1750327098 cycles,          ivf
>   3803895407 cycles,          ivr
>  30510491919 cycles,      jacosub
>   1271391143 cycles,           jv
>   1504674165 cycles,        lmlm4
>  28284647311 cycles,         loas
>   2746771768 cycles,          lrc
>   1630546444 cycles,          lvf
>   2198871369 cycles,          lxf
>  15210250791 cycles,          m4v
>   2074024051 cycles, matroska,webm
>   1756348463 cycles,        mgsts
>  13894318111 cycles,     microdvd
>  15146276963 cycles,        mjpeg
>  13215378411 cycles,   mjpeg_2000
>  21505153187 cycles,          mlp
>   1623684275 cycles,          mlv
>   2009009898 cycles,           mm
>   1401453493 cycles,          mmf
>   3614852044 cycles, mov,mp4,m4a,3gp,3g2,mj2
>  37065167696 cycles,          mp3
>   2003306237 cycles,          mpc
>   1695842377 cycles,         mpc8
>  20922947044 cycles,         mpeg
>  26950626806 cycles,       mpegts
>  12903395151 cycles,    mpegvideo
>   1861191163 cycles,       mpjpeg
>  11292546869 cycles,         mpl2
>  10904909514 cycles,        mpsub
>   2556705558 cycles,          msf
>  14520727615 cycles,     msnwctcp
>   1513345014 cycles,         mtaf
>   1498181103 cycles,          mtv
>   2100567692 cycles,         musx
>   1398481833 cycles,           mv
>   3839928046 cycles,          mxf
>   1084340183 cycles,           nc
>   2260039804 cycles,   nistsphere
>   1557302811 cycles,          nsp
>  14077588650 cycles,          nsv
>  12804865958 cycles,          nut
>   3498085105 cycles,          nuv
>   2785399093 cycles,          ogg
>   2800628120 cycles,          oma
>   2241873172 cycles,          paf
>  11630567717 cycles,          pjs
>   1538360044 cycles,          pmp
>   1966776985 cycles,          pva
>   2051297210 cycles,          pvf
>   1464824135 cycles,          qcp
>   1395151376 cycles,          r3d
>  13872717447 cycles,     realtext
>   1648061451 cycles,     redspark
>   1881530375 cycles,          rl2
>   1865198787 cycles,           rm
>   1848791502 cycles,          roq
>   3141932957 cycles,          rpl
>   2379252069 cycles,          rsd
>  31146518791 cycles,        s337m
>   7497815228 cycles,         sami
>  24830800138 cycles,          sbg
>  15351196732 cycles,          scc
>   9758760073 cycles,          sdp
>   2159674057 cycles,         sdr2
>   1555316250 cycles,          sds
>   1533405328 cycles,          sdx
>   1681270049 cycles,     film_cpk
>   2303851902 cycles,          shn
>   1761647489 cycles,         siff
>   1510520120 cycles,          smk
>   2859907925 cycles,       smjpeg
>   1643498999 cycles,        smush
>   1545689291 cycles,          sol
>   1912740702 cycles,          sox
>  17486361594 cycles,        spdif
>  20080502425 cycles,          srt
>   2659637846 cycles,       psxstr
>  17633213722 cycles,          stl
>   8032855323 cycles,   subviewer1
>   8572013351 cycles,    subviewer
>   2043897951 cycles,          sup
>   2980746200 cycles,         svag
>   1617398584 cycles,          swf
>   2842115745 cycles,          tak
>   5320163051 cycles,  tedcaptions
>   1884107745 cycles,          thp
>   4320119922 cycles,       3dostr
>   2018755118 cycles,   tiertexseq
>   1714617022 cycles,          tmv
>  21456317423 cycles,       truehd
>   1050826275 cycles,          tta
>   2065773077 cycles,          txd
>   1577829281 cycles,           ty
>   3450802460 cycles,          vag
>  19179500628 cycles,          vc1
>   1860036853 cycles,      vc1test
>   2035593194 cycles,         vivo
>   1518758455 cycles,          vmd
>   2696860615 cycles,       vobsub
>   2762235280 cycles,          voc
>   1957794567 cycles,          vpk
>  15280000639 cycles,      vplayer
>   1763355055 cycles,          vqf
>   1879310121 cycles,          w64
>   1717961542 cycles,          wav
>   2095837026 cycles,     wc3movie
>   2960188092 cycles,       webvtt
>   1922356839 cycles,        wsaud
>   1978715237 cycles,          wsd
>   1468438585 cycles,        wsvqa
>   2668937770 cycles,          wtv
>   3193222838 cycles,          wve
>   1744694735 cycles,           wv
>   1677278541 cycles,           xa
>   1759862474 cycles,         xbin
>   2077217647 cycles,          xmv
>   2161496331 cycles,         xvag
>   2330794326 cycles,         xwma
>   1103137131 cycles,          yop
>   2154690280 cycles, yuv4mpegpipe
>   1842301899 cycles,     bmp_pipe
>   2039875920 cycles,     dds_pipe
>   1627504710 cycles,     dpx_pipe
>   1463019740 cycles,     exr_pipe
>   1539585051 cycles,     j2k_pipe
>   1187861714 cycles,    jpeg_pipe
>   1682815484 cycles,  jpegls_pipe
>   1840465166 cycles,     pam_pipe
>   1755858395 cycles,     pbm_pipe
>   1211589601 cycles,     pcx_pipe
>   2002446954 cycles,  pgmyuv_pipe
>   1818965412 cycles,     pgm_pipe
>   1654095834 cycles,  pictor_pipe
>   1404252441 cycles,     png_pipe
>   1211120882 cycles,     ppm_pipe
>   1205883539 cycles,     psd_pipe
>   1764091290 cycles,   qdraw_pipe
>   1091809273 cycles,     sgi_pipe
>   2994663150 cycles,     svg_pipe
>   1348938514 cycles, sunrast_pipe
>   1464347337 cycles,    tiff_pipe
>   1142572756 cycles,    webp_pipe
>   1412715104 cycles,     xpm_pipe
>   3550700989 cycles,   libmodplug

> 109589637233 cycles,   libopenmpt

This sadly may not be acceptable, others may want to comment.

>   2672917981           libopenmpt (per module format)
>
> At first glance, libopenmpt looks huge here in comparison. However one
> should consider that libopenmpt internally has to probe for (currently) 41
> different module file formats, going through 41 separate probing functions
> internally.
>
> Dividing 109589637233 by 41 gives 2672917981, which is in the ballpark of
> all other probing functions in ffmpeg.
>
>>> +#if OPENMPT_API_VERSION_AT_LEAST(0,3,0)
>>> +    if (p->buf && p->buf_size > 0) {
>>> +        probe_result = openmpt_probe_file_header_without_filesize(
>>> +                           OPENMPT_PROBE_FILE_HEADER_FLAGS_DEFAULT,
>>> +                           p->buf, p->buf_size,
>>> +                           &openmpt_logfunc, NULL, NULL, NULL, NULL,
>>> NULL);
>>> +        if (probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_FAILURE) {
>>> +            score = score_fail;
>>
>>
>> What's wrong with return 0;?
>
>
> Nothing. If preferred, I can get rid of all score_* constants and use 0 or
> AVPROBE_SCORE_* directly.

That sounds like a very good idea to me but please wait for others to comment.

>>> +        } else if (probe_result ==
>>> OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS) {
>>> +            score = FFMAX(score, score_data);
>>
>>
>> What does OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS mean?
>
>
> It is documented as "OPENMPT_PROBE_FILE_HEADER_RESULT_SUCCESS: The file will
> most likely be supported by libopenmpt." (see
> <https://lib.openmpt.org/doc/group__libopenmpt__c.html#ga92cdc66eb529a8a4a67987b659ed3c5e>).
> An ultimately precise answer is never possible as that would require
> actually trying to load the complete file in some cases:
>  * Not all module file formats store feature flags in the file header.
>  * Some module file formats provide very little file magic numbers, and/or
> file magic numbers at strange offsets (like at 1080 for M.K. .MOD).
>  * Some formats store header-like information in the file footer, which is
> not accessible during probing.
>  * The extreme case of M15 (original 15 samples Amiga .MOD files) provides
> absolutely no true file header or magic numbers. libopenmpt implements
> heuristics to reliably identify and probe even those, however there is only
> so much it can do.
>  * Some container formats (Unreal Music .UMX, which can contain module music
> files) theoretically potentially require seeking to arbitrary locations in
> the file in order to determine the format.
>
>> Why not return MAX?
>
> For all the reasons listed above, even though libopenmpt tries to be as
> pessimistic as possible, false positives fundamentally cannot be avoided
> completely. As the libopenmpt probing logic is code outside of ffmpeg, the
> effects of such a false positive could potentially cause mis-detection of
> other formats supported by ffmpeg, which would not be immediately or easily
> fixable by ffmpeg itself. I used the lowest possible score that makes sense
> in order to reduce the risk of potential impact.
> The probing result in this case is deduced from looking at the actual file
> data, as opposed to just trusting a mime-type which is external to the file
> and could be inconsistent/wrong, which is why I used a score higher than
> AVPROBE_SCORE_MIME.
> I opted for AVPROBE_SCORE_MIME+1, which seemed reasonable to me.
> Should I add a comment explaining the reasoning to the code?
>
>>> +        } else if (probe_result ==
>>> OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA) {
>>
>>  > I believe this should return 0 but maybe you found that this is bad?
>
>
> Would 0 be semantically right here?
> OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA means that libopenmpt requires
> more data to come to any usable conclusion, which is what I thought
> AVPROBE_SCORE_RETRY would mean.
> I do not see any particular problem with returning 0 in this case either,
> given the probing logic in av_probe_input_format() (and it would reduce the
> whole probe_result == OPENMPT_PROBE_FILE_HEADER_RESULT_WANTMOREDATA block to
> a single line). However, if client code directly calls .read_probe() on

> AVInputFormat ff_libopenmpt_demuxer, I think returning AVPROBE_SCORE_RETRY
> (or similar) makes more sense.

I agree.

>>> +            if (score > score_fail) {
>>> +                /* known file extension */
>>> +                score = FFMAX(score, score_ext_retry);
>>> +            } else {
>>> +                /* unknown file extension */
>>> +                if (p->buf_size >=
>>> openmpt_probe_file_header_get_recommended_size()) {
>>> +                    /* We have already received the recommended amount
>>> of data
>>> +                     * and still cannot decide. Return a rather low
>>> score.
>>> +                     */
>>> +                    score = FFMAX(score, score_retry);
>>> +                } else {
>>> +                    /* The file extension is unknown and we have very
>>> few data
>>> +                     * bytes available. libopenmpt cannot decide
>>> anything here,
>>> +                     * and returning any score > 0 would result in
>>> successfull
>>> +                     * probing of random data.
>>> +                     */
>>> +                    score = score_fail;
>>
>> This patch indicates that it may be a good idea to require libopenmpt 0.3,
>
> The amount of #ifdef needed to support 0.2 and 0.3 is rather small, I think.
>
> I understand that the current (and future libopenmpt 0.2) way of solely
> relying on the file extension is far from optimal, but I do not see any
> reason to drop libopenmpt 0.2 support right now; in particular, continuing
> 0.2 support as is would be no regression. Additionally, libopenmpt 0.2 can
> be built with C++03 compilers while libopenmpt 0.3 requires a C++11
> compiler, thus, libopenmpt 0.3 cannot easily be built on older platforms.
>
> libopenmpt 0.2 also allows for file probing, however the API and code path
> is very heavy-weight (goes through the normal file loader and discards
> unneeded data), which I fear would be way too heavy performance-wise for
> ffmpeg.
>
>> when was it released, which distributions do not include it?
>
> The first version of libopenmpt 0.3 was released 2017-09-28.
>
> I am not aware of any stable, non-rolling distribution that ships libopenmpt
> 0.3 as of now.

Then supporting only 0.3 is currently not a good option.

Carl Eugen


More information about the ffmpeg-devel mailing list