[FFmpeg-devel] [PATCH 00/37 v3] Matroska demuxer patches

Andreas Rheinhardt andreas.rheinhardt at gmail.com
Fri Jun 7 19:57:00 EEST 2019

Andreas Rheinhardt:
> Hello,
> here is the patchset for the Matroska demuxer that I already announced
> last week [1]. It is an improvement upon an earlier patchset from March [2]
> which has been kindly reviewed by Steve Lhomme. Just as the earlier 
> attempt, it is intended to get rid of a bogus error message introduced 
> in 9326117b.
> 1. Here is a recap of why this bogus error message exists:
> a) The ebml parser contained in the Matroska muxer basically has two
> modes of operation: It can parse simple elements and it can parse whole
> master elements all at once. The latter is of course implemented via the
> former. Also one can designate certain element IDs as "EBML_STOP" type
> so that when such an element is encountered, parsing immediately stops
> without parsing the length and performing the checks (regarding whether
> an element is truly contained in the master element it is supposed to be
> contained in --these checks were introduced in 9326117b).
> b) Normal parsing (via matroska_parse_cluster_incremental) uses a syntax
> list for parsing that contains elements at different levels in the
> hierarchy: The elements that can be found in a cluster and (certain)
> level 1 elements (i.e. the level of the cluster); in particular,
> it contains a cluster as EBML_STOP type element. When a cluster is
> encountered and ebml_parse returns, the current level is ended (if there
> already was an open cluster) and the new cluster is entered (via a
> designated syntax list containing a cluster as master element) and
> parsed until a SimpleBlock or a BlockGroup is encountered (these two are
> of course EBML_STOP elements in the used syntax). Then the first syntax
> list (the one containing both the cluster as well as the elements
> permitted in a cluster as entries) is used again for parsing of said
> blocks and all the blocks that follow until one encounters another
> cluster or an error occurs.
> c) The problem with this approach is that clusters are not closed
> automatically when they are finished, but only when another cluster is
> encountered. This works (i.e. yields no error message) when a cluster
> follows another cluster (because no checks are performed for EBML_STOP
> type elements), but if e.g. the cues (the index) is located after a
> cluster (as happens with most Matroska files in existence), then parsing
> the cues will yield an error: After all, the preceding cluster ended
> right before the cues began, but the level hasn't been ended, so the
> length check will fail if the preceding cluster wasn't an
> unknown-length cluster. (In the latter case, the cues are considered
> part of the preceding cluster by the current code.) That's the reason
> behind this (harmless) error.
> 2. a) When ebml_parse parses a Master element by itself, it also takes
> care of ending the level (i.e. the level automatically entered during
> parsing of the master element) itself; but there is no check at the end
> of ebml_parse whether a level has just ended right after the last
> element; if there were, then the above problem wouldn't exist.
> b) So my approach is to add such a check and make ebml_parse end the
> levels all by itself; the Matroska functions shouldn't have to end
> levels manually (as happens currently when a new cluster is encountered),
> but the EBML functions would do so given that the levels are an EBML
> concept after all.
> c) But all is slightly complicated by the fact that there also can be
> unknown-length master elements. These end when an element not allowed in
> them but on a higher level is encountered. So in this situation it is
> impossible to perform the check whether a level has ended after
> consuming an element; instead, said check needs to be performed directly
> after the id of the next element is read, without reading any more data
> if it turns out that said element belongs to a higher level. And that's
> the way it has been implemented.
> 3. I have also added improvements to various checks and other things.
> You can find them explained in the various commit messages. Just two things:
> a) I am unsure how to check whether a read error or attempted reading
> beyond EOF should be checked: Via avio_feof(pb) or via pb->eof_reached?
> The former tries to read again (refill the buffer) when pb->eof_reached
> was already set; does this mean that if one wants to know whether a read
> was not successfull one should check pb->eof_reached, but when one is
> more interested into whether one can perform future reads, one should
> use avio_feof (after all, in case of livestreaming, new data might have
> arrived after the earlier failed read)? That's at least the interpretation
> that I had when I wrote the patchset.
> b) ffio_limit is used to check for whether there is enough data left to
> skip if an element intended to be skipped is encountered. There are
> three issues with this function:
> i) It takes an int, although it should be an int64_t for our needs.
> ii) It emits its own error message (in case it fails) which is not
> appropriate for our needs (it claims to be truncating a packet, but in
> our case, no packet is truncated; instead it is treated as an indication
> that an error happened and a resync is performed).
> iii) For some reason it allows the remaining size to be one less than
> the size given as argument.
> Not understanding iii) is what made me hesitate to factor the needed
> part of ffio_limit out of it and using the new function in the Matroska
> demuxer. Would be nice if someone could explain the rationale behind
> this. This originated (without any explanation) in commit 559ae20d and
> got carried over from there since then.
> - Andreas
> [1]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-May/243769.html
> [2]: https://ffmpeg.org/pipermail/ffmpeg-devel/2019-March/241694.html
> Andreas Rheinhardt (37):
>   avformat/matroskadec: Remove unused variables
>   avformat/matroskadec: Correct outdated error message
>   avformat/matroskadec: Compactify structure
>   avformat/matroskadec: Don't zero unnecessarily
>   avformat/matroskadec: Get rid of cluster size field assumption
>   avformat/matroskadec: Use generic size check for signed integers
>   avformat/matroskadec: Set offset of first cluster
>   avformat/matroskadec: Don't copy attached pictures
>   avformat/matroskadec: Remove redundant initialization
>   avformat/matroskadec: Properly check return values
>   avformat/matroskadec: Improve read error/EOF checks I
>   avformat/matroskadec: Improve read error/EOF checks II
>   avformat/matroskadec: Improve error/EOF checks III
>   avformat/matroskadec: Remove non-incremental parsing of clusters
>   avformat/matroskadec: Don't keep old blocks
>   avformat/matroskadec: Treat SimpleBlock as EBML_BIN
>   avformat/matroskadec: Don't abort resyncing upon seek failure
>   avformat/matroskadec: Add function to reset status
>   avformat/matroskadec: Use proper levels after discontínuity
>   avformat/matroskadec: Refactor some functions
>   avformat/matroskadec: Introduce a "last known good" position
>   avformat/matroskadec: Link to parents in syntax tables
>   avformat/matroskadec: Redo level handling
>   avformat/matroskadec: Make cluster parsing level compatible
>   avformat/matroskadec: Don't reset cluster position
>   avformat/matroskadec: Combine arrays
>   avformat/matroskadec: Redo EOF handling
>   avformat/matroskadec: Reuse positions
>   avformat/matroskadec: Typos, nits and cosmetics
>   avformat/matroskadec: Don't skip too much when unseekable
>   avformat/matroskadec: Improve invalid length error handling
>   avformat/matroskadec: Accept more unknown-length elements
>   avformat/matroskadec: Fix probing of unknown-length headers
>   avformat/matroskadec: Accept more unknown-length elements II
>   avformat/matroskadec: Reindent after previous commit
>   avformat/matroskadec: Use file offsets for level 1 elements
>   avformat/matroskadec: Improve check for level 1 duplicates
>  libavformat/matroskadec.c | 1006 +++++++++++++++++++++----------------
>  1 file changed, 582 insertions(+), 424 deletions(-)
Ping for the whole patchset.

- Andreas

More information about the ffmpeg-devel mailing list