[FFmpeg-devel] [PATCH 05/10] avformat/matroskadec: Remove non-incremental parsing of clusters

Andreas Rheinhardt andreas.rheinhardt at googlemail.com
Tue Mar 12 07:05:00 EET 2019


Michael Niedermayer:
> On Sun, Mar 10, 2019 at 11:03:00PM +0000, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Fri, Mar 08, 2019 at 10:25:59AM +0100, Andreas Rheinhardt wrote:
>>>> When the new incremental parser was introduced, the old parser was
>>>> kept, because the new parser was unable to handle the way SSA packets
>>>> are put into Matroska. But since 2014 (since
>>>> c7d8dbad14ed5fa3c217a4fc1790021d6c0b6416) this is no longer needed, so
>>>> that the old parser can be completely removed.
>>>>
>>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at googlemail.com>
>>>> ---
>>>>  libavformat/matroskadec.c | 72 ++++++---------------------------------
>>>>  1 file changed, 10 insertions(+), 62 deletions(-)
>>>
>>> This affects seeking:
>>>
>>> libavformat/tests/seek issues/388/Matrix.Reloaded.Trailer-640x346-XviD-1.0beta2-HE_AAC_subtitled.mkv -duration 400 >oldseek
>>>
>>> The file appears to be available here:
>>> https://samples.ffmpeg.org/Matroska/matrix/Matrix.Reloaded.Trailer-640x346-XviD-1.0beta2-HE_AAC_subtitled.mkv
>>>
>>> --- oldseek	2019-03-08 23:08:21.380042329 +0100
>>> +++ newseek	2019-03-08 23:08:02.048041745 +0100
>>> @@ -8,7 +8,7 @@
>>>  ret: 0         st:13 flags:1 dts: 86.750000 pts: 86.750000 pos:     -1 size: 50436
>>>  ret:-1         st: 1 flags:0  ts: 250.577000
>>>  ret: 0         st: 1 flags:1  ts: 13.471000
>>> -ret: 0         st:13 flags:1 dts: 1.776000 pts: 1.776000 pos:     -1 size: 50436
>>> +ret: 0         st:13 flags:1 dts: 0.000000 pts: 0.000000 pos:     -1 size: 50436
>>>  ret:-1         st: 2 flags:0  ts: 176.365000
>>>  ret: 0         st: 2 flags:1  ts: 339.259000
>>>  ret: 0         st:13 flags:1 dts: 145.080000 pts: 145.080000 pos:     -1 size: 50436
>>>
>> This is not unexpected. The reason is that the old parser always
>> parses a whole cluster at once while the new parser parses clusters
>> incrementally, i.e. one block (I use block as a general term for
>> SimpleBlock or BlockGroup) at a time. The goal of incremental parsing
>> was reduced latency (and with my patch #6 one also achieves a
>> reduction of memory used and an increase in performance as well).
>>
>> With the old parser, the very first cluster gets parsed during
>> avformat_find_stream_info() and index entries were created for all the
>> keyframes contained therein (the cues only contain entries for the
>> main video track, so this only affects the other tracks). The 1.776
>> pts corresponds to the block with the highest timestamp for track #1
>> (or track #2 in Matroska's counting) in the first cluster (with a
>> timestamp of 1776ms).
>>
>> With the incremental parser, only one block of the audio track gets
>> parsed during avformat_find_stream_info() (it has a timestamp of 0)
>> and so only one index entry gets created for it.
>>
>> So when the seek based upon the audio track is performed, the used
>> index point has a timestamp of either 0ms or 1776ms. Then the
>> current_dts is updated based upon this timestamp.
>>
>> Now this file has an attached picture which gets translated into its
>> own track and upon every seek this picture will be put into the
>> raw_packet_buffer from which it will be read by ff_read_packet as the
>> first packet after each seek. Given that this packet doesn't have
>> timestamps, the timestamp will be set equal to current_dts (in
>> compute_pkt_fields()). Hence the discrepancy.
>>
>> Btw: Because of things like this, a fate test had to be changed during
>> the initial introduction of incremental parsing (in
>> 8336eb6f85e4b94b9c198b16bd0ac4178f4dba86).
> 
> Sounds like undesirable behavior if not a bug
> 
> The seek request is to 13.471000 and the result should not depend on
> what has been parsed or not prior to the seek request.
> also if a packet can be produced for 1.776000 which adequatly fullfills
> the seek reuest that is better than a more distant one as that would
> increase latency experienced by the user
> 
If all one cares about is that the returned packet is near to the
desired timestamp, then matroska_read_seek succeeds in two cases:
a) The currently available index entries for the desired track are so
fine-grained that one can use this index to seek accurately. This
includes the standard case that one seeks according to a video track
for which the file provides cues (for the keyframes).
b) The desired timestamp is so big that it is beyond the last index
entry or the returned index entry is the last index entry. In this
case the file is read from the last index entry onward, so that a very
precise timestamp can be found.

If one only seeks wrt the same track and if said track either has a
good index or no index, then this works well: It's clear that it works
in case there is a good index and if there is no index at all (because
in this case the index that is being built up in the background is
good for a whole initial portion of the file and said portion won't
have "holes").

But if one changes the track wrt to one seeks, problems can arise.
Think of the case where one first seeks via the cues according to the
video track to positions t_0 and t_1 and reads a few blocks at both
times. This includes the creation of index entries for other tracks
(at least for continuous tracks like audio tracks). If one now seeks
wrt to such an audio track to time t' with t_0<t'<t_1, one might end
up in a place pretty far away from t'. This happens for both parsing
methods.

There is a case where the incremental parser fares better: If the
clusters referenced in the cues all begin with video keyframes
(standard mkvmerge behaviour, so pretty normal) and one only wants to
read a single frame from t_0 and t_1, then these frames will be video
keyframes and therefore no index entries for the other tracks will
have been created, so that if one seeks with respect to a different
track than the video track, the whole file will be read until one
reaches the desired timestamp. I therefore view the fact that
matroska_read_seek's result depends upon what has already been parsed
as a general bug in matroska_read_seek, not as a drawback of
incremental parsing compared to classical parsing.

Here are the results for what I have just described. Old parsing:
ret: 0         st: 1 flags:1 dts: 0.000000 pts: 0.000000 pos:   6653
size:  1792
ret: 0         st:-1 flags:0  ts:-1.000000
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 0.016000 pos:  20998
size:134602
ret: 0         st:-1 flags:1  ts: 161.894167
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 161.696000
pos:212409707 size:128647
ret: 0         st: 0 flags:0  ts: 324.788000
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 324.936000
pos:356642724 size:195297
ret: 0         st: 0 flags:1  ts: 87.683000
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 87.456000
pos:122474581 size:251000
ret: 0         st: 1 flags:0  ts: 250.577000
ret: 0         st: 1 flags:1 dts: 324.960000 pts: 324.960000
pos:356873874 size:  1792
ret: 0         st: 1 flags:1  ts: 13.471000
ret: 0         st: 1 flags:1 dts: 0.512000 pts: 0.512000 pos: 450228
size:  1792

Incremental parsing:
ret: 0         st: 1 flags:1 dts: 0.000000 pts: 0.000000 pos:   6653
size:  1792
ret: 0         st:-1 flags:0  ts:-1.000000
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 0.016000 pos:  20998
size:134602
ret: 0         st:-1 flags:1  ts: 161.894167
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 161.696000
pos:212409707 size:128647
ret: 0         st: 0 flags:0  ts: 324.788000
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 324.936000
pos:356642724 size:195297
ret: 0         st: 0 flags:1  ts: 87.683000
ret: 0         st: 0 flags:1 dts: NOPTS    pts: 87.456000
pos:122474581 size:251000
ret: 0         st: 1 flags:0  ts: 250.577000
ret: 0         st: 1 flags:1 dts: 250.592000 pts: 250.592000
pos:296449118 size:  1792
ret: 0         st: 1 flags:1  ts: 13.471000
ret: 0         st: 2 flags:1 dts: 13.216000 pts: 13.216000
pos:18014477 size:    20



(I see two main avenues to improve matroska_read_seek:
1. Use the index of other tracks if the best index entry for the
desired track is too far away. (Problems: What about files with wrong
interleavement? What is too far away? If the targeted track has gaps
in it, then one might end up parsing a lot unnecessarily.)
2. If the found index is too far away, one could simply parse the file
a bit more.

2. would have the disadvantage that even video tracks with proper cues
(i.e. an entry for every keyframe), but long GOPs would be parsed.
This could be mitigated by restricting it to tracks for which no cue
entries were present (i.e. when cue entries are present for a track,
then it is presumed that there is a cue entry for every keyframe of
said track) and which can be presumed to have no holes in it (i.e. no
subtitle tracks).)

- Andreas


More information about the ffmpeg-devel mailing list