[FFmpeg-devel] [PATCH] lavf/mov: Add support for edit list parsing.

Fri Oct 21 21:08:30 EEST 2016

On 10/21/2016 6:47 PM, Sasi Inguva wrote:
>         * Audio packets. Especially audio packets with a large number of samples.
>           It's extremely likely that edits will not fall on packet boundaries, and
>           depending on the number of samples per packet, audio sync issues can and
>           will occur, even with smaller packets of e.g. 1024 samples, if there are
>           a large number of entries in the edit list. Gradual loss of sync will
>           occur.
> 
> We handle the case of the edit list start boundary not being exactly aligned with a packet boundary, by setting skip_samples fileld properly. But we don't handle the case yet, where edit list boundary might be not aligned with the packet boundary, in which case yes, one might get a gradual loss of sync. (Which is still better than the huge loss of sync one might get if edit lists are not parsed at all! )

FWIW, the gradual loss of sync is very noticeable on some 'popular' apps
that heavily use edit lists, such as "1 Second Everyday: Video Diary", if
I recall. Also some codecs have rather large packets.

>         * Edit list entries that are out of order, or repeat. This one is obvious;
>           simply dropping packets is not sufficient to create a virtual timeline
>           like edit lists can be used for by various NLEs. I need to look up
>           if these are actually allowed in the ISOBMFF of QT specs, but I know
>           Matroska allows it in its virtual timeline feature.
> 
> The currrent edit list code supports edits specifying repeated segments, overlapping segments etc.  We are not just dropping packets, but we are rebuilding the whole index. If there are two edits repeating a segmeng (a section of AVPackets lets call it ) , then those section of AVPackets get added twice in the AVIndex, and the decoder would decode the section twice (though I haven't tested such video )  . 
> 
> In fact this is the case where doing it after decode might not work, because something needs to go back and feed those packets again to the decoder.

I see. I had discounted that without looking, I admit, because the idea
of doing that, from a design standpoint, is just... gross, to me.

I definitely don't agree with this approach for a *demuxer*... I guess
this goes back to the conflation of presentation and demuxing layers.

This also requires knowledge of every single codec that can go into
ISOBMFF and what kind of decoding dependencies it has...

This whole hack seems very very complex to API users, and in general,
just so that edit lists can be handled on a layer different than what
they're meant to be handled on. It may be OK for ffmpeg.c because it's
'transparent', but it's pretty awful for API users.

>         * Returning timestamps that differ from the codec timestamps. I very
>           much disagree with this approach. It's the responsibility of the
>           presentation/render/transcode/app/whatever layer to adjust timestamps
>           based on edits. I absolutely disagree with libavformat changing
>           timestamps from what is coded in the actual file. avformat provides
>           demuxers.
> 
> We already used to apply a global offset to the timestamps of the video, set skip_samples field by parsing the first edit list entry etc. before the edit list patch http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/mov.c;h=6e80b93271a4f998af6ba1af795d7d7c5d67f5a1;hb=7b3bc365f9923e30a925f8dece4fddd127a54c5d#l2792  . Not defending the approach, but just pointing  that what I did is just extending that approach to increased functionality.

"We used to have a different hack" isn't justification for introducing
a new, more breaking, hack, really.

> Can you send that file to me please. Thanks.

I have replied to you privately with the file.

Again, I realize I'm way too late to the thread here to make a real
difference, but I thought I should put it on the record.

Cheers,
- Derek