[FFmpeg-devel] [PATCH] WIP: subtitles in AVFrame
nfxjfg at googlemail.com
Fri Nov 11 16:06:21 EET 2016
On Fri, 11 Nov 2016 13:45:54 +0100
Nicolas George <george at nsup.org> wrote:
> Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> > No, but I'm confident that you can find one after having thought about
> > the problem for so long.
> Well, I have thought about it for long, and not only is this the
> solution I have found, but I do not think it is a "bad idea" at all.
OK, let's think about alternative approaches. How about not trying to
make libavfilter synchronize subtitles and other streams in all
cases? Why not just send nothing to the subtitle buffer sources if the
subtitles are sparse and there is no new packet? If a stream is sparse,
the graph simply shouldn't require new input on it in order to make progress.
I assume the whole point of this exercise is to prevent excessive
buffering (e.g. not trying to read the next subtitle packet, which
might read most of the file, and necessitate buffering it in memory).
E.g. if you overlay subtitles on video, you'd normally require new
frames for both subtitles and video. If you treated subtitles like
video and audio, you'd have to try to read the next subtitle packet to
know, well, whether there's a new subtitle or whether to keep using the
previous one.
If I understood this correctly, you want to send empty frames every
100ms if the duration of a subtitle is unknown. Why is it 100ms? Does
it make a difference if it's 100ms or a few seconds (or until EOF)
until the subtitle is naturally terminated? Why not 1ms? This seems
like a very "approximate" solution - sure, it works, but it's akin to
using polling and sleep calls in I/O or multithreaded code.
Maybe a heartbeat on every video frame? What if there's no video
stream? Video can change timestamps and frame count, and video could go
sparse or un-sparse at certain points.
There's also the related idea of slaving sbuffersrc to a master
vbuffersrc. But that approach operates on the video output, while
sparseness is really a problem at the source (i.e. the demuxer), and
subtitles with unknown duration are mostly a demuxing issue too. It's
questionable how this would work with subtitles demuxed from separate
files (which might also have A/V streams). What
happens if there's a video->sub filter, how would it send heartbeats?
Would it require a new libavfilter graph syntax for filters that
generate subtitles within the graph, and would it require users to
explicitly specify a "companion" video source?
The whole problem is that it's hard to determine, inside libavfilter,
whether a new subtitle frame should be available at a certain point,
and that in turn is hard precisely because of how generic libavfilter
wants to be.
It seems to me that this shouldn't be handled by libavfilter, but by
whoever feeds libavfilter. If it feeds a new video frame but no
subtitle frame, that means there is no new subtitle yet due to
sparseness. There is actually no need for weird heartbeat frames. The
libavfilter API user is in the position to know whether the subtitle
demuxer/decoder can produce a new packet/frame. It would be crazy if
the API user had to send heartbeat frames in these situations, and
had to care about how many heartbeats are sent when.
In complex cases (where audio/video/subs are connected in a
non-trivial way, possibly converting to each other at certain points),
the user would have to be careful which buffersinks to read in order
not to trigger excessive readahead. The user would possibly also have
to disable "automagic" synchronization mechanisms elsewhere.
Even then, you would need to "filter" sparse frames (update their
timestamps, produce new ones, etc). This sounds very complex.
What about filtering subtitles alone? This should be possible?
Why would libavfilter in general be responsible for syncing subtitles
and video anyway? It should do that only in filters that have both
subtitle and video inputs, or so.
Why does this need decoder enhancements anyway? How about just using
the API as it exists today, which applications have handled for
years? Again, special snowflake libavfilter/ffmpeg.c.
Btw. video frames can also be sparse (think of mp4s that contain slide
shows). Are we going to get video heartbeat frames? How are all the
filters going to handle it?
Even for not-sparse video, there seem to be cases (possibly fixed now)
where libavfilter just excessively buffers when using ffmpeg.c. I'm
still fighting such cases with my own libavfilter API use.
(Interestingly, this often involves sparse video.)
(Oh, and I don't claim to have understood the problem in its whole
extent. But I do have a lot of experience with subtitles and
unfortunately also with using libavfilter in a "generic" way.)
> And I would really appreciate if in the future you refrained from that
> kind of useless empty remark. You can raise practical concerns, ask for
> explanations or rationales, of course. But a purely negative reply that
> took you all of three minutes in answer to the result of years of design
> is just incredibly rude.
I find your conduct incredibly rude as well. It's not nice to take
every reply as an offense, instead of e.g. starting a discussion.
It's also not nice to call my remarks "useless".
No, pointing out that a solution is sub-optimal is not rudeness.
Why are you asking me for a better solution? You're the one who wants
subtitles in libavfilter, not me. Thus it's your responsibility to come
up with a good design. If there's no good design, then it's a sure sign
that it's not a good idea to have subtitles in libavfilter. Indeed,
subtitles are incredibly complex, and there are many cases that
somehow need to be handled in a generic way, and it's not necessarily a
good idea to add this complexity to libavfilter, which is designed to
handle audio and video data (and even after years of work, isn't that
good at handling audio and video at the same time). It's like trying to
push a square peg through a round hole. I haven't come to the
conclusion yet that this is the case, so hold that thought.
Please try not to reply to every paragraph separately or so. This makes
it hard to follow discussions. In fact, I won't take part in a
discussion of that kind. It wastes so much time because you can get lost
in meaningless details.