[FFmpeg-devel] [PATCH] WIP: subtitles in AVFrame
george at nsup.org
Fri Nov 11 16:36:04 EET 2016
Le primidi 21 brumaire, an CCXXV, wm4 a écrit :
> OK, let's think about alternative approaches. How about not trying to
> let libavfilter do synchronizing subtitles and other streams in all
> cases? Why not just send nothing to the subtitle buffer sources if the
> subtitles are sparse and there is no new packet? If it's sparse, make
> it sparse.
Sometimes a filter needs sync. Therefore the framework must be able to
provide it. Your suggestion does not allow that.
> I assume the whole point of this exercise is to prevent excessive
> buffering (e.g. not trying to read the next subtitle packet, which
> might read most of the file, and necessitate buffering it in memory).
> E.g. if you overlay video on subtitles, you'd normally require new
> frames for both subtitles and video. If you'd treat subtitles like
> video and audio, you'd have to try to read the next subtitle packet to
> know, well, whether there's a new subtitle or whether to use the
> previous one.
You understand the issue.
> If I understood this correctly, you want to send empty frames every
> 100ms if the duration of a subtitle is unknown. Why is it 100ms? Does
> it make a difference if it's 100ms or a few seconds (or until EOF)
> until the subtitle is naturally terminated? Why not 1ms? This seems
> like a very "approximate" solution - sure, it works, but it's akin to
> using polling and sleep calls in I/O or multithreaded code.
That was just an example to help readers understand.
> Maybe a heartbeat on every video frame?
That is exactly the idea; you did not read the mail carefully enough
before judging it a "bad idea".
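To make the heartbeat-on-every-video-frame idea concrete, here is a minimal simulation of what the feeding side could look like. None of these names exist in FFmpeg; `feed_graph` and the `"sub"`/`"heartbeat"` tuples are invented purely for illustration of the proposed design: for every video frame, either forward a subtitle whose pts has been reached, or send a heartbeat meaning "no new subtitle up to this pts".

```python
# Hypothetical simulation of the proposed heartbeat scheme.
# None of these names are actual FFmpeg API.

def feed_graph(video_pts_list, subtitles):
    """For each video frame, emit either the subtitle that becomes
    active at or before that pts, or a heartbeat ("no new subtitle
    up to this pts")."""
    out = []
    subs = iter(sorted(subtitles))
    pending = next(subs, None)
    for vpts in video_pts_list:
        if pending is not None and pending[0] <= vpts:
            out.append(("sub", pending[0], pending[1]))
            pending = next(subs, None)
        else:
            # heartbeat: an empty frame carrying only a timestamp
            out.append(("heartbeat", vpts, None))
    return out

# feed_graph([0, 40, 80], [(40, "Hello")])
# -> [("heartbeat", 0, None), ("sub", 40, "Hello"), ("heartbeat", 80, None)]
```

The point of the sketch is that the heartbeat rate is tied to the video frame rate, not to an arbitrary interval like 100ms.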
> What if there's no video
Then there is probably no need to sync.
> There is this related idea of slaving sbuffersrc to a master
> vbuffersrc. This approach works on video output,
I have no idea why you start talking about output.
> while sparseness is really a problem at
> the source (i.e. demuxer). It's questionable how this would work with
> subtitles demuxed from separate files
When the subtitles are demuxed from a file separate from the stream they
need to sync with, the next frame is available on demand, and therefore
the heartbeat frames are neither needed nor generated.
> (which also might have A/V
If that happens, it seems like a very strange and pathological use case.
FFmpeg can not perform miracles.
> It also works on the video output, while the issue of
> subtitles with unknown duration is mostly a demuxing issue. What
Again, I do not know why you speak about output.
> happens if there's a video->sub filter, how would it send heartbeats?
I do not know; please tell me what it does exactly and I will be able to
answer.
> Would it require a new libavfilter graph syntax for filters that
> generate subtitles within the graph, and would it require users to
> explicitly specify a "companion" video source?
> The whole problem is that it's hard to determine whether a new
> subtitle frame should be available at a certain point inside
> libavfilter, which in turn is hard because of how generic libavfilter
> wants to be.
> It seems to me that not libavfilter should handle this, but the one
> which feeds libavfilter.
The caller's collaboration is mandatory. But the bulk of the work should
be done by the library, not duplicated in all applications.
> If it feeds a new video frame to it, but no
> subtitle frame, it means there is no new subtitle yet due to
Libavfilter can not know when there is "no subtitle frame", only when
there is one. Hence the need to tell it explicitly:
> There is actually no need for weird heartbeat frames.
... with a heartbeat frame.
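From the filter's side, the value of an explicit heartbeat is that a subtitle of unknown duration can be extended without any readahead. A hypothetical sketch (again, invented names, not FFmpeg API): an overlay-like filter only renders a subtitle on a video frame once a subtitle or heartbeat has proven it valid up to that frame's pts.

```python
# Hypothetical filter-side bookkeeping for heartbeat frames.

class OverlayState:
    def __init__(self):
        self.current_text = None   # last subtitle seen, still active
        self.known_until = None    # pts up to which its validity is proven

    def on_subtitle(self, pts, text):
        self.current_text = text
        self.known_until = pts

    def on_heartbeat(self, pts):
        # "no new subtitle up to pts": the current one stays valid
        if self.current_text is not None:
            self.known_until = pts

    def render(self, video_pts):
        # only draw once validity is proven up to the video frame
        if self.current_text is not None and self.known_until >= video_pts:
            return self.current_text
        return None
```

Without the heartbeat, `render` would have to block or guess whenever `video_pts` runs past the last subtitle's pts; with it, the filter never needs to know the subtitle's real duration in advance.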
> libavfilter API user is in the position to know whether the subtitle
> demuxer/decoder can produce a new packet/frame. It would be crazy if
Disparaging judgement without grounds.
> the API user had to send heartbeat frames in these situations, and
> had to care about how many heartbeats are sent when.
Good thing it does not, then.
> In complex cases (where audio/video/subs are connected in a
> non-trivial way, possibly converting to each other at certain points),
> the user would have to be careful which buffersinks to read in order
> not to trigger excessive readahead. Also the user would possibly have to
> disable "automagic" synchronization mechanisms in other parts of
> Even then, you would need to "filter" sparse frames (update their
> timestamps, produce new ones, etc). This sounds very complex.
Yes, it is complex. The problem is complex, the solution has little
chance of being simple.
> What about filtering subtitles alone? This should be possible?
> Why would libavfilter in general be responsible to sync subtitles and
> video anyway? It should do that only on filters which have both
> subtitle and video inputs or so.
I do not understand why you are trying to make a distinction between
lavfi and individual filters. The sparseness issue is not about where
the code resides but about what information is or is not available.
> Why does this need decoder enhancements anyway? How about it just uses
> the API in its current extent, which applications have handled for
> years? Again, special snowflake libavfilter/ffmpeg.c.
> Btw. video frames can also be sparse (think of mp4s that contain slide
> shows). Are we going to get video heartbeat frames? How are all the
> filters going to handle it?
Currently they do not, but this is a separate issue.
> Even for not-sparse video, there seem to be cases (possibly fixed now)
> where libavfilter just excessively buffers when using ffmpeg.c. I'm
> still fighting such cases with my own libavfilter API use.
> (Interestingly, this often involves sparse video.)
So the case with not-sparse video involves sparse video?
> (Oh, and I don't claim to have understood the problem in its whole
> extent. But I do have a lot of experience with subtitles and
> unfortunately also with using libavfilter in a "generic" way.)
> > And I would really appreciate if in the future you refrained from that
> > kind of useless empty remark. You can raise practical concerns, ask for
> > explanations or rationales, of course. But a purely negative reply that
> > took you all of three minutes in answer to the result of years of design
> > is just incredibly rude.
> I find your conduct incredibly rude as well. It's not nice to take
> every reply as an offense, instead of e.g. starting a discussion.
> It's also not nice to call my remarks "useless".
> No, pointing out that a solution is sub-optimal is not rudeness.
Indeed. Next time, do that. Give arguments, ask questions. Refrain from
disparaging blanket judgements.
> Why are you asking me for a better solution?
Because you are the one who claimed the solution I propose is bad.
Unless the w in your nickname actually means Winston, claiming that a
solution is bad when there is no better one is just a waste of time.
> You're the one who wants
> subtitles in libavfilter, not me.
You could have noticed that I am far from being the only one.
> Thus it's your responsibility to come
> up with a good design.
> If there's no good design, then it's a sure sign
> that it's not a good idea to have subtitles in libavfilter. Indeed,
Or it is a sure sign that your aesthetic considerations are skewed.
> subtitles are incredibly complex, and there are many cases that
> somehow need to be handled in a generic way, and it's not necessarily a
> good idea to add this complexity to libavfilter, which is designed to
> handle audio and video data (and even after years of work, isn't that
> good at handling audio and video at the same time). It's like trying to
> push a square peg through a round hole. I haven't come to the
> conclusion yet that this is the case, so hold that thought.
> Please try not to reply to every paragraph separately or so. This makes
> it hard to follow discussions. In fact, I won't take part in a
> discussion of that kind. It wastes so much time because you can get lost
> in meaningless details.
This is exactly what I did because this is exactly what needed to be
done for that kind of discussion.