[FFmpeg-devel] libavfilter API design in a realtime environment

Paul B Mahol onemda at gmail.com
Fri Apr 22 11:55:35 CEST 2016

On 4/19/16, Nicolas George <george at nsup.org> wrote:
> On septidi 27 Ventôse, year CCXXIV, Kieran Kunhya wrote:
>> I want to try and use the libavfilter API to overlay bitmap subtitles on
>> video from a realtime source. This seems difficult/impossible to do with
>> the current API hence asking on the main devel list.
> Have you looked at what the command-line tool ffmpeg does? It is not
> optimized for ultra-low latency, but it should already achieve reasonable
> results.
>> 1: How do I know the end to end latency of the pipeline? Is it fixed, or
>> does it vary? This matters because my wallclock PTS needs addition of this
>> latency.
> You can not know that in the general case, since the latency of some filters
> depends on the frame contents and arbitrary user-provided formulas.
> In a particular case, the rule of thumb is that filters produce output as
> soon as they have enough input information to do so. But note that filters
> that require syncing between several video streams will likely require one
> extra frame on some or all streams. This happens because frames have no
> duration (and I am convinced they should not have one), and therefore the
> next frame is required to know the end timestamp.

This is really, really bad. Frames should have a duration. And audio
frames do have one.
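To illustrate the one-frame delay Nicolas describes, here is a schematic
Python model of the scheduling (made-up names, not the lavfi API): without
per-frame durations, frame N's end timestamp is only known once frame N+1's
pts arrives, so a syncing filter always lags its input by one frame.

```python
def emit_schedule(ptss):
    """Model of a sync filter without frame durations (hypothetical).

    For each frame n, the end timestamp is the pts of frame n+1, so the
    filter can only emit frame n after n+2 input frames have arrived.
    Returns (frame_index, end_timestamp, input_frames_needed) tuples.
    """
    out = []
    for n in range(len(ptss) - 1):
        end_ts = ptss[n + 1]      # end timestamp of frame n = next pts
        frames_needed = n + 2     # inputs that must arrive before emitting n
        out.append((n, end_ts, frames_needed))
    return out

# 25 fps stream, pts in milliseconds: frame 0 cannot be emitted until
# 2 input frames have been seen, frame 1 until 3 have, and so on.
schedule = emit_schedule([0, 40, 80, 120])
```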

>> 2: Do I need to interleave video and subtitles (e.g VSVSVSVS) in
>> monotonically increasing order? What happens if the subtitles stop for a
>> bit (magic queues are bad in a realtime environment)? My timestamps are
>> guaranteed to be the same though.
> libavfilter can deal with streams slightly out of sync by buffering, but it
> takes a lot of memory of course, and will eventually lead to OOM or to
> dropping frames if the desync is too large.
>> 3: My world is CFR but libavfilter is VFR - how does the API know when to
>> start releasing frames? Does this add one frame of video latency while it
>> waits for the next video frame to arrive?
> Knowing you have CFR gives you an assumption about the frame duration, and
> can therefore save you the wait for the next frame. The difficulty is
> integrating that elegantly in the scheduling. See below.
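The difference between the two cases can be sketched as follows (a hedged
Python model, not lavfi code; the function and parameter names are
invented): under CFR the end timestamp is derivable from the pts alone, so
the frame can be released immediately, while under VFR the next pts is
required.

```python
def end_timestamp(pts, next_pts=None, cfr_duration=None):
    """Hypothetical model of when a frame's end timestamp is known.

    CFR: end = pts + duration, computable as soon as the frame arrives.
    VFR: end = pts of the next frame, so one extra frame of latency.
    Otherwise the frame must be buffered until more input arrives.
    """
    if cfr_duration is not None:
        return pts + cfr_duration     # CFR: emit immediately
    if next_pts is not None:
        return next_pts               # VFR: wait for the next frame
    return None                       # unknown: keep buffering

# Frame at pts=40 ms in a 25 fps (40 ms/frame) stream:
assert end_timestamp(40, cfr_duration=40) == 80   # no waiting needed
assert end_timestamp(40, next_pts=80) == 80       # known one frame late
```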
>> 4: What are the differences between the FFmpeg and libav implementations?
>> FFmpeg uses a framesync and libav doesn't?
> If your graph has a single output and you never have a choice about which
> input to feed (for example because the frames arrive interleaved), then I
> believe the differences do not matter for you.
>> 5: I know exactly which frames have associated subtitle bitmaps or not;
>> is there a way I can overlay without an extra frame delay?
> As wm4 explained, the hard part with subtitles is that they are sparse: you
> have a start event, then ~2 seconds, or ~50 frames worth of video, then an
> end event, and then maybe several hours before the next start event. If any
> filter requires an end timestamp and syncs with video, then you have a huge
> latency and a huge buffer.
> If the subtitles come from a separate on-demand file, there is no problem,
> since the next event is available whenever necessary, and the scheduling (at
> least in the FFmpeg version) will tell you when it is.
> On the other hand, if your subtitles events are not available on demand,
> either because they are interleaved with the video in a muxed format or
> because they arrive in real time, it does not work.
> You need assumptions about the timestamps properties of your streams. For
> example, video players reading index-less streams will assume that subtitles
> are not muxed too much after the video, or they will be ignored.
> To take that into account for the support of bitmap subtitles in ffmpeg, I
> used heartbeat frames: whenever a frame is decoded and injected on a
> non-subtitle input, the current (possibly empty) subtitle frame is
> duplicated and injected on all subtitle inputs connected to the same input
> stream.
> When proper subtitle support is implemented in lavfi, I suppose a similar
> solution should be adopted. The application will need to let lavfi know
> which streams are interleaved together, probably with a filter with N+n
> inputs and as many outputs. But in that case, the heartbeat frames can be
> really dummy frames; they may need no duplication.
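A toy simulation of the heartbeat idea, as I understand it from the
description above (the semantics here are assumed, and the names are mine):
each arriving video frame re-injects the current subtitle frame on the
subtitle input, so a downstream sync filter never stalls on the sparse
stream.

```python
def heartbeat(video_pts, sub_events):
    """Simulate heartbeat frames on a sparse subtitle input (assumed model).

    video_pts:  list of video frame pts values, in arrival order.
    sub_events: dict mapping pts -> subtitle payload (None = clear event).
    Returns the (video_pts, subtitle_payload) pairs fed to the sync filter;
    the current subtitle frame is duplicated for every video frame.
    """
    current = None                    # the current (possibly empty) subtitle
    fed = []
    for pts in video_pts:
        if pts in sub_events:         # a real subtitle event arrives here
            current = sub_events[pts]
        fed.append((pts, current))    # heartbeat: duplicate current frame
    return fed

# Subtitle shown at pts=40, cleared at pts=120; the sync filter still
# receives one subtitle frame per video frame and never has to wait.
fed = heartbeat([0, 40, 80, 120], {40: "SUB", 120: None})
# fed == [(0, None), (40, 'SUB'), (80, 'SUB'), (120, None)]
```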
> The same trick can be used to inform lavfi about CFR, although it feels
> really hacky: when you have a subtitle frame at pts=5, duplicate it at
> pts=5.999999; that should allow processing the video frame at pts=5.
> Another solution for CFR would be to add some kind of "min_frame_duration"
> option to the framesync utility.
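The pts trick above can be sketched like this (a hedged illustration; the
epsilon value and function name are mine, only the pts=5 / pts=5.999999
figures come from the description): the duplicate placed just before the
next frame boundary tells the sync logic that the event covers the whole
frame.

```python
EPS = 1e-6  # assumed epsilon, chosen to match the pts=5.999999 example

def duplicate_for_cfr(sub_pts, frame_duration):
    """Return the pts values injected for one subtitle event under CFR:
    the original frame, plus a duplicate just before the next frame's pts,
    so the frame at sub_pts can be processed without waiting."""
    return [sub_pts, sub_pts + frame_duration - EPS]

# A subtitle event at pts=5 with 1-second frames is duplicated at
# pts=5.999999, unblocking the video frame at pts=5.
injected = duplicate_for_cfr(5.0, 1.0)
```

A `min_frame_duration` option on framesync, as suggested above, would make
this duplication unnecessary by letting framesync derive the same end
timestamp itself.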
> Hope this helps.
> Regards,
> --
>   Nicolas George
