[FFmpeg-devel] Evolution of lavfi's design and API
Nicolas George
george at nsup.org
Wed Oct 22 23:45:42 CEST 2014
[ CCing Anton, as most of what is written here also applies to libav, and
this would be a good occasion to try cross-fork cooperation; if that is
not wanted, please let us know so we can drop the cc. ]
1. Problems with the current design
1.1. Mixed input-/output-driven model
Currently, lavfi is designed around a mixed input-driven and
output-driven model. That means the application sometimes needs to add
input to buffersources and sometimes to request output from buffersinks.
This is a bit of a nuisance, because it requires the application to do it
properly: adding a frame on the wrong input, or requesting a frame from
the wrong output, causes extra memory consumption or latency.
With the libav API, it cannot work at all, since there is no mechanism
to determine which input needs a frame in order to proceed.
The libav API is clearly designed for a more output-driven
implementation, with FIFOs everywhere to prevent input-driven frames
from reaching unready filters. Unfortunately, since it is impossible
from the outside to guess which output will get a frame next, frames
can accumulate anywhere in the filter graph, eating a lot of memory
unnecessarily.
FFmpeg's API has eliminated these FIFOs in favour of queues inside the
filters that need them, but these queues cannot be controlled for
unusual filter graphs with extreme needs. Also, there is still an
implicit FIFO inside buffersink.
1.2. Recursive implementation
All work in a filter graph is triggered by recursive invocations of the
filters' methods. This makes debugging harder, can lead to large stack
usage, and makes frame- and filter-level multithreading harder to
implement. It also prevents some diagnostics from working reliably.
1.3. EOF handling
Currently, EOF is propagated only through the return value of the
request_frame() method, which means it only works in an output-driven
scheme. It also means EOF has no timestamp attached to it; this is an
issue for filters where the duration of the last frame is relevant,
such as vf_fps.
1.4. Latency
Some filters need to know the timestamp of the next frame in order to
know when the current frame ends before they can process it: overlay
and fps are two examples. These filters therefore introduce a latency
of one input frame that could otherwise be avoided.
1.5. Timestamps
Some filters do not care about timestamps at all. Some check for NOPTS
values and handle them properly. Others simply assume the frames will
have timestamps, and possibly make extra assumptions about them:
monotonicity, consistency, etc. It is an inconsistent mess.
1.6. Sparse streams
There is a more severe instance of the latency issue when the input
comes from an interleaved sparse stream: in that case, waiting for the
next frame in order to find the end of the current one may require
demuxing a large chunk of input, in turn provoking a lot of activity on
other inputs of the graph.
2. Proposed API changes
To address all these issues, I believe a complete rethink of the
library's scheduling design is necessary. I propose the following
changes.
Note: some of these changes are not strictly related to the issues
raised above, but seemed like a good idea while thinking about an API
rework.
2.1. AVFrame.duration
Add a duration field to AVFrame; if set, it indicates the duration of
the frame. Thus, it becomes unnecessary to wait for the next frame to
know when the current frame stops, reducing the latency.
Another solution would be to add a dedicated function on buffersrc to
inject a timestamp for end or activity on a link. That would avoid the
need of adding a field to AVFrame.
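As a minimal sketch of the first option (the struct and field names here are purely illustrative, not the real AVFrame), a duration field makes the end timestamp of a frame computable immediately:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch: a frame that carries its own duration, so its
 * end timestamp is known without waiting for the next frame. */
typedef struct MyFrame {
    int64_t pts;      /* presentation timestamp, in the link time base */
    int64_t duration; /* 0 means unknown */
} MyFrame;

/* End timestamp of the frame, or INT64_MIN as a NOPTS-like sentinel. */
static int64_t frame_end_ts(const MyFrame *f)
{
    return f->duration > 0 ? f->pts + f->duration : INT64_MIN;
}
```

A filter like fps could then emit the current frame as soon as it arrives, instead of holding it until the next one.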
2.2. Add some fields to AVFilterLink
AVFilterLink.pts: current timestamp of the link, i.e. the end timestamp
of the last forwarded frame, assuming its duration was correct. This is
somewhat redundant with the fields in AVFrame, but can carry the
information even when there is no actual frame.
AVFilterLink.status: if not 0, gives the return status of trying to pass
a frame on this link. The typical use would be EOF.
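The two proposed fields could look like this sketch ("Link" is a stand-in for AVFilterLink; none of this is existing API):

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of the proposed link fields. */
typedef struct Link {
    int64_t pts;    /* end timestamp of the last forwarded frame */
    int     status; /* 0, or the status of the link (typically EOF) */
} Link;

/* Forwarding a frame advances the link timestamp; the information
 * survives even after the frame itself is gone. */
static void link_forward(Link *l, int64_t pts, int64_t duration)
{
    l->pts = pts + duration;
}

/* Set the link status once; later changes are ignored. */
static void link_close(Link *l, int status)
{
    if (!l->status)
        l->status = status;
}
```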
2.3. AVFilterLink.need_ts
Add a field to AVFilterLink to specify that the output filter requires
reliable timestamps on its input. More precisely, specify how reliable
the timestamps need to be: is the duration necessary? do the timestamps
need to be monotonic? continuous?
For audio streams, consistency between timestamps and the number of
samples may also be tested. For video streams, constant frame rate may
be enforced, but I am not sure about this one.
A "fixpts" filter should be provided to allow the user to tweak how the
timestamps are fixed (change the timestamps to match the duration or
change the duration to match the timestamps?).
When no explicit filter is inserted, the framework should fix the
timestamps automatically. I am not sure whether that should be done
directly or by automatically inserting the fixpts filter. The latter
solution is more elegant, but requires more changes to the framework
and the filters (because the correctness of the timestamps would need
to be merged just like formats), so I am rather in favour of the former.
Note that for a lot of filters, the actual duration or end timestamp is
not required, only a lower bound for it. For sparse interleaved streams,
that is very relevant: we may not know the exact time of the next frame
until we reach it, but we do know it is later than the other streams'
timestamps minus the interleaving delta.
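The lower-bound rule for sparse streams can be sketched as follows (all names are hypothetical; the tolerance "delta" would come from the muxer's interleaving settings):

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative sketch: a lower bound for the timestamp of the next
 * frame on a sparse interleaved stream.  With an interleaving
 * tolerance delta, that frame cannot be earlier than the minimum
 * timestamp already seen on the sibling streams, minus delta. */
static int64_t sparse_lower_bound(const int64_t *sibling_pts, int n,
                                  int64_t delta)
{
    int64_t min = INT64_MAX;
    for (int i = 0; i < n; i++)
        if (sibling_pts[i] < min)
            min = sibling_pts[i];
    return min == INT64_MAX ? INT64_MIN : min - delta;
}
```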
2.4. Build FIFOs directly in AVFilterLink
Instead of automatically inserting an additional filter as libav does,
handle the FIFO operation directly in the framework, using fields in
AVFilterLink.
The main benefit is that the framework can examine the inside of the
FIFOs to make scheduling decisions. It can also do so to provide the
user with more accurate diagnostics.
An extra benefit: the memory pool for the FIFOed frames can more easily
be shared across the whole filter graph or the whole application.
Memory management becomes easier: just pick a good heuristic (half the
RAM?), with no need to guess which FIFOs will actually need a lot of
memory and which will sit there mostly unused.
Last but not least, FIFOs now become potential thread communication /
synchronization points, making filter-level multithreading easier.
For audio streams, framing (i.e. ensuring all frames have an exact /
minimum / maximum number of samples) can be merged into the FIFOs.
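A minimal sketch of a FIFO owned by the link itself (hypothetical names; a real one would grow dynamically and hold AVFrame pointers):

```c
#include <stddef.h>
#include <assert.h>

/* Because the framework owns the FIFO, the scheduler can inspect
 * fifo->count directly instead of guessing through an opaque filter. */
#define FIFO_CAP 16

typedef struct LinkFifo {
    void  *entries[FIFO_CAP];
    size_t head, count;
} LinkFifo;

static int fifo_push(LinkFifo *f, void *frame)
{
    if (f->count == FIFO_CAP)
        return -1; /* full: the caller should stop feeding this input */
    f->entries[(f->head + f->count++) % FIFO_CAP] = frame;
    return 0;
}

static void *fifo_pop(LinkFifo *f)
{
    if (!f->count)
        return NULL;
    void *frame = f->entries[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return frame;
}
```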
2.5. Allow status change pseudo-frames inside FIFOs
To propagate EOF, and possibly other status changes (errors), in an
input-driven model, allow FIFOs to contain not only frames but also
pseudo-frames carrying a timestamp and metadata.
Depending on the filter, these pseudo-frames may be passed directly to
the filter's methods, or interpreted by the framework to simply update
fields on the AVFilterLink structure.
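Such an entry could be a simple tagged record (sketch only; names hypothetical):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* A FIFO entry that is either a real frame or a status-change
 * pseudo-frame carrying a timestamp. */
typedef struct FifoEntry {
    void   *frame;  /* NULL for a pseudo-frame */
    int     status; /* meaningful only when frame is NULL, e.g. EOF */
    int64_t pts;    /* timestamp of the frame or of the status change */
} FifoEntry;

static int entry_is_status(const FifoEntry *e)
{
    return e->frame == NULL;
}
```

This gives EOF a timestamp for free, addressing the issue raised in 1.3.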
2.6. Change the scheduling logic for filters
From the outside of a filter with several outputs, it is usually not
possible to guess which output will get a frame next. Requesting a
frame on output #0 may cause activity in the filter graph that produces
a frame on output #1 instead, or possibly on a completely different
filter. Therefore, having a request_frame() method on each output seems
pointless.
Instead, use a single AVFilter.activate() method that causes the filter
to do one step of work if it can. This method is called each time
something changes for the filter: a new frame on an input, an output
becoming ready, a status change. It returns as soon as it has done
something, either producing output and/or consuming input, or
immediately if nothing can be done.
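A sketch of the idea (hypothetical types; the demo filter just passes one frame through per step):

```c
#include <assert.h>

/* Instead of per-output request_frame(), a single entry point is
 * called whenever anything changed for the filter. */
typedef struct Filter Filter;

struct Filter {
    int in_count;  /* demo state: frames queued on the input */
    int out_ready; /* nonzero if the output can accept a frame */
    int (*activate)(Filter *f);
};

/* One step of pass-through work: consume one input frame, produce one
 * output frame; return 1 if progress was made, 0 otherwise. */
static int passthrough_activate(Filter *f)
{
    if (f->in_count > 0 && f->out_ready) {
        f->in_count--;
        return 1; /* did one step of work */
    }
    return 0; /* nothing could be done */
}
```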
2.7. Add fields to AVFilterLink for flow control
Add a few fields to AVFilterLink to help filters decide whether they
need to process something and, if relevant, in what order. The most
obvious idea would be AVFilterLink.frames_needed, counting how many
frames are probably needed on a link before anything can be done. For
example, with concat, after an input has been fully consumed, the
frames_needed field on the current input is set according to the
corresponding output.
2.8. Activate the filters iteratively
Keep a global (per graph) priority queue of filters that are supposed to
be ready and call the activate() method on them.
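The graph-level loop could look like this sketch (hypothetical types; a real implementation would use a priority queue, but a plain scan is enough to show the idea):

```c
#include <assert.h>

/* Keep activating filters believed ready until none makes progress. */
typedef struct GFilter {
    int pending; /* demo state: steps of work still to do */
} GFilter;

static int gfilter_activate(GFilter *f)
{
    if (f->pending > 0) {
        f->pending--;
        return 1;
    }
    return 0;
}

static int graph_run(GFilter *filters, int n)
{
    int passes = 0, progress;
    do {
        progress = 0;
        for (int i = 0; i < n; i++)
            progress |= gfilter_activate(&filters[i]);
        passes += progress;
    } while (progress);
    return passes; /* number of passes that made progress */
}
```

Since each activate() call returns instead of recursing into its neighbours, this also resolves the stack-usage and debugging problems of 1.2.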
2.9. AVFrame.stream_id
Add an integer (or pointer: intptr_t maybe?) field to AVFrame to allow
passing frames belonging to distinct streams on the same link. That
would allow multiplexing all outputs of a graph into a single output,
making the application simpler.
Not sure this is really useful or necessary: for the graph outputs, a
convenience function iterating on all of them and returning the frame
and the output index separately would do the trick too.
2.10. buffersrc.callback and buffersink.callback
Add a callback to both buffersrc and buffersink, called respectively
when a frame is needed on an input and when a frame has arrived on an
output. This allows pure input-driven and pure output-driven designs to
work.
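The callbacks could be registered like this sketch (all names hypothetical; the demo callbacks just count invocations):

```c
#include <assert.h>

/* The application registers functions instead of polling. */
typedef void (*src_need_frame_cb)(void *opaque);
typedef void (*sink_got_frame_cb)(void *opaque, void *frame);

typedef struct IoCallbacks {
    void *opaque;
    src_need_frame_cb on_need_frame; /* buffersrc wants more input */
    sink_got_frame_cb on_got_frame;  /* buffersink produced a frame */
} IoCallbacks;

/* Demo callbacks: count how often each event fires. */
static void count_need(void *opaque) { ++*(int *)opaque; }
static void count_got(void *opaque, void *frame)
{
    (void)frame;
    ++*(int *)opaque;
}

/* Demo driver: simulate one need-frame event and one output frame. */
static void fire_events(IoCallbacks *cb, void *frame)
{
    if (cb->on_need_frame)
        cb->on_need_frame(cb->opaque);
    if (cb->on_got_frame)
        cb->on_got_frame(cb->opaque, frame);
}
```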
2.11. Link groups
Links that carry frames from related interleaved streams should be
explicitly connected together so that the framework can use the
information.
The typical use would be to group all the links from buffersrc that come
from the same interleaved input file.
When a frame is passed on a link, all links in the same group(s) that
are too late (according to an interleaving tolerance that can be set)
are activated using a dummy frame.
2.12. FIFOs with compression and external storage
All FIFOs should be able to off-load some of their memory requirements
by compressing the frames (using a lossless, or optionally lossy,
codec) and/or storing them on mass storage.
The options for that should be changeable globally or on a per-link
basis.
2.13. AVFrame.owner
Add an owner field (probably of type "AVObject", i.e. "void *" pointing
to an AVClass *) to AVFrame, and update it whenever the frame is passed
from one filter to another. That way, inconsistent ref/unref operations
can be detected.
Regards,
--
Nicolas George