[FFmpeg-devel] Plans for libavfilter

Thu Sep 16 14:55:17 EEST 2021

Since it is not healthy to keep everything to myself, here is a summary
of the projects I have in mind for the enhancement of libavfilter.

The thing is, there are a lot of dependencies between these projects.
Working on them in a proper order would make achieving the goals easier.
On the other hand, if task B depends on task A, then working on B
without addressing A involves a little more work for B, but also a lot
more work for A, and a lot more work for everything else that depends
on A.

This is one reason I am writing this mail: to make clear what needs to
be done, and how tasks depend on each other and fit together.

The earliest tasks are not very sexy. They are about making the code
clearer or more robust and generic rather than adding features. It is
always boring. But they are necessary, because parts of the process are
quite shaky, and we cannot afford to build on shaky code.

If you want to help, please consider this and you will be welcome.

Now that this is established, the specifics:

- Multi-stage negotiation.

  That means negotiating properties that depend on other properties. I
  think the colorspace stuff falls in that case: it cannot be negotiated
  until the pixel format is known.
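
  To make the ordering concrete, here is a minimal sketch in plain C of
  negotiation in stages. Nothing in it exists in libavfilter: Prop and
  negotiate_in_stages() are hypothetical names, and "negotiating" a
  property is reduced to recording the pass in which it was settled.

  ```c
  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical model; none of these names exist in libavfilter. */
  typedef struct Prop {
      const char *name;
      int depends_on;   /* index of the prerequisite property, or -1 */
      int stage;        /* pass in which it was settled, 0 = pending */
  } Prop;

  /* Settle every property whose prerequisite was settled in an earlier
   * pass, and repeat until a pass makes no progress.
   * Returns the number of properties still pending (0 on success). */
  static int negotiate_in_stages(Prop *props, size_t nb_props)
  {
      size_t pending = nb_props;
      int stage = 0, progress = 1;

      for (size_t i = 0; i < nb_props; i++)
          props[i].stage = 0;
      while (pending && progress) {
          progress = 0;
          stage++;
          for (size_t i = 0; i < nb_props; i++) {
              Prop *p = &props[i];
              int dep = p->depends_on;

              assert(dep < (int)nb_props);
              if (p->stage)
                  continue;   /* already settled */
              /* The prerequisite must come from an earlier pass. */
              if (dep >= 0 &&
                  (!props[dep].stage || props[dep].stage == stage))
                  continue;
              p->stage = stage;
              pending--;
              progress = 1;
          }
      }
      return (int)pending;
  }
  ```

  With a table where pixel_format depends on media_type and colorspace
  depends on pixel_format, the three are settled in passes 1, 2 and 3. A
  dependency cycle leaves its properties pending instead of looping
  forever.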

- Media type negotiation.

  That means that filters that do not care about the contents of what
  they are filtering, filters that only act on the frame information
  fields, should not have to care either whether they are filtering
  audio, video or something else. There are a lot of these filters: pts
  manipulation, concat, split, selection, etc.

  We cannot afford to have to update every one of them if we want to add
  a new media type, so this is a prerequisite. And it is an aspect of
  multi-stage negotiation, since pixel format or sample format can only
  be negotiated once we know the media type.

- Partial graph configuration.

  Right now, the negotiation happens in steps, first query_formats then
  merge, then pick. But pick can be done on some links as soon as the
  formats are merged to a singleton while other filters are still
  delaying their query_formats, and once pick is done, the graph can
  start running. That would allow filters with constraints more complex
  than what AVFilterFormats can express. (We have a mechanism for that,
  used by amerge and one or two other filters; I introduced it early and
  it was a mistake, it is much too fragile and only converges in the
  simplest cases.)
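
  As an illustration of "pick as soon as the formats are merged to a
  singleton", here is a toy model in plain C. Format lists are reduced
  to bitmasks, and Link, pick_ready_links() and the convention that an
  empty mask means "this filter has not answered query_formats yet" are
  assumptions made for the sketch, not libavfilter structures.

  ```c
  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical link: one bit per supported format on each side. */
  typedef struct Link {
      uint32_t src_formats;  /* 0 = source filter has not answered yet */
      uint32_t dst_formats;  /* 0 = destination has not answered yet */
      int picked;            /* index of the chosen format, or -1 */
  } Link;

  /* Index of the only set bit, or -1 if zero or several bits are set. */
  static int single_bit_index(uint32_t mask)
  {
      int idx = 0;

      if (!mask || (mask & (mask - 1)))
          return -1;
      while (!(mask & 1)) {
          mask >>= 1;
          idx++;
      }
      return idx;
  }

  /* Pick a format on every link whose merged set is a singleton,
   * leaving the other links for a later call.
   * Returns the number of links picked by this call. */
  static size_t pick_ready_links(Link *links, size_t nb_links)
  {
      size_t nb_picked = 0;

      assert(links || !nb_links);
      for (size_t i = 0; i < nb_links; i++) {
          Link *l = &links[i];
          int idx;

          if (l->picked >= 0)
              continue;   /* already configured */
          if (!l->src_formats || !l->dst_formats)
              continue;   /* one side is still delaying its answer */
          /* Merge; an empty result would be a negotiation failure,
           * which this sketch simply leaves unpicked. */
          idx = single_bit_index(l->src_formats & l->dst_formats);
          if (idx >= 0) {
              l->picked = idx;
              nb_picked++;
          }
      }
      return nb_picked;
  }
  ```

  Calling pick_ready_links() again after a delayed filter has answered
  configures the links that became singletons in the meantime, without
  touching the links that were already picked.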

- Partial graph re-configuration.

  If we can run a graph when not all filters are configured, then we can
  de-configure and re-configure part of a graph. That would bring us the
  support for changes in pixel format.

- Global running API.

  Right now, we make libavfilter activate filters by pumping
  request_frame() on one of the buffersinks, more or less randomly, and
  we hope it will result in something useful. We need a better API, one
  where we tell libavfilter "start running these graphs", we push frames
  on input as we have them, and we are notified when frames arrive on
  output.
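
  The shape of such an API could look like the following toy in plain C.
  Every name here (ToyRunner, runner_start(), runner_push(), the output
  callback) is invented for the illustration, and the "graph" is reduced
  to adding an offset to the timestamp; it only shows the push-in,
  notify-out control flow, not a proposed libavfilter interface.

  ```c
  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical frame: just a stream index and a timestamp. */
  typedef struct ToyFrame {
      int stream;
      long pts;
  } ToyFrame;

  typedef void (*OutputCallback)(void *opaque, const ToyFrame *frame);

  /* Hypothetical runner standing in for "these graphs". */
  typedef struct ToyRunner {
      int running;
      long pts_offset;          /* stand-in for what the graph does */
      OutputCallback on_output; /* called when a frame reaches an output */
      void *opaque;
  } ToyRunner;

  /* "Start running these graphs": register where outputs are sent. */
  static void runner_start(ToyRunner *r, OutputCallback cb, void *opaque)
  {
      assert(r && cb);
      r->on_output = cb;
      r->opaque    = opaque;
      r->running   = 1;
  }

  /* Push a frame on input as soon as the application has it.
   * Returns 0 on success, -1 if the runner was not started. */
  static int runner_push(ToyRunner *r, const ToyFrame *in)
  {
      ToyFrame out;

      assert(r && in);
      if (!r->running)
          return -1;
      out = *in;
      out.pts += r->pts_offset;
      r->on_output(r->opaque, &out);
      return 0;
  }

  /* Example application side: count frames, remember the last pts. */
  typedef struct OutputLog {
      size_t nb_frames;
      long last_pts;
  } OutputLog;

  static void log_output(void *opaque, const ToyFrame *frame)
  {
      OutputLog *log = opaque;

      log->nb_frames++;
      log->last_pts = frame->pts;
  }
  ```

  The point of the callback is that the application never has to guess
  which buffersink to pump: it only reacts to frames when they arrive.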

- Inter-filter threading.

  This is a big one, and quite self-explanatory. It goes hand in hand
  with a global running API, because "start running these graphs" also
  means "start as many threads as you need and keep them ready".

  Note that the network protocols in libavformat require the same
  thing. And I really do not want two more frigging threading
  subsystems. This is why I have started slowly working on a unique
  system suited for all needs, starting with both libavformat and
  libavfilter but keeping also libavcodec in mind.

- Out-of-band side-data.

  This is a mechanism to notify filters that the work they are currently
  doing may become irrelevant, or other kinds of urgent information.
  Filters can ignore it, and just do the work for nothing. But smarter
  filters can take it into account and just skip until it becomes
  relevant again.

- Seeking.

  This is when applications notify the outputs they want to jump to a
  specified time, and all the filters in the graph make sure the
  instruction reaches the inputs.

  This is probably rather easy and does not depend on many other things.
  Out-of-band side-data is meant for that, among other things: notify a
  filter "you can skip processing the frames you have queued, because we
  are seeking anyway"; and if seeking fails, notify the user as fast as
  possible "we tried seeking, it will not happen".

  A few things need ironing out, though. Do we want to handle it in
  activate() or do we want a separate callback? Do we want a specific
  mechanism for seeking or something more generic for all the messages
  that go backwards in the graph? Do we want a queue for messages that
  go backwards or would a single message be enough?
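
  To illustrate the flow described above, here is a toy in plain C where
  a seek request travels from the output end of a chain to its input
  end. ToyFilter and both functions are hypothetical, and the sketch
  arbitrarily assumes a single pending message per filter rather than a
  queue, which is one of the open questions and not a decision.

  ```c
  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical filter state, not a libavfilter structure. */
  typedef struct ToyFilter {
      int can_seek;      /* can this filter forward a seek request? */
      int nb_queued;     /* frames waiting to be processed */
      int seek_pending;  /* out-of-band hint: queued work is moot */
      long seek_target;
  } ToyFilter;

  /* filters[0] is the input end, filters[nb_filters - 1] the output
   * end. Walk backwards from the output; if a filter refuses, cancel
   * the hints already delivered and report the failure at once.
   * Returns 0 if the request reached the input, -1 otherwise. */
  static int propagate_seek(ToyFilter *filters, size_t nb_filters,
                            long target)
  {
      assert(filters && nb_filters > 0);
      for (size_t i = nb_filters; i-- > 0; ) {
          if (!filters[i].can_seek) {
              for (size_t j = i + 1; j < nb_filters; j++)
                  filters[j].seek_pending = 0;
              return -1;
          }
          filters[i].seek_pending = 1;
          filters[i].seek_target  = target;
      }
      return 0;
  }

  /* A smart filter skips its queued frames while a seek is pending. */
  static int frames_to_process(const ToyFilter *f)
  {
      return f->seek_pending ? 0 : f->nb_queued;
  }
  ```

  A filter that ignores seek_pending would still be correct; it would
  just process frames nobody is going to use.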

- Subtitles.

  This one has been under the spotlight recently. What it means is
  rather obvious. But it is far from trivial.

  For starters, it requires the negotiation of media type, because we
  cannot afford to update all the utility filters for subtitles, they
  have to work transparently.

  Then we have the issue that subtitle streams are sparse. A filter to
  render a subtitle on a video frame needs to know if the next subtitle
  is before or after the current frame, but if the subtitles are muxed
  with the video, then the next subtitle may not arrive for several
  minutes. The solution I have for that is heartbeat frames: subtitle
  frames that contain no information except for their timestamp and mean
  that no change has occurred. To generate them, link the buffersink for
  the subtitles to the buffersink for the video.

  There are other issues to consider when designing subtitles: pixel
  format and automatic conversions, overlapping, global styles, etc.
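
  The heartbeat idea above can be sketched in plain C as follows. The
  types and functions are hypothetical and only model the timing
  decision: a rendering filter may output a video frame once it knows
  the state of the subtitle stream up to that frame's timestamp, and a
  heartbeat advances that knowledge without carrying any subtitle.

  ```c
  #include <assert.h>

  /* Hypothetical subtitle frame. */
  typedef struct ToySubFrame {
      long pts;
      int has_content;  /* 0 = heartbeat: only pts is meaningful */
  } ToySubFrame;

  /* Hypothetical state of a filter rendering subtitles on video. */
  typedef struct ToyOverlayState {
      long known_until; /* subtitle state is known up to this pts */
      int nb_events;    /* real subtitle events received so far */
  } ToyOverlayState;

  /* Feed a subtitle frame or a heartbeat, in increasing pts order. */
  static void overlay_receive_sub(ToyOverlayState *s,
                                  const ToySubFrame *sub)
  {
      assert(sub->pts >= s->known_until);
      s->known_until = sub->pts;
      if (sub->has_content)
          s->nb_events++;
  }

  /* A video frame can be rendered only if no subtitle change can
   * still arrive with a timestamp before it. */
  static int overlay_can_render(const ToyOverlayState *s, long video_pts)
  {
      return video_pts <= s->known_until;
  }
  ```

  Without heartbeats, known_until would only advance when a real
  subtitle arrives, and the filter would have to buffer video for as
  long as the subtitle stream stays silent.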

- Data packets.

  This is about having packets of binary data inside libavfilter. That
  means allowing bitstream filters to be used like real filters in a
  graph instead of through the second, more limited, API just for them.
  It also means codecs can be data → audio/video or audio/video → data
  filters.

  It requires the negotiation of media type.

  It also requires thinking about how to fit the information present in
  AVPacket into AVFrame. Some timestamps are not available, for example.

I think that is all I have in mind for now.

Regards,

-- 
  Nicolas George