[FFmpeg-devel] [RFC] Swscale refactor progress and feedback

Fri Sep 27 17:49:58 EEST 2024

Hi all,

After a bit of a hiatus due to delays in negotiating the appropriate
contracts, I've finally been able to resume work on the swscale refactor,
and I now have a current draft to demonstrate and gather critique on.

Rather than the initial goal of introducing a new AVScale header, I have
updated my proposal to instead directly reuse the sws_* namespace. For now,
this unfortunately requires appending a `2` suffix to some colliding
functions, e.g. sws_alloc_context2, sws_scale_frame2 and so on. I still want
to go back and try reusing the SwsContext type directly, though, now that
the SwsContext2 type has been more or less settled.

The current working draft is available at the following branch:
https://github.com/haasn/FFmpeg/commits/swscale2/

In particular, the commit introducing the new implementation is:
https://github.com/haasn/FFmpeg/commit/d90ceb1dd75939046a70cc7430546a26455a9ba0

I recommend reading that commit message for a more detailed overview of the
new implementation approach. To roughly summarize, however, the operations
graph (internal) API is based on the same principles as I use inside the
`libplacebo` project, except translated to a CPU implementation.

In terms of the user-facing public API, the major departure, apart from the
stateless design, is the lack of any "partial" / sliced API. When thinking
about what such an API would imply for the implementation, I decided that
it's not generally straightforward to translate a partial scaling request
back upstream through the operations graph, especially with GPU
implementations in mind. As such, I decided to omit this call for now, and
instead focus on getting the core working first.

The next step is to implement an example of a simplified operation graph
for a limited subset of formats, so we can use it as a proof of concept and
benchmark it against current swscale.

Ultimately, we will have to rewrite some of the swscale functions; most likely
the ones dealing with input and output format conversions. Rather than a whole
bunch of special cases, the new implementation will most likely consist of
a few core functions dedicated to isolated tasks such as unpacking input,
applying a 3x3 matrix or LUT, or dithering. A modular design like this will
allow us to cover significantly more use cases with less code.
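
To make the modular idea concrete, here is a minimal sketch of what such a
chain of isolated kernels could look like. Every name in it (Pixel, Op,
unpack_u8x3, op_matrix3x3, pack_u8x3, run_ops) is hypothetical and not taken
from the branch, and it assumes a float intermediate and packed 8-bit,
3-component input purely for simplicity:

```c
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none are taken from the swscale2 branch. */
typedef struct Pixel { float c[3]; } Pixel;

/* One isolated task, run in place over a row of n pixels. */
typedef struct Op {
    void (*fn)(Pixel *px, size_t n, const void *priv);
    const void *priv; /* kernel-specific parameters */
} Op;

/* Unpack packed 8-bit, 3-component input to floats in [0, 1]. */
static void unpack_u8x3(Pixel *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (int j = 0; j < 3; j++)
            dst[i].c[j] = src[3 * i + j] / 255.0f;
}

/* Apply a row-major 3x3 matrix; priv points to 9 floats. */
static void op_matrix3x3(Pixel *px, size_t n, const void *priv)
{
    const float *m = priv;
    for (size_t i = 0; i < n; i++) {
        const float in[3] = { px[i].c[0], px[i].c[1], px[i].c[2] };
        for (int j = 0; j < 3; j++)
            px[i].c[j] = m[3 * j] * in[0] + m[3 * j + 1] * in[1] +
                         m[3 * j + 2] * in[2];
    }
}

/* Round, clamp and pack back to 8-bit. */
static void pack_u8x3(uint8_t *dst, const Pixel *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < 3; j++) {
            float v = src[i].c[j] * 255.0f + 0.5f;
            dst[3 * i + j] = v < 0.0f ? 0 : v > 255.0f ? 255 : (uint8_t) v;
        }
    }
}

/* Run every op in order over the same row of pixels. */
static void run_ops(const Op *ops, size_t nb_ops, Pixel *px, size_t n)
{
    for (size_t i = 0; i < nb_ops; i++)
        ops[i].fn(px, n, ops[i].priv);
}
```

With this shape, something like an R/B channel swap needs no dedicated
function: it is a single op_matrix3x3 entry with a permutation matrix, run
between the unpack and pack steps.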

The hope is that the approach of having dedicated `post operation` kernels
will sufficiently mitigate the overhead of not merging functions. If
absolutely necessary, we could reintroduce merged functions only where
needed, while relying on the general purpose kernels for the majority of
lesser used cases.

So, all that being said, here are the biggest pain points I want feedback on:

1. How do we resolve the ambiguity between SwsContext and SwsContext2?

 A) The current approach of using SwsContext2 and sws_scale_frame2
 B) Find a new name for SwsContext2 (e.g. AVScale)
 C) Try to shoehorn the new implementation back into SwsContext,
    rely on a lot of `if (!sws->is_v2) error;` checks, and make sure users
    don't mix the old and new functions on the same context?
 D) Move SwsContext to a new header, mutually exclusive with swscale.h, and
    otherwise reuse the same names?
 E) Delay merging the new implementation until we have 100% feature parity
    with the old one, and then just replace it entirely? (plus deprecating
    now any functions we intend to drop)
 F) Something else?
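
For reference, the kind of guard that option C implies could look roughly
like the sketch below. The DemoSwsContext type, its is_v2 field and both
functions are hypothetical stand-ins rather than the real SwsContext layout
or API, and a plain negative errno value stands in for AVERROR(EINVAL):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a context shared by both implementations. */
typedef struct DemoSwsContext {
    int is_v2; /* nonzero if allocated through the new API */
} DemoSwsContext;

/* Legacy entry point: must reject contexts owned by the new API. */
static int demo_scale_legacy(const DemoSwsContext *sws)
{
    if (!sws || sws->is_v2)
        return -EINVAL;
    return 0; /* the legacy scaling path would run here */
}

/* New entry point: must reject contexts owned by the legacy API. */
static int demo_scale_frame(const DemoSwsContext *sws)
{
    if (!sws || !sws->is_v2)
        return -EINVAL;
    return 0; /* the new operations graph would run here */
}
```

Every public function touching the context would need one of these two
checks, which is the main maintenance cost of this option.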

2. How closely do we want to preserve backwards compatibility with "legacy"
   swscale semantics? For example, swscale currently has some obscure flags
   and modes that I don't see as a high priority to maintain support for. But
   if we want the new API to be a strict improvement, we ought to maintain
   backwards compatibility in some form for all of these obscure modes.
   OTOH, now might be our biggest chance to revise what is actually needed.

   For example, things I currently omit / imply, or could:
    - SWS_FULL_CHR_H_INP: I added a new flag SWS_FLAG_ALIAS which roughly
        corresponds to similar semantics, but in a more generalized fashion.
    - SWS_FULL_CHR_H_INT: turned on by default now, unless the user requests
        chroma point sampling. (Though this does trigger a slow path in
        the underlying swscale legacy implementation for bgr8 etc.)
    - Some of the more obscure scaling and dithering modes, such as
        "experimental", arithmetic dither, area scaling, etc.
    - Support for setting custom yuv matrices or filter weights.
    - Customizing scaler parameters: is this needed? If following the design
        of libplacebo, scalers should be made more tunable in general, rather
        than simply exposing an enum. I'm not sure I want to recreate these
        fields as is, at least for now.

Thanks for your time,
Niklas Haas