[FFmpeg-devel] [PATCH] swscale/aarch64: dotprod implementation of rgba32_to_Y
Andreas Rheinhardt
andreas.rheinhardt at outlook.com
Fri Feb 28 12:49:53 EET 2025
Niklas Haas:
> On Fri, 28 Feb 2025 10:31:19 +0800 Zhao Zhili <quinkblack at foxmail.com> wrote:
>> Cc haasn.
>>
>> Libswscale is being refactored. Will the current asm still work after the refactor, or will it need to be
>> refactored or rewritten afterwards? If it's the latter, maybe we should hold off on adding more libswscale
>> asm until haasn's work is done.
>
> No, almost all current asm will be unused after the rewrite. There are some we
> can in theory reuse, but for the most part, it doesn't seem to be worth it.
>
> Especially for the very bespoke functions like this one.
>
> For context, nu-swscale generally focuses on smaller, flexible primitives
> and has the calling code combine them as needed. So instead of a
> "bgra_to_y" function, you would have a sequence that looks like this:
>
> Operation list:
> [ u8 XXXX -> dddX] SWS_OP_READ : 4 elem(s) packed >> 0
> [ u8 ...X -> dddX] SWS_OP_SWIZZLE : 2103
> [ u8 ...X -> dddX] SWS_OP_CONVERT : u8 -> f32
> [f32 ...X -> .XXX] SWS_OP_LINEAR : dot3 [[0.299000 0.587000 0.114000 0 0] [0 1 0 0 0] [0 0 1 0 0] [0 0 0 1 0]]
> [f32 .XXX -> .XXX] SWS_OP_DITHER : 16x16 {255 _ _ _}
> [f32 .XXX -> dXXX] SWS_OP_CONVERT : f32 -> u8
> [ u8 .XXX -> XXXX] SWS_OP_WRITE : 1 elem(s) packed >> 0
>
> Where each low-level implementation can combine one, or multiple, such
> operations together. For example, in the current prototype, SWS_OP_CONVERT and
> SWS_OP_WRITE can be fused together into a single implementation.
>
> Note also the conversion to float. I found that the cost of going through
> floats seems to be lower on average, across all tested platforms, than the
> extra cost of dealing with integers (which require extra shifting, extra
> dithering, and extra width conversions - all of which exceed the cost of just
> one extra float->int conversion step). This also comes with improved accuracy.
>
But what about bitexactness?
- Andreas
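
For readers following the thread, the operation list quoted above can be read as a chain of small per-pixel steps. What follows is only a rough sketch of that idea in Python, not the nu-swscale implementation: every function name is hypothetical, Python floats (doubles) stand in for f32, and a constant 0.5 offset stands in for the 16x16 dither matrix, whose contents are not given in the thread.

```python
# Hypothetical sketch: none of these names are swscale API.
# A pixel is a 4-tuple; Python floats (doubles) stand in for f32.

def op_read(buf, i):
    # SWS_OP_READ: fetch the 4 packed u8 elements of pixel i
    return tuple(buf[4 * i:4 * i + 4])

def op_swizzle(px, order="2103"):
    # SWS_OP_SWIZZLE: output component k takes input component order[k]
    # (assumed semantics; "2103" swaps components 0 and 2 either way)
    return tuple(px[int(c)] for c in order)

def op_to_float(px):
    # SWS_OP_CONVERT: u8 -> float
    return tuple(float(c) for c in px)

def op_linear_dot3(px, coeffs=(0.299, 0.587, 0.114)):
    # SWS_OP_LINEAR: only the first matrix row is non-trivial,
    # the other components pass through unchanged
    x, y, z, w = px
    return (coeffs[0] * x + coeffs[1] * y + coeffs[2] * z, y, z, w)

def op_dither(px, offset=0.5):
    # SWS_OP_DITHER: assumption, a constant offset replaces the 16x16 matrix
    return (px[0] + offset,) + px[1:]

def op_to_u8(px):
    # SWS_OP_CONVERT: float -> u8, truncating and clamping to 0..255
    return tuple(max(0, min(255, int(c))) for c in px)

def op_write(out, px):
    # SWS_OP_WRITE: store 1 element (the luma) per pixel
    out.append(px[0])

def bgra_to_y(buf):
    # Run the whole chain over a packed BGRA byte buffer
    out = bytearray()
    for i in range(len(buf) // 4):
        px = op_read(buf, i)
        px = op_swizzle(px)
        px = op_to_float(px)
        px = op_linear_dot3(px)
        px = op_dither(px)
        px = op_to_u8(px)
        op_write(out, px)
    return bytes(out)
```

In this picture, fusing two operations (such as the final convert and write) would amount to replacing two adjacent steps with a single function that does both.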