[FFmpeg-devel] [PATCH] lavc/pixblockdsp: specialise aligned 16-bit get_pixels
James Almer
jamrial at gmail.com
Thu Jul 25 21:25:11 EEST 2024
On 7/25/2024 1:50 PM, Rémi Denis-Courmont wrote:
> On Thursday, 25 July 2024, 19:16:21 EEST, James Almer wrote:
>> On 7/25/2024 12:53 PM, Rémi Denis-Courmont wrote:
>>> The current code assumes that we have unaligned rows, which hurts on
>>> platforms with slower unaligned accesses. (Also, this lets the compiler
>>> unroll manually, which it seems to do in practice.)
>>> ---
>>>
>>> libavcodec/pixblockdsp.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavcodec/pixblockdsp.c b/libavcodec/pixblockdsp.c
>>> index bbbeca1618..1fff244511 100644
>>> --- a/libavcodec/pixblockdsp.c
>>> +++ b/libavcodec/pixblockdsp.c
>>> @@ -26,6 +26,13 @@
>>>
>>> static void get_pixels_16_c(int16_t *restrict block, const uint8_t *pixels,
>>>                             ptrdiff_t stride)
>>
>> Is there a way to hint the compiler that block is 16 byte aligned? GCC
>> 14 at least emits unaligned loads and stores for these.
>
> We don't have uint128_t, so the best we could do is cast to uint64_t *. Though
> GCC 13 emits 64-bit loads and stores on RV64 here with the given code. Is this
> maybe a problem with the COPY128 macro definition on x86?
AV_COPY128 with GCC x86 uses aligned load intrinsics, but at least GCC
14 emits movdqu instructions here for some reason.
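
For reference, a minimal sketch of one way to pass such an alignment hint
with GCC or Clang, using __builtin_assume_aligned. The function name
get_pixels_16_aligned and the ASSUME_ALIGNED macro are hypothetical and not
part of libavcodec, and plain memcpy stands in for AV_COPY128. The sketch
assumes that block, pixels and stride are all 16-byte aligned. Whether a
given compiler version then emits aligned moves would still need to be
checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: expands to the GCC/Clang builtin when available,
 * and to the plain pointer otherwise. */
#if defined(__GNUC__)
#define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#else
#define ASSUME_ALIGNED(p, n) (p)
#endif

/* Hypothetical variant of get_pixels_16_c: copies an 8x8 block of 16-bit
 * samples, one 16-byte row at a time. */
static void get_pixels_16_aligned(int16_t *restrict block,
                                  const uint8_t *pixels, ptrdiff_t stride)
{
    /* Preconditions assumed by this sketch; the hint is only valid if
     * they really hold. */
    assert(((uintptr_t)block & 15) == 0);
    assert(((uintptr_t)pixels & 15) == 0 && (stride & 15) == 0);

    int16_t *dst = ASSUME_ALIGNED(block, 16);

    for (int i = 0; i < 8; i++) {
        const uint8_t *src = ASSUME_ALIGNED(pixels + i * stride, 16);

        /* 8 samples of 2 bytes each = 16 bytes per row. */
        memcpy(dst + i * 8, src, 8 * sizeof(int16_t));
    }
}
```

The hint is a promise to the compiler, not a runtime check, so passing a
misaligned pointer is undefined behaviour once assertions are compiled out.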