[FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions

Martin Storsjö martin at martin.st
Mon Mar 25 23:15:24 EET 2024


On Mon, 25 Mar 2024, Martin Storsjö wrote:

> For some time now, we have had pretty complete AArch64 NEON
> coverage for the hevc decoder.
>
> However, some of these functions require the I8MM instruction set
> extension, and many of them (but not all) lack a plain NEON
> version.
>
> This patchset fills in a regular NEON version of all functions
> where we have an I8MM function.
>
> For context: the I8MM instruction set extension is a mandatory
> part of armv8.6-a. E.g. Apple M2 and AWS Graviton 3 have it,
> but Apple M1 and Ampere Altra don't.
>
> This patchset takes decoding of a 1080p HEVC clip from 402
> fps to 649 fps on an Apple M1.
>
> Patch #2 also fixes a subtle bug in the existing implementation:
> two functions relied on the contents of the stack, below the
> stack pointer, being untouched within a function. If a signal
> gets delivered, those parts of the stack could be clobbered.

I know this is a bit short notice for a patchset of this size, but would
people be OK with merging this patchset before the impending 7.0 branch
(which will be made within the next 24h)?

The patches pass all my tricky build configurations, they give a
significant speedup on many common CPUs, and patch #2 fixes a real bug
in the existing implementations. (A bug fix patch can of course be
backported after the branch too, but performance optimizations aren't
generally relevant for backporting.)

// Martin

