[FFmpeg-devel] [PATCH v2 00/15] avfilter/vf_bwdif: Add aarch64 neon functions
Martin Storsjö
martin at martin.st
Mon Jul 3 00:09:52 EEST 2023
On Sun, 2 Jul 2023, John Cox wrote:
> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
> speedup over 2xfilter_line and a memcpy
>
> Differences from v1:
> .align 16 corrected to .balign 16
> SXTW tolower
> Mac ABI (hopefully) fixed
> V register pop/push macroed & prettified
>
> John Cox (15):
> avfilter/vf_bwdif: Add outline for aarch neon functions
> avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
> avfilter/vf_bwdif: Export C filter_intra
> avfilter/vf_bwdif: Add neon for filter_intra
> tests/checkasm: Add test for vf_bwdif filter_intra
> avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon
> avfilter/vf_bwdif: Export C filter_edge
> avfilter/vf_bwdif: Add neon for filter_edge
> tests/checkasm: Add test for vf_bwdif filter_edge
> avfilter/vf_bwdif: Export C filter_line
> avfilter/vf_bwdif: Add neon for filter_line
> avfilter/vf_bwdif: Add a filter_line3 method for optimisation
> avfilter/vf_bwdif: Add neon for filter_line3
> tests/checkasm: Add test for vf_bwdif filter_line3
> avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

Overall, I'd suggest squashing/reordering the patches like this:

- tests/checkasm: Add test for vf_bwdif filter_intra
- avfilter/vf_bwdif: Add neon for filter_intra
  (With the preceding patches squashed. For extra common macros, only add
  the ones you use in this patch here.)
- tests/checkasm: Add test for vf_bwdif filter_edge
- avfilter/vf_bwdif: Add neon for filter_edge (with other dependencies
  squashed)
- avfilter/vf_bwdif: Add neon for filter_line
- avfilter/vf_bwdif: Add a filter_line3 method for optimisation
  + checkasm test squashed
- avfilter/vf_bwdif: Add neon for filter_line3

// Martin