[FFmpeg-devel] [PATCH 5/5] avfilter/vf_yadif: Add x86_64 avx yadif asm

Michael Niedermayer michael at niedermayer.cc
Wed Jul 20 16:16:50 EEST 2022


On Tue, Jul 19, 2022 at 09:41:17PM -0700, Chris Phlipot wrote:
> Add a new version of yadif_filter_line performed using packed bytes
> instead of the packed words used by the current implementation. As
> a result this implementation runs almost 2x as fast as the current
> fastest SSSE3 implementation.
> 
> This implementation is created from scratch based on the C code, with
> the goal of keeping all intermediate values within 8-bits so that
> the vectorized code can be computed using packed bytes. Differences
> are as follows:
> - Use algorithms to compute avg and abs difference using only 8-bit
>  intermediate values.
> - Reworked the mode 1 code by applying various mathematical identities
>  to keep all intermediate values within 8-bits.
> - Attempt to compute the spatial score using only 8-bits. The actual
>  spatial score fits within this range 97% (content dependent) of the
>  time for the entire 128-bit xmm vector. In the case that spatial
>  score needs more than 8-bits to be represented, we detect this case,
>  and recompute the spatial score using 16-bit packed words instead.
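
For readers following along, here is a minimal C sketch of how an average and an absolute difference can be computed without ever leaving 8 bits. It is not taken from the patch; the function names are made up for illustration, and the identities shown are the well-known scalar equivalents of what packed-byte instructions such as pavgb and psubusb provide.

```c
#include <stdint.h>

/* floor((a + b) / 2) without forming the 9-bit sum a + b.
 * Hypothetical helper name, not from the patch. */
static uint8_t avg_floor_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a & b) + ((a ^ b) >> 1));
}

/* ceil((a + b) / 2), the rounding that pavgb uses. */
static uint8_t avg_ceil_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Unsigned saturating subtract, the scalar analogue of psubusb. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* |a - b|: one of the two saturating differences is always zero,
 * so OR-ing them yields the absolute difference. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sub_sat_u8(a, b) | sub_sat_u8(b, a));
}
```

Whether the patch uses exactly these identities is an assumption; the point is only that both operations have forms whose intermediates fit in a byte.
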
> 
> In 3% of cases the spatial_score will need more than 8 bits to store
> so we have a slow path, where the spatial score is computed using
> packed words instead.
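
The fast-path/slow-path idea described above could look roughly like the following scalar C sketch. It is only an illustration under stated assumptions: the description does not say how the overflow is detected, so the saturating-add check, the 16-lane block size, and all names here are hypothetical.

```c
#include <stdint.h>

#define LANES 16 /* one 128-bit xmm register of bytes (assumed block size) */

/* Unsigned saturating add, the scalar analogue of paddusb. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Sum three per-pixel absolute differences into a score per lane.
 * Fast path: 8-bit saturating adds. If any lane reaches 255 the result
 * may have overflowed, so the whole block is redone with 16-bit sums.
 * Returns 1 if the slow path was taken, 0 otherwise. */
static int sum3_scores(const uint8_t *d0, const uint8_t *d1,
                       const uint8_t *d2, uint16_t *score)
{
    int overflow = 0;

    for (int i = 0; i < LANES; i++) {
        uint8_t s = add_sat_u8(add_sat_u8(d0[i], d1[i]), d2[i]);
        score[i] = s;
        overflow |= (s == 255); /* conservative: an exact 255 also trips it */
    }
    if (!overflow)
        return 0;

    /* Slow path: recompute every lane with 16-bit intermediates. */
    for (int i = 0; i < LANES; i++)
        score[i] = (uint16_t)(d0[i] + d1[i] + d2[i]);
    return 1;
}
```

The check is per block rather than per lane, which matches the description's statement that the 8-bit result must fit "for the entire 128-bit xmm vector".
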
> 
> This implementation is currently limited to x86_64 due to the number
> of registers required. x86_32 is possible, but the performance benefit
> over the existing SSSE3 implementation is not as great, due to all of the
> stack spills that would result from having far fewer registers. ASM was
> not generated for the 32-bit variant due to limited ROI, as most AVX
> users are likely on 64-bit OS at this point and 32-bit users would
> lose out on most of the performance benefit.
> 
> Signed-off-by: Chris Phlipot <cphlipot0 at gmail.com>

There's no need to support 32-bit, but the FFmpeg build must not break
on Linux x86-32:

src/libavfilter/x86/vf_yadif_x64.asm:145: error: impossible combination of address sizes
src/libavfilter/x86/vf_yadif_x64.asm:145: error: invalid effective address
src/libavfilter/x86/vf_yadif_x64.asm:146: error: impossible combination of address sizes
src//libavutil/x86/x86inc.asm:1399: ... from macro `movdqu' defined here
src//libavutil/x86/x86inc.asm:1264: ... from macro `RUN_AVX_INSTR' defined here
src//libavutil/x86/x86inc.asm:1717: ... from macro `vmovdqu' defined here
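
These errors are what one would expect when 64-bit-only addressing is assembled for a 32-bit target. The usual remedy is to compile the new code only on x86_64: in x86inc-based asm files that commonly means wrapping the body in %if ARCH_X86_64 ... %endif, and on the C side guarding both the prototype and the function-pointer assignment. Below is a standalone C sketch of the C-side guard. All function names are hypothetical, and the asm routine is replaced by a C stand-in so the sketch compiles on its own; in FFmpeg, ARCH_X86_64 comes from config.h as 0 or 1.

```c
#include <stdint.h>

/* Fallback for this standalone sketch only; FFmpeg's config.h
 * normally defines ARCH_X86_64 as 0 or 1. */
#ifndef ARCH_X86_64
#  if defined(__x86_64__) || defined(_M_X64)
#    define ARCH_X86_64 1
#  else
#    define ARCH_X86_64 0
#  endif
#endif

typedef void (*filter_line_fn)(uint8_t *dst, const uint8_t *src, int w);

/* Portable reference path (placeholder body: plain copy). */
static void filter_line_c(uint8_t *dst, const uint8_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = src[i];
}

#if ARCH_X86_64
/* Hypothetical stand-in for the 64-bit-only asm routine. It exists
 * only when targeting x86_64, so a 32-bit build never references it. */
static void filter_line_x64_only(uint8_t *dst, const uint8_t *src, int w)
{
    filter_line_c(dst, src, w);
}
#endif

/* Pick an implementation; have_avx stands in for a CPU-flag check. */
static filter_line_fn select_filter_line(int have_avx)
{
#if ARCH_X86_64
    if (have_avx)
        return filter_line_x64_only;
#endif
    (void)have_avx;
    return filter_line_c;
}
```

With both the definition and the reference inside the same guard, a 32-bit build neither assembles nor links against the 64-bit-only code.
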


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein