[FFmpeg-devel] [PATCH 4/4] avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
James Almer
jamrial at gmail.com
Wed Feb 15 18:55:51 EET 2017
On 2/13/2017 9:44 AM, James Darnley wrote:
> x86-64 only
>
> Yorkfield:
> - sse2: 2.16x (434 vs. 201 cycles)
>
> Skylake:
> - sse2: 3.04x (378 vs. 124 cycles)
> - avx: 3.29x (378 vs. 115 cycles)
> ---
> libavcodec/x86/h264_deblock.asm | 119 ++++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/h264dsp_init.c | 10 ++++
> 2 files changed, 129 insertions(+)
>
> diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> index 509a0dbe0c..f47a199e8f 100644
> --- a/libavcodec/x86/h264_deblock.asm
> +++ b/libavcodec/x86/h264_deblock.asm
> @@ -377,10 +377,129 @@ cglobal deblock_h_luma_8, 5,9,0,0x60+16*WIN64
> RET
> %endmacro
>
> +; TODO: use macro arguments
> +%macro TRANSPOSE_8X8B_XMM 8
Why not put this in x86util.asm? And make it actually use its arguments, of course.
Also, just call it TRANSPOSE_8X8B.
> + punpcklbw m0, m1
> + punpcklbw m2, m3
> + punpcklbw m4, m5
> + punpcklbw m6, m7
> +
> + punpckhwd m1, m0, m2
> + punpcklwd m0, m2
Use SBUTTERFLY here and below.
> +
> + punpckhwd m5, m4, m6
> + punpcklwd m4, m6
> +
> + punpckhdq m2, m0, m4
> + punpckldq m0, m4
> +
> + punpckhdq m6, m1, m5
> + punpckldq m1, m5
> +
> + MOVHL m4, m0
> + MOVHL m3, m2
> + MOVHL m7, m6
> + MOVHL m5, m1
> + SWAP 1, 4
> +%endmacro
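For anyone who wants to check the unpack sequence above without an assembler, here is a minimal scalar sketch in C. It is only a model and not part of the patch. The names xmm_t, unpack, movhl, transpose_8x8b and transpose_8x8b_matches_reference are made up for illustration. It assumes each of m0..m7 holds one 8-byte row in its low half, and that MOVHL copies the high 8 bytes of the source into the low 8 bytes of the destination (movhlps-like), since the MOVHL definition is not shown in this hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for one XMM register (hypothetical helper type). */
typedef struct { uint8_t b[16]; } xmm_t;

/* Model of punpckl* / punpckh*: interleave `size`-byte elements taken from
 * the low (high == 0) or high (high == 1) halves of a and b.
 * size 1 = bw, size 2 = wd, size 4 = dq. */
static xmm_t unpack(xmm_t a, xmm_t b, int size, int high)
{
    xmm_t r;
    int n = 8 / size;            /* elements taken from each source */
    int base = high ? 8 : 0;

    assert(size == 1 || size == 2 || size == 4);
    for (int i = 0; i < n; i++) {
        memcpy(&r.b[(2 * i)     * size], &a.b[base + i * size], (size_t)size);
        memcpy(&r.b[(2 * i + 1) * size], &b.b[base + i * size], (size_t)size);
    }
    return r;
}

/* Assumed MOVHL behaviour: low half of dst = high half of src. */
static xmm_t movhl(xmm_t dst, xmm_t src)
{
    memcpy(&dst.b[0], &src.b[8], 8);
    return dst;
}

/* Same instruction order as the quoted macro, one line per instruction. */
static void transpose_8x8b(xmm_t m[8])
{
    xmm_t t;

    m[0] = unpack(m[0], m[1], 1, 0);   /* punpcklbw m0, m1 */
    m[2] = unpack(m[2], m[3], 1, 0);   /* punpcklbw m2, m3 */
    m[4] = unpack(m[4], m[5], 1, 0);   /* punpcklbw m4, m5 */
    m[6] = unpack(m[6], m[7], 1, 0);   /* punpcklbw m6, m7 */

    m[1] = unpack(m[0], m[2], 2, 1);   /* punpckhwd m1, m0, m2 */
    m[0] = unpack(m[0], m[2], 2, 0);   /* punpcklwd m0, m2 */
    m[5] = unpack(m[4], m[6], 2, 1);   /* punpckhwd m5, m4, m6 */
    m[4] = unpack(m[4], m[6], 2, 0);   /* punpcklwd m4, m6 */

    m[2] = unpack(m[0], m[4], 4, 1);   /* punpckhdq m2, m0, m4 */
    m[0] = unpack(m[0], m[4], 4, 0);   /* punpckldq m0, m4 */
    m[6] = unpack(m[1], m[5], 4, 1);   /* punpckhdq m6, m1, m5 */
    m[1] = unpack(m[1], m[5], 4, 0);   /* punpckldq m1, m5 */

    m[4] = movhl(m[4], m[0]);          /* MOVHL m4, m0 */
    m[3] = movhl(m[3], m[2]);          /* MOVHL m3, m2 */
    m[7] = movhl(m[7], m[6]);          /* MOVHL m7, m6 */
    m[5] = movhl(m[5], m[1]);          /* MOVHL m5, m1 */

    t = m[1]; m[1] = m[4]; m[4] = t;   /* SWAP 1, 4 */
}

/* Returns 1 if the low 8 bytes of m0..m7 end up as the transposed matrix. */
static int transpose_8x8b_matches_reference(void)
{
    xmm_t m[8];

    memset(m, 0, sizeof(m));
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            m[r].b[c] = (uint8_t)(r * 8 + c);

    transpose_8x8b(m);

    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if (m[c].b[r] != (uint8_t)(r * 8 + c))
                return 0;
    return 1;
}
```

Under those assumptions the model shows why the final SWAP 1, 4 is needed: after the MOVHL steps, register 1 holds column 4 and register 4 holds column 1. Each high/low unpack pair in the middle two stages is the pattern SBUTTERFLY is meant to express.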
> +
> +%macro TRANSPOSE_8X8B_XMM 0
> + TRANSPOSE_8X8B_XMM 0, 1, 2, 3, 4, 5, 6, 7
This wrapper seems wrong, or at least superfluous: the eight-argument version above ignores its arguments anyway.
> +%endmacro
> +
> +%macro DEBLOCK_H_LUMA_MBAFF 0
> +
> +cglobal deblock_h_luma_mbaff_8, 5, 9, 10, 8*16, pix_, stride_, alpha_, beta_, tc0_
Why the underscores?
> + movsxd stride_q, stride_d
> + dec alpha_d
> + dec beta_d
> + mov r5, pix_q
> + lea r6, [3*stride_q]
Call r6 stride3.