[FFmpeg-devel] [PATCH 3/4] huffyuvencdsp: Add ff_diff_bytes_sse2
James Almer
jamrial at gmail.com
Mon Oct 19 22:37:51 CEST 2015
On 10/19/2015 5:00 PM, Timothy Gu wrote:
> 4% to 35% faster depending on the width.
> ---
> libavcodec/x86/huffyuvencdsp.asm | 31 ++++++++++++++++++++-----------
> libavcodec/x86/huffyuvencdsp_mmx.c | 8 +++++++-
> 2 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/x86/huffyuvencdsp.asm b/libavcodec/x86/huffyuvencdsp.asm
> index 97de7e9..9625fbe 100644
> --- a/libavcodec/x86/huffyuvencdsp.asm
> +++ b/libavcodec/x86/huffyuvencdsp.asm
> @@ -27,27 +27,27 @@
>
> section .text
>
> -INIT_MMX mmx
> ; void ff_diff_bytes_mmx(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
> ; intptr_t w);
> -cglobal diff_bytes, 4,6,0, dst, src1, src2, w, i
> +%macro DIFF_BYTES 0
> +cglobal diff_bytes, 4,6,2, dst, src1, src2, w, i
> xor iq, iq
> - cmp wq, 16
> + cmp wq, mmsize * 2
> jb .loop2
> - sub wq, 15
> + sub wq, mmsize * 2 - 1
> .loop:
> - mova m0, [src2q + iq]
> - mova m1, [src1q + iq]
> + movu m0, [src2q + iq]
> + movu m1, [src1q + iq]
If dst and/or src can sometimes be aligned, check how ff_add_hfyu_left_pred
(also huffyuvdsp.asm) handles it.
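
To make the suggestion concrete, here is a rough C sketch of the general idea: check pointer alignment once before the loop, then take an aligned-load path when possible and an unaligned path otherwise. It assumes diff_bytes computes dst[i] = src1[i] - src2[i] with byte wraparound, and it uses SSE2 intrinsics where available. The names diff_bytes_ref and diff_bytes_sketch are hypothetical and exist only for this illustration. This is not the code in huffyuvdsp.asm, and the real assembly may be organized differently (the patch above also processes mmsize * 2 bytes per iteration, which this sketch does not).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: dst[i] = src1[i] - src2[i], wrapping modulo 256. */
static void diff_bytes_ref(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, size_t w)
{
    for (size_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}

/* Hypothetical sketch: pick aligned or unaligned vector accesses with a
 * single check up front, then finish the remainder with scalar code. */
static void diff_bytes_sketch(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, size_t w)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* All three pointers must be 16-byte aligned for the aligned path. */
    int aligned = (((uintptr_t)dst | (uintptr_t)src1 |
                    (uintptr_t)src2) & 15) == 0;

    if (aligned) {
        for (; i + 16 <= w; i += 16) {
            __m128i a = _mm_load_si128((const __m128i *)(src1 + i));
            __m128i b = _mm_load_si128((const __m128i *)(src2 + i));
            _mm_store_si128((__m128i *)(dst + i), _mm_sub_epi8(a, b));
        }
    } else {
        for (; i + 16 <= w; i += 16) {
            __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i));
            __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_sub_epi8(a, b));
        }
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is not available). */
    for (; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}
```

Whether the extra branch pays off depends on how often callers actually pass aligned buffers, so it would need benchmarking against the movu-only version.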