[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

Josh Dekker josh at itanimul.li
Thu Dec 10 15:28:56 EET 2020


On 2020/12/09 11:19, Alan Kelly wrote:
> ---
>   Activates avx2 version of yuv2yuvX
>   Adds checkasm for yuv2yuvX
>   Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_*
>   Replaces non-temporal stores with temporal stores
>   libswscale/x86/Makefile     |   1 +
>   libswscale/x86/swscale.c    | 106 +++++++++-----------------------
>   libswscale/x86/yuv2yuvX.asm | 118 ++++++++++++++++++++++++++++++++++++
>   tests/checkasm/sw_scale.c   | 101 +++++++++++++++++++++++++++++-
>   4 files changed, 249 insertions(+), 77 deletions(-)
>   create mode 100644 libswscale/x86/yuv2yuvX.asm
> 
> [...]
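For readers without the full patch at hand: the function being moved to yasm computes a dithered vertical filter over several 16-bit input rows. The scalar C sketch below illustrates that computation. It assumes the C reference behaves like libswscale's C path (`yuv2planeX_8_c`), with 12-bit filter taps that sum to 4096 for unity gain and 7 fractional bits in the input rows, hence the final shift by 19. It uses the argument list declared for the checkasm test further down. `yuv2planeX_ref` and `clip_uint8` are hypothetical names; `clip_uint8` stands in for `av_clip_uint8()`.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to 0..255; stands in for FFmpeg's av_clip_uint8(). */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Scalar sketch of the dithered vertical filter (assumed to mirror
 * libswscale's C path): taps are 12-bit fixed point, input rows carry
 * 7 fractional bits, so the accumulator is shifted right by 12 + 7 = 19.
 */
static void yuv2planeX_ref(const int16_t *filter, int filterSize,
                           const int16_t **src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset)
{
    assert(filterSize > 0 && dstW >= 0);
    for (int i = 0; i < dstW; i++) {
        /* 8-entry dither pattern, rotated by offset and scaled to the
         * accumulator's fixed-point position, acts as a rounding bias. */
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_uint8(val >> 19);
    }
}
```

As a worked example, consider two taps of 2048, two rows holding 12800 (that is, 100 << 7), and a dither byte of 64. The accumulator is 64 * 4096 + 2 * 12800 * 2048 = 52690944, and 52690944 >> 19 = 100. A dither byte of 64 is exactly half of 2^19 once shifted, so it rounds to nearest.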
> diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
> index 9efa2b4def..7009169361 100644
> --- a/tests/checkasm/sw_scale.c
> +++ b/tests/checkasm/sw_scale.c
>
> [...]
>
> +static void check_yuv2yuvX(void)
> +{
> +    struct SwsContext *ctx;
> +    int fsi, osi;
> +#define LARGEST_FILTER 8
> +#define FILTER_SIZES 4
> +    static const int filter_sizes[FILTER_SIZES] = {1, 4, 8, 16};
> +
> +    declare_func_emms(AV_CPU_FLAG_MMX, void, const int16_t *filter,
> +                      int filterSize, const int16_t **src, uint8_t *dest,
> +                      int dstW, const uint8_t *dither, int offset);
> +
> +    int dstW = SRC_PIXELS;
> +    const int16_t **src;
> +    LOCAL_ALIGNED_32(int16_t, filter_coeff, [LARGEST_FILTER]);
> +    LOCAL_ALIGNED_32(uint8_t, dst0, [SRC_PIXELS]);
> +    LOCAL_ALIGNED_32(uint8_t, dst1, [SRC_PIXELS]);
> +    LOCAL_ALIGNED_32(uint8_t, dither, [SRC_PIXELS]);
> +    union VFilterData{
> +        const int16_t *src;
> +        uint16_t coeff[8];
> +    } *vFilterData;
> +    uint8_t d_val = rnd();
> +    randomize_buffers(filter_coeff, LARGEST_FILTER);
> +    ctx = sws_alloc_context();
> +    if (sws_init_context(ctx, NULL, NULL) < 0)
> +        fail();
> +
> +    ff_sws_init_swscale_x86(ctx);
This should be ff_getSwsFunc() instead.
> +    for(int i = 0; i < SRC_PIXELS; ++i){
> +        dither[i] = d_val;
> +    }
> [...]
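The test quoted above follows checkasm's usual pattern: it runs a reference and a candidate implementation on the same randomized inputs and compares the outputs. The self-contained sketch below shows that pattern in plain C. It uses a constant dither byte, as in the quoted loop, and the filter sizes 1, 4, 8 and 16 from `filter_sizes`. `planeX_ref`, `planeX_unroll4`, `check_planeX` and the LCG are hypothetical. The unrolled variant only imitates in C the "unrolls main loop" idea from the subject line and is not the patch's assembly. checkasm itself would drive the comparison through its `check_func()`, `call_ref()` and `call_new()` macros rather than through direct calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_TAPS 16
#define WIDTH    67 /* not a multiple of 4, so the scalar tail also runs */

static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference: one output pixel per iteration. */
static void planeX_ref(const int16_t *filter, int filterSize,
                       const int16_t **src, uint8_t *dest, int dstW,
                       const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_uint8(val >> 19);
    }
}

/* Hypothetical candidate: same arithmetic, four output pixels per
 * iteration, then a scalar tail for the remaining dstW % 4 pixels. */
static void planeX_unroll4(const int16_t *filter, int filterSize,
                           const int16_t **src, uint8_t *dest, int dstW,
                           const uint8_t *dither, int offset)
{
    int i = 0;
    for (; i + 4 <= dstW; i += 4) {
        int v0 = dither[(i + 0 + offset) & 7] << 12;
        int v1 = dither[(i + 1 + offset) & 7] << 12;
        int v2 = dither[(i + 2 + offset) & 7] << 12;
        int v3 = dither[(i + 3 + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++) {
            const int16_t *s = src[j];
            int f = filter[j];
            v0 += s[i + 0] * f;
            v1 += s[i + 1] * f;
            v2 += s[i + 2] * f;
            v3 += s[i + 3] * f;
        }
        dest[i + 0] = clip_uint8(v0 >> 19);
        dest[i + 1] = clip_uint8(v1 >> 19);
        dest[i + 2] = clip_uint8(v2 >> 19);
        dest[i + 3] = clip_uint8(v3 >> 19);
    }
    for (; i < dstW; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = clip_uint8(val >> 19);
    }
}

/* Deterministic LCG so a failing case is reproducible. */
static uint32_t rng_state = 12345u;
static int rnd_s13(void) /* value in [-4096, 4095] */
{
    rng_state = rng_state * 1664525u + 1013904223u;
    return (int)((rng_state >> 16) & 0x1FFF) - 0x1000;
}

/* Returns 1 when both versions produce identical output. */
static int check_planeX(int filterSize, uint8_t d_val, int offset)
{
    int16_t filter[MAX_TAPS];
    int16_t rows[MAX_TAPS][WIDTH];
    const int16_t *src[MAX_TAPS];
    uint8_t dither[8], dst0[WIDTH], dst1[WIDTH];

    assert(filterSize > 0 && filterSize <= MAX_TAPS);
    memset(dither, d_val, sizeof(dither)); /* constant dither byte */
    for (int j = 0; j < filterSize; j++) {
        filter[j] = (int16_t)rnd_s13();
        for (int i = 0; i < WIDTH; i++)
            rows[j][i] = (int16_t)rnd_s13();
        src[j] = rows[j];
    }
    /* Poison the outputs differently so an unwritten byte shows up. */
    memset(dst0, 0x00, sizeof(dst0));
    memset(dst1, 0xFF, sizeof(dst1));
    planeX_ref(filter, filterSize, src, dst0, WIDTH, dither, offset);
    planeX_unroll4(filter, filterSize, src, dst1, WIDTH, dither, offset);
    return memcmp(dst0, dst1, sizeof(dst0)) == 0;
}
```

The input ranges are kept to 13 bits so that 16 taps cannot overflow the `int` accumulator. This keeps the comparison about the unrolling and not about overflow behaviour.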
-- 
Josh

