[FFmpeg-devel] [PATCH 5/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}

Christophe Gisquet christophe.gisquet at gmail.com
Wed Feb 4 13:39:21 CET 2015


Hi,

2015-02-04 4:55 GMT+01:00 James Almer <jamrial at gmail.com>:
> Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
> Refactoring and optimizations by James Almer.

Add your own copyright to this file then.

> Width 32
> 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
> 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
> 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
>
> Width 64
> 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
> 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
> 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips

Is the first number in each case from before you split out the
restore part? Otherwise, that's gruesome.

> -    void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> -                            ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> -                            int width, int height);
> +    void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> +                               ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> +                               int width, int height);

Maybe add a comment on top of that to indicate that _dst is 16-byte-aligned?

Also, src and stride_src are such that the buffer is 32-byte-aligned, because of:
            stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
            dst = lc->edge_emu_buffer + stride_dst + FF_INPUT_BUFFER_PADDING_SIZE;
in hevc_filter.c, but I'm not sure how much of a benefit that is here, or
how often it helps. Don't hesitate to modify them if need be.
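
To make the alignment argument concrete, here is a minimal sketch of the
arithmetic. It assumes MAX_PB_SIZE is 64 and FF_INPUT_BUFFER_PADDING_SIZE is 32
(values from the headers of that period, not stated in this thread) and that
edge_emu_buffer itself is suitably aligned. The helper names are made up for
illustration and are not part of the patch.

```c
#include <stddef.h>

/* Assumed values; neither constant is spelled out in this thread. */
#define MAX_PB_SIZE 64
#define FF_INPUT_BUFFER_PADDING_SIZE 32

/* Stride of the temporary buffer, as in the hevc_filter.c lines quoted above. */
static size_t sao_tmp_stride(void)
{
    return 2 * MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
}

/* Byte offset of the first pixel from the start of edge_emu_buffer. */
static size_t sao_tmp_offset(void)
{
    return sao_tmp_stride() + FF_INPUT_BUFFER_PADDING_SIZE;
}

/*
 * Hypothetical helper: returns 1 if the start of each of the first `rows`
 * rows is a multiple of `align`, given the address `base` of the buffer.
 */
static int rows_aligned(size_t base, int rows, size_t align)
{
    for (int y = 0; y < rows; y++) {
        size_t addr = base + sao_tmp_offset() + (size_t)y * sao_tmp_stride();
        if (addr % align)
            return 0;
    }
    return 1;
}
```

Under those assumptions the stride is 160 and the offset is 192, both multiples
of 32, so every row start stays 32-byte-aligned as long as the base is. The
stride is not a multiple of 64, so nothing wider than that can be assumed.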

> +%else ; ARCH_X86_32
> +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride, srcstride, a_stride, b_stride, height

As seen above, srcstride is constant: it is always 2*MAX_PB_SIZE +
FF_INPUT_BUFFER_PADDING_SIZE.
Hardcoding it may save you one whole GPR. Not really useful here, but I think
you are more limited in the >8-bit case.
If you want to exploit this, also document it in a comment above
void (*sao_edge_filter[5]).
No comment on the actual assembly, it looks fine.

-- 
Christophe

