[FFmpeg-devel] [PATCH 5/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}
Christophe Gisquet
christophe.gisquet at gmail.com
Wed Feb 4 13:39:21 CET 2015
Hi,
2015-02-04 4:55 GMT+01:00 James Almer <jamrial at gmail.com>:
> Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
> Refactoring and optimizations by James Almer.
Add your own copyright to this file then.
> Width 32
> 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
> 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
> 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
>
> Width 64
> 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
> 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
> 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Is the first number in each case from before you split out the
restore part? Otherwise, that's gruesome.
> - void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> - ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> - int width, int height);
> + void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> + ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> + int width, int height);
Maybe add a comment on top of that to indicate that _dst is 16-byte-aligned?
Also, src and stride_src are such that the buffer is 32-byte-aligned, because of:
    stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
    dst = lc->edge_emu_buffer + stride_dst +
          FF_INPUT_BUFFER_PADDING_SIZE;
in hevc_filter.c, but I'm not sure how much of a benefit it is here, or
how often it helps. Don't hesitate to modify them if need be.
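To illustrate the alignment argument above, here is a minimal C sketch. It assumes MAX_PB_SIZE is 64 and FF_INPUT_BUFFER_PADDING_SIZE is 32 (neither value is stated in this mail, so check your tree), and the helper names are hypothetical. It only shows that with such a stride every row start keeps the alignment of the base pointer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, not taken from this thread: check your tree's headers. */
#define MAX_PB_SIZE                  64
#define FF_INPUT_BUFFER_PADDING_SIZE 32

/* Same stride expression as the hevc_filter.c lines quoted above. */
#define SAO_STRIDE (2 * MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE)

/* Returns 1 if ptr is a multiple of align (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Returns 1 if every row start base + y * stride is aligned to align. */
static int rows_stay_aligned(const uint8_t *base, ptrdiff_t stride,
                             int height, size_t align)
{
    for (int y = 0; y < height; y++)
        if (!is_aligned(base + y * stride, align))
            return 0;
    return 1;
}

/* Hypothetical debug-only check a caller could run before the SIMD filter. */
static void assert_sao_tmp_layout(const uint8_t *base, int height)
{
    assert(rows_stay_aligned(base, SAO_STRIDE, height, 32));
}
```

Under these assumptions the stride is a multiple of 32, so aligned loads would be safe on every row only if the base pointer itself is 32-byte-aligned, which depends on how edge_emu_buffer is allocated.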
> +%else ; ARCH_X86_32
> +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride, srcstride, a_stride, b_stride, height
As seen from above, srcstride is constant and is 2*MAX_PB_SIZE +
FF_INPUT_BUFFER_PADDING_SIZE.
That may save you one whole GPR. Not really useful here, but I think
you are more limited for the >8 bits case.
If you want to exploit this, also add a comment about it above void (*sao_edge_filter[5]).
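As a rough sketch of the constant-stride idea in C rather than assembly: the function below is a hypothetical stand-in for the filter (it only copies rows), and the two constants are assumed values that are not given in this mail. The point is only that, when the source stride is fixed, it can be a compile-time constant documented in a comment instead of an argument:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only: check your tree's headers. */
#define MAX_PB_SIZE                  64
#define FF_INPUT_BUFFER_PADDING_SIZE 32
#define SAO_SRC_STRIDE (2 * MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE)

/*
 * Hypothetical stand-in for the real filter: it only copies rows.
 * src is assumed to always use SAO_SRC_STRIDE, so the stride is not passed
 * as an argument and needs no register of its own.
 */
static void copy_rows_const_src_stride(uint8_t *dst, const uint8_t *src,
                                       ptrdiff_t stride_dst,
                                       int width, int height)
{
    assert(width <= SAO_SRC_STRIDE);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = src[x];
        dst += stride_dst;
        src += SAO_SRC_STRIDE; /* constant, folded in by the compiler */
    }
}
```

In assembly the equivalent would be using the constant as an immediate in the address arithmetic, which is what frees the register.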
No comment on the actual assembly, it looks fine.
--
Christophe