[FFmpeg-devel] [PATCH 5/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3, avx2}
Mickaël Raulet
mraulet at insa-rennes.fr
Wed Feb 4 21:04:21 CET 2015
LGTM
Mickael
2015-02-04 13:39 GMT+01:00 Christophe Gisquet <christophe.gisquet at gmail.com>
:
> Hi,
>
> 2015-02-04 4:55 GMT+01:00 James Almer <jamrial at gmail.com>:
> > Original x86 intrinsics code and initial yasm port by Pierre-Edouard
> Lepere.
> > Refactoring and optimizations by James Almer.
>
> Add your own copyright to this file then.
>
> > Width 32
> > 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
> > 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1
> skips
> > 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
> >
> > Width 64
> > 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
> > 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33
> skips
> > 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29
> skips
>
> Are the first number for each case from before you split out the
> restore part? Otherwise, that's gruesome.
>
> > - void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t
> stride_dst,
> > - ptrdiff_t stride_src, int16_t
> *sao_offset_val, int sao_eo_class,
> > - int width, int height);
> > + void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t
> stride_dst,
> > + ptrdiff_t stride_src, int16_t
> *sao_offset_val, int sao_eo_class,
> > + int width, int height);
>
> Maybe add a comment on top of that to indicate that _dst is
> 16-byte-aligned?
>
> Also, src and stride_src are so that the buffer is 32-byte-aligned,
> because of:
> stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
> dst = lc->edge_emu_buffer + stride_dst +
> FF_INPUT_BUFFER_PADDING_SIZE;
> in hevc_filter.c, but I'm not sure how much it is a benefit here, or
> often it is helping here. Don't hesitate to modify them if need be.
>
> > +%else ; ARCH_X86_32
> > +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride,
> srcstride, a_stride, b_stride, height
>
> As seen from above, srcstride is constant and is 2*MAX_PB_SIZE +
> FF_INPUT_BUFFER_PADDING_SIZE.
> That may save you one whole gpr. Not really useful here, but I think
> you are more limited for the>8 bits case.
> If you want to exploit this, also add it above void (*sao_edge_filter[5])
>
> No comment on the actual assembly, it looks fine.
>
> --
> Christophe
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
More information about the ffmpeg-devel
mailing list