[FFmpeg-devel] [PATCH 6/6] x86/hevcdsp: add ff_hevc_sao_edge_filter_{10, 12}_{sse2, avx2}
Mickaël Raulet
mraulet at insa-rennes.fr
Wed Feb 4 21:05:00 CET 2015
LGTM.
Mickael
2015-02-04 13:51 GMT+01:00 Christophe Gisquet <christophe.gisquet at gmail.com>:
> Hi,
>
> 2015-02-04 4:55 GMT+01:00 James Almer <jamrial at gmail.com>:
>
> > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1) = { 0x0001000100010001ULL, 0x0001000100010001ULL };
> > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2) = { 0x0002000200020002ULL, 0x0002000200020002ULL };
> > +DECLARE_ALIGNED(32, const ymm_reg, ff_pw_1) = { 0x0001000100010001ULL, 0x0001000100010001ULL,
> > +    0x0001000100010001ULL, 0x0001000100010001ULL };
> > +DECLARE_ALIGNED(32, const ymm_reg, ff_pw_2) = { 0x0002000200020002ULL, 0x0002000200020002ULL,
> > +    0x0002000200020002ULL, 0x0002000200020002ULL };
> > DECLARE_ALIGNED(16, const xmm_reg, ff_pw_3) = { 0x0003000300030003ULL, 0x0003000300030003ULL };
> > DECLARE_ALIGNED(16, const xmm_reg, ff_pw_4) = { 0x0004000400040004ULL, 0x0004000400040004ULL };
> > DECLARE_ALIGNED(16, const xmm_reg, ff_pw_5) = { 0x0005000500050005ULL, 0x0005000500050005ULL };
> > @@ -48,7 +50,8 @@ DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1019) = { 0x03FB03FB03FB03FBULL, 0x03F
> > DECLARE_ALIGNED(16, const xmm_reg, ff_pw_1024) = { 0x0400040004000400ULL, 0x0400040004000400ULL };
> > DECLARE_ALIGNED(16, const xmm_reg, ff_pw_2048) = { 0x0800080008000800ULL, 0x0800080008000800ULL };
> > DECLARE_ALIGNED(16, const xmm_reg, ff_pw_8192) = { 0x2000200020002000ULL, 0x2000200020002000ULL };
> > -DECLARE_ALIGNED(16, const xmm_reg, ff_pw_m1) = { 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL };
> > +DECLARE_ALIGNED(32, const ymm_reg, ff_pw_m1) = { 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL,
> > +    0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL };
>
> Nice of you to do this. There is more sharing to do, but I have
> patches waiting on your patchset and the avx2 patch that clean up
> even more.
>
> > +;void ff_hevc_sao_edge_filter_<width>_<depth>_<opt>(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst, ptrdiff_t stride_src,
> > +;    int16_t *sao_offset_val, int eo, int width, int height);
> > +%macro HEVC_SAO_EDGE_FILTER_16 3
> > +%if WIN64
> > +cglobal hevc_sao_edge_filter_%2_%1, 4, 8, 16, dst, src, dststride, srcstride, eo, a_stride, b_stride, height
>
> OK, never mind my comment on patch 5/6: 16 xmm regs are too many for
> x86_32. Otherwise, playing with the stack would be required, but that
> would be another patch, if ever.
>
> Otherwise, nothing striking in that code, looks good.
>
> Thanks,
> --
> Christophe