[FFmpeg-devel] [PATCH] x86/hevc_sao: make sao_band_filter work on x86_32
James Almer
jamrial at gmail.com
Sun Feb 8 18:48:08 CET 2015
On 08/02/15 8:21 AM, Christophe Gisquet wrote:
> Hi,
>
> 2015-02-07 23:06 GMT+01:00 James Almer <jamrial at gmail.com>:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> libavcodec/x86/hevc_sao.asm | 40 ++++++++++++++++++++++++++++++++++++----
>> libavcodec/x86/hevcdsp_init.c | 24 ++++++++++++------------
>> 2 files changed, 48 insertions(+), 16 deletions(-)
>
> Passes fate here for Win32/64, but that's probably not the platforms
> you interested in hearing feedback from.
> Looks OK to commit.
>
> A few questions though:
>
>> + %assign MMSIZE mmsize
>
> Why do that? Not a big deal: it's only for my education, if there's
> something I'm missing.
For width 48, the COMPUTE macro is last run after an INIT_XMM cpuname, so mmsize becomes
16 and in the avx2 version the instructions would access the wrong data in stack.
Doing %assign MMSIZE mmsize at the beginning of the function and using it here makes sure
it's always 32 in avx2.
sse2 is unaffected by this, of course.
And the reason I'm using INIT_XMM in the middle of the function for the avx2 width 48 case
is because i couldn't find a nice and clean way to use the xm* reg aliases with the COMPUTE
macros.
>
>> + pcmpeqw m4, %2, [rsp+MMSIZE*0]
>> + pcmpeqw m5, %2, [rsp+MMSIZE*1]
>> + pcmpeqw m6, %2, [rsp+MMSIZE*2]
>> + pcmpeqw %2, [rsp+MMSIZE*3]
>> + pand m4, [rsp+MMSIZE*4]
>> + pand m5, [rsp+MMSIZE*5]
>> + pand m6, [rsp+MMSIZE*6]
> [...]
>> -cglobal hevc_sao_band_filter_%1_8, 6, 6, 15, dst, src, dststride, srcstride, offset, left
>> +cglobal hevc_sao_band_filter_%1_8, 6, 6, 15, 8*mmsize*ARCH_X86_32, dst, src, dststride, srcstride, offset, left
>> HEVC_SAO_BAND_FILTER_INIT 8
>
> Why do you need room for 8 regs, and not 7?
Remnant from before i realized i could keep m7 untouched. I'll change it.
>
> Setting this to 7 and doing the required changes to remove MMSIZE pass
> here on Win32.
Again, MMSIZE is for avx2 and you're probably running the sse2 version, where it's always
the same as mmsize.
>
>> - SAO_BAND_INIT(10, sse2);
>> SAO_EDGE_INIT(10, sse2);
>> }
>> + SAO_BAND_INIT(10, sse2);
>
> On a side note, you haven't ported >8 bits edge filter to x86_32. I
> guess because it wouldn't play real time anyway?
I was going to give edge >8bit a try next.
More information about the ffmpeg-devel
mailing list