[FFmpeg-devel] [PATCH] avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input
Paul B Mahol
onemda at gmail.com
Thu Oct 1 21:32:32 CEST 2015
On 10/1/15, Henrik Gramner <henrik at gramner.com> wrote:
> On Thu, Oct 1, 2015 at 8:42 PM, Paul B Mahol <onemda at gmail.com> wrote:
>> diff --git a/libavfilter/vf_maskedmerge.c b/libavfilter/vf_maskedmerge.c
>> if (desc->comp.depth == 8)
>> s->maskedmerge = maskedmerge8;
>> else
>> s->maskedmerge = maskedmerge16;
>> + if (ARCH_X86)
>> + ff_maskedmerge_init_x86(s);
> Create a new function ff_maskedmerge_init() and move the above code
> there; that will make it easier to add a unit test.
Maybe when I or someone else adds a test. Right now I'm still at the learning-asm stage.
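For illustration, the refactor Henrik suggests could look roughly like the C sketch below. It is hypothetical: the real MaskedMergeContext and kernels in vf_maskedmerge.c carry more fields and parameters (line sizes, height), the simplified `maskedmerge_fn` type and `depth` field are made up here, and the 8-bit formula `((256 - m) * b + m * o + 128) >> 8` is an assumption inferred from the pw_256/pw_128 constants in the quoted asm. The x86 hook from the patch is left as a comment so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified kernel type; the real one also takes line
 * sizes and a height. */
typedef void (*maskedmerge_fn)(const uint8_t *bsrc, const uint8_t *osrc,
                               const uint8_t *msrc, uint8_t *dst,
                               int w, int depth);

/* Hypothetical, trimmed-down context. */
typedef struct MaskedMergeContext {
    int depth;
    maskedmerge_fn maskedmerge;
} MaskedMergeContext;

/* 8-bit kernel: weight the overlay by m / 256 (assumed formula). */
static void maskedmerge8(const uint8_t *bsrc, const uint8_t *osrc,
                         const uint8_t *msrc, uint8_t *dst,
                         int w, int depth)
{
    (void)depth; /* always 8 here */
    for (int x = 0; x < w; x++)
        dst[x] = ((256 - msrc[x]) * bsrc[x] + msrc[x] * osrc[x] + 128) >> 8;
}

/* >8-bit kernel: same blend on uint16_t samples, weights out of 2^depth.
 * uint32_t math avoids signed overflow at depth 16. */
static void maskedmerge16(const uint8_t *bsrc8, const uint8_t *osrc8,
                          const uint8_t *msrc8, uint8_t *dst8,
                          int w, int depth)
{
    const uint16_t *bsrc = (const uint16_t *)bsrc8;
    const uint16_t *osrc = (const uint16_t *)osrc8;
    const uint16_t *msrc = (const uint16_t *)msrc8;
    uint16_t *dst = (uint16_t *)dst8;
    const uint32_t full = 1u << depth, half = full >> 1;

    for (int x = 0; x < w; x++)
        dst[x] = ((full - msrc[x]) * bsrc[x] +
                  (uint32_t)msrc[x] * osrc[x] + half) >> depth;
}

/* Hypothetical signature for the suggested shared init: pick the C
 * kernel, then give arch-specific code a chance to override it. */
static void ff_maskedmerge_init(MaskedMergeContext *s, int depth)
{
    assert(depth >= 8 && depth <= 16);
    s->depth = depth;
    if (depth == 8)
        s->maskedmerge = maskedmerge8;
    else
        s->maskedmerge = maskedmerge16;
    /* In FFmpeg the x86 hook from the patch would follow here:
     *     if (ARCH_X86)
     *         ff_maskedmerge_init_x86(s);
     * It is omitted so this sketch builds on its own. */
}
```

Because the dispatch lives in one function, a test can call it directly and compare whichever kernel it selected against the C version, without setting up a filter graph.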
>> diff --git a/libavfilter/x86/vf_maskedmerge.asm
>> + mova m5, [pw_128]
>> + mova m2, [pw_256]
>> + pxor m6, m6
> Nit: Reorganize your registers so you get those constants in m4, m5,
> m6. It will make the code easier to follow IMO.
>> + mov r10q, 0
> Xor a register with itself instead of using mov to zero a register.
> There's also no need to use the q suffix for plain register names;
> r10 is enough.
>> + movh m0, [bsrcq + x]
>> + movh m1, [osrcq + x]
>> + movh m3, [msrcq + x]
>> + punpcklbw m0, m6
>> + punpcklbw m1, m6
>> + punpcklbw m3, m6
> You could also make an SSE4 version that uses pmovzxbw.
>> + paddw m1, m5
>> + psrlw m1, 8
> I believe you could also make an SSSE3 version that uses pmulhrsw
> instead of add + shift.
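One way to read the pmulhrsw suggestion, checked with a scalar model: pmulhrsw treats both operands as signed 16-bit, and the unsigned sum `(256 - m) * b + m * o` can reach 65280, so the blend presumably has to be rewritten first as `b + (((o - b) * m + 128) >> 8)`, where the difference fits in an int16. Pre-scaling the mask to `m << 7` then makes one pmulhrsw lane produce exactly `((o - b) * m + 128) >> 8`. The helper names are hypothetical, the lane model follows Intel's description of PMULHRSW, and the add-and-shift formula is again assumed from the quoted SSE2 code.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^s, i.e. an arithmetic right shift. Spelled out
 * because >> on a negative int is implementation-defined in C. */
static int32_t floor_shift(int32_t v, int s)
{
    return v >= 0 ? v >> s : -((-v + (1 << s) - 1) >> s);
}

/* Scalar model of one PMULHRSW lane: ((a * b >> 14) + 1) >> 1, i.e.
 * a * b / 2^15 rounded to nearest. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)floor_shift(floor_shift((int32_t)a * b, 14) + 1, 1);
}

/* Add-and-shift form, as in the quoted SSE2 loop (assumed formula). */
static uint8_t blend_add_shift(uint8_t b, uint8_t o, uint8_t m)
{
    return (uint8_t)(((256 - m) * b + m * o + 128) >> 8);
}

/* pmulhrsw form: b + round((o - b) * m / 256). With the mask scaled to
 * m << 7 (at most 32640, still a valid int16), one lane yields
 * ((o - b) * m * 128 + 2^14) >> 15 == ((o - b) * m + 128) >> 8. */
static uint8_t blend_pmulhrsw(uint8_t b, uint8_t o, uint8_t m)
{
    int16_t diff = (int16_t)(o - b);          /* -255..255, signed */
    int16_t scaled_mask = (int16_t)(m << 7);  /* 0..32640 */
    return (uint8_t)(b + pmulhrsw_lane(diff, scaled_mask));
}
```

The identity holds for every (b, o, m) triple, which is cheap to confirm exhaustively (2^24 cases) before committing to the SSSE3 version.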
>> + add r10q, mmsize / 2
>> + cmp r10q, wq
>> + jl .loop
> There's a trick you could do here that might be faster:
> 1) Add w to bsrc, osrc, msrc and dst, and then negate w at the
> beginning of the function.
> 2) Initialize r10 to w (now negative) instead of 0 at the beginning
> of each .nextrow.
> 3) You can now drop the cmp; the add alone sets the right flags for
> the branch.
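The three steps above can be sketched in C to show the indexing scheme: every pointer moved to the end of the row, and an index that starts at -w and counts up to zero. A C compiler is not obliged to emit the add/jl pair without a cmp, so this only illustrates the structure. The helper names and the blend formula are assumptions, and the SIMD loop advances by mmsize / 2 pixels per iteration rather than one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-pixel blend (assumed formula from the quoted SSE2 code). */
static uint8_t blend(uint8_t b, uint8_t o, uint8_t m)
{
    return (uint8_t)(((256 - m) * b + m * o + 128) >> 8);
}

/* Original structure: x counts up from 0 and is compared against w. */
static void row_count_up(const uint8_t *bsrc, const uint8_t *osrc,
                         const uint8_t *msrc, uint8_t *dst, ptrdiff_t w)
{
    for (ptrdiff_t x = 0; x < w; x++)
        dst[x] = blend(bsrc[x], osrc[x], msrc[x]);
}

/* Suggested structure: move every pointer to the end of the row and run
 * a negative index up to 0. In asm the add that advances the index also
 * sets the flags, so the loop branch needs no separate cmp. */
static void row_count_to_zero(const uint8_t *bsrc, const uint8_t *osrc,
                              const uint8_t *msrc, uint8_t *dst, ptrdiff_t w)
{
    bsrc += w;                              /* step 1: add w to each pointer */
    osrc += w;
    msrc += w;
    dst  += w;
    for (ptrdiff_t x = -w; x < 0; x++)      /* step 2: index starts at -w */
        dst[x] = blend(bsrc[x], osrc[x], msrc[x]); /* step 3: stop at 0 */
}
```

Both functions touch the same pixels in the same order; only the loop bookkeeping differs.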
> I also encourage you to write a checkasm unit test; that will make it
> easier both to benchmark your code and to verify its correctness.
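checkasm has its own macros for declaring, calling and benchmarking functions, and none of them are used here. The plain-C sketch below only illustrates the underlying idea: run the C reference and a candidate on identical random input, over every width up to some limit (so tails that are not a multiple of the SIMD step get exercised), and compare the valid part of the output. `count_mismatches` and `no_rounding_row` are hypothetical; the latter is a deliberately wrong candidate that drops the +128 rounding term, so the harness has something to catch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void (*row_fn)(const uint8_t *b, const uint8_t *o,
                       const uint8_t *m, uint8_t *d, int w);

/* C reference for one row (assumed blend formula). */
static void ref_row(const uint8_t *b, const uint8_t *o,
                    const uint8_t *m, uint8_t *d, int w)
{
    for (int x = 0; x < w; x++)
        d[x] = ((256 - m[x]) * b[x] + m[x] * o[x] + 128) >> 8;
}

/* Deliberately wrong candidate: drops the +128 rounding term. */
static void no_rounding_row(const uint8_t *b, const uint8_t *o,
                            const uint8_t *m, uint8_t *d, int w)
{
    for (int x = 0; x < w; x++)
        d[x] = ((256 - m[x]) * b[x] + m[x] * o[x]) >> 8;
}

/* Runs ref and cand on identical random rows for every width 1..max_w
 * and returns how many widths produced different output. */
static int count_mismatches(row_fn ref, row_fn cand, int max_w, unsigned seed)
{
    enum { MAX_W = 256 };
    uint8_t b[MAX_W], o[MAX_W], m[MAX_W], d_ref[MAX_W], d_new[MAX_W];
    int bad = 0;

    assert(max_w >= 1 && max_w <= MAX_W);
    srand(seed);
    for (int w = 1; w <= max_w; w++) {
        for (int i = 0; i < w; i++) {
            b[i] = (uint8_t)(rand() & 0xFF);
            o[i] = (uint8_t)(rand() & 0xFF);
            m[i] = (uint8_t)(rand() & 0xFF);
        }
        /* Same sentinel in both outputs, so a pixel the candidate skips
         * usually shows up as a difference. */
        memset(d_ref, 0xAA, sizeof(d_ref));
        memset(d_new, 0xAA, sizeof(d_new));
        ref(b, o, m, d_ref, w);
        cand(b, o, m, d_new, w);
        /* Compare only the w valid pixels: SIMD code may legitimately
         * write into row padding past w. */
        if (memcmp(d_ref, d_new, (size_t)w) != 0)
            bad++;
    }
    return bad;
}
```

Timing the two calls in the same loop would give the benchmarking half of the suggestion; checkasm does both for you.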