[FFmpeg-devel] [PATCH] avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input

Paul B Mahol onemda at gmail.com
Thu Oct 1 21:32:32 CEST 2015


On 10/1/15, Henrik Gramner <henrik at gramner.com> wrote:
> On Thu, Oct 1, 2015 at 8:42 PM, Paul B Mahol <onemda at gmail.com> wrote:
>> diff --git a/libavfilter/vf_maskedmerge.c b/libavfilter/vf_maskedmerge.c
>
>>      if (desc->comp[0].depth == 8)
>>          s->maskedmerge = maskedmerge8;
>>      else
>>          s->maskedmerge = maskedmerge16;
>>
>> +    if (ARCH_X86)
>> +        ff_maskedmerge_init_x86(s);
>> +
>
> Create a new function ff_maskedmerge_init() and move the above code
> there, that will make it easier to add a unit test.

Maybe once I or someone else adds a test; for now I'm still at the stage of learning asm.
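
For reference, the suggested refactor could look roughly like the sketch below. This is only an illustration under stated assumptions: DemoMergeContext, the demo_* functions and DEMO_ARCH_X86 are hypothetical stand-ins, not the real MaskedMergeContext, kernels, or ARCH_X86 from the patch, and the real function would presumably be called ff_maskedmerge_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the filter's private context. */
typedef struct DemoMergeContext {
    void (*maskedmerge)(const uint8_t *bsrc, const uint8_t *osrc,
                        const uint8_t *msrc, uint8_t *dst, int w);
} DemoMergeContext;

/* Stub kernels; the real C versions live in vf_maskedmerge.c. */
static void demo_merge8(const uint8_t *bsrc, const uint8_t *osrc,
                        const uint8_t *msrc, uint8_t *dst, int w)
{
    (void)bsrc; (void)osrc; (void)msrc; (void)dst; (void)w;
}

static void demo_merge16(const uint8_t *bsrc, const uint8_t *osrc,
                         const uint8_t *msrc, uint8_t *dst, int w)
{
    (void)bsrc; (void)osrc; (void)msrc; (void)dst; (void)w;
}

/* Stand-in for the build-time ARCH_X86 constant. */
#ifndef DEMO_ARCH_X86
#define DEMO_ARCH_X86 0
#endif

/* Stand-in for the x86 hook, which would override the pointer with SIMD. */
static void demo_init_x86(DemoMergeContext *s)
{
    (void)s;
}

/* All function-pointer selection in one place, so that both the filter
 * and a unit test can call it without setting up a whole filter graph. */
static void demo_maskedmerge_init(DemoMergeContext *s, int depth)
{
    assert(s != NULL);
    s->maskedmerge = depth == 8 ? demo_merge8 : demo_merge16;
    if (DEMO_ARCH_X86)
        demo_init_x86(s);
}
```

The filter's config code would then reduce to a single call passing desc->comp[0].depth, and a test could call the same entry point directly.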

>
>> diff --git a/libavfilter/x86/vf_maskedmerge.asm
>> b/libavfilter/x86/vf_maskedmerge.asm
>
>> +    mova m5, [pw_128]
>> +    mova m2, [pw_256]
>> +    pxor m6, m6
>
> Nit: Reorganize your registers so you get those constants in m4, m5,
> m6. It will make the code easier to follow IMO.

Changed locally.

>
>> +    mov r10q, 0
>
> Xor a register with itself instead of using mov to zero a register.
> There's also no need to use the q suffix for plain register names, r10
> is enough.

Changed locally.

>
>> +        movh m0, [bsrcq + x]
>> +        movh m1, [osrcq + x]
>> +        movh m3, [msrcq + x]
> [...]
>> +        punpcklbw m0, m6
>> +        punpcklbw m1, m6
>> +        punpcklbw m3, m6
>
> You could also make an SSE4 version that uses pmovzxbw.
>
>> +        paddw m1, m5
>> +        psrlw m1, 8
>
> I believe you could also make an SSSE3 version that uses pmulhrsw
> instead of add + shift.
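
A scalar model may help when evaluating this suggestion. The sketch below assumes the documented per-lane behaviour of pmulhrsw (signed 16x16 multiply, then a rounding shift right by 15) and compares it, with a multiplier of 128, against the add + shift pair quoted above. The demo_* names are hypothetical. The two agree while the word fits in a non-negative int16; for words of 32768 and above pmulhrsw sees a negative number, so the inputs would need to stay in range (or be rearranged) for the substitution to hold.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhrsw: ((a * b >> 14) + 1) >> 1.
 * Assumes >> on a negative int32_t is an arithmetic shift, which is
 * implementation-defined in C but true on common compilers. */
static int16_t demo_pmulhrsw(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return (int16_t)(((p >> 14) + 1) >> 1);
}

/* The paddw 128 + psrlw 8 pair from the patch, on an unsigned word. */
static uint16_t demo_add_shift(uint16_t v)
{
    return (uint16_t)((v + 128) >> 8);
}

/* Count the words in [lo, hi] for which the two roundings differ. */
static int demo_count_mismatches(int lo, int hi)
{
    int n = 0;

    assert(0 <= lo && lo <= hi && hi <= 65535);
    for (int v = lo; v <= hi; v++) {
        /* Reinterpret the unsigned word as the signed lane pmulhrsw sees. */
        int16_t sv = (int16_t)(v >= 32768 ? v - 65536 : v);
        uint16_t got = (uint16_t)demo_pmulhrsw(sv, 128);

        if (got != demo_add_shift((uint16_t)v))
            n++;
    }
    return n;
}
```

Calling demo_count_mismatches(0, 32767) reports no differences, while the upper half of the range does not match.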
>
>> +        add r10q, mmsize / 2
>> +        cmp r10q, wq
>> +    jl .loop
>
> There's a trick you could do here that might be faster:
> 1) Add w to bsrc, osrc, msrc and dst and then negate w in the
> beginning of the function.
> 2) Initialize r10 to w instead of 0 at the beginning of each .nextrow
> iteration
> 3) You can now drop the cmp, the add will be enough to set the right
> flags for the branch

Will experiment.
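
As a starting point for that experiment, the three steps above can be sketched in plain C. The per-pixel formula in demo_blend is an assumption inferred from the 256 and 128 constants in the patch, and all demo_* names are hypothetical. The point is only the loop shape: the pointers are advanced to the end of the row and the index counts up from -w to 0, so the loop condition is a sign test on the index itself rather than a separate compare against w.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 8-bit blend: weight base by (256 - mask), overlay by mask,
 * then round with +128 and shift right by 8. */
static uint8_t demo_blend(uint8_t b, uint8_t o, uint8_t m)
{
    return (uint8_t)(((256 - m) * b + m * o + 128) >> 8);
}

/* Conventional loop: index runs 0..w-1 and is compared against w. */
static void demo_row_forward(const uint8_t *bsrc, const uint8_t *osrc,
                             const uint8_t *msrc, uint8_t *dst, ptrdiff_t w)
{
    for (ptrdiff_t x = 0; x < w; x++)
        dst[x] = demo_blend(bsrc[x], osrc[x], msrc[x]);
}

/* Negative-index loop: pointers point one past the row, x runs -w..-1. */
static void demo_row_negidx(const uint8_t *bsrc, const uint8_t *osrc,
                            const uint8_t *msrc, uint8_t *dst, ptrdiff_t w)
{
    assert(w >= 0);
    bsrc += w;
    osrc += w;
    msrc += w;
    dst  += w;
    for (ptrdiff_t x = -w; x < 0; x++)
        dst[x] = demo_blend(bsrc[x], osrc[x], msrc[x]);
}
```

In the asm version the index would advance by mmsize / 2 per iteration, so the row width (or its padding) has to make the index land on or past zero without reading beyond the buffers.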

>
> I also encourage you to write a checkasm unit test, that will make it
> easier to both benchmark and verify the correctness of your code.

Maybe later.


