[FFmpeg-devel] [PATCH v2 1/3] avfilter/x86/vf_exposure: add x86 SIMD optimization

Sat Nov 20 18:47:30 EET 2021

On 11/20/2021 1:46 PM, James Almer wrote:
> On 11/4/2021 1:18 AM, Wu Jianhua wrote:
>> diff --git a/libavfilter/x86/vf_exposure.asm b/libavfilter/x86/vf_exposure.asm
>> new file mode 100644
>> index 0000000000..3351c6fb3b
>> --- /dev/null
>> +++ b/libavfilter/x86/vf_exposure.asm
>> @@ -0,0 +1,55 @@
>> +;*****************************************************************************
>> +;* x86-optimized functions for exposure filter
>> +;*
>> +;* This file is part of FFmpeg.
>> +;*
>> +;* FFmpeg is free software; you can redistribute it and/or
>> +;* modify it under the terms of the GNU Lesser General Public
>> +;* License as published by the Free Software Foundation; either
>> +;* version 2.1 of the License, or (at your option) any later version.
>> +;*
>> +;* FFmpeg is distributed in the hope that it will be useful,
>> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +;* Lesser General Public License for more details.
>> +;*
>> +;* You should have received a copy of the GNU Lesser General Public
>> +;* License along with FFmpeg; if not, write to the Free Software
>> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> +;******************************************************************************
>> +
>> +%include "libavutil/x86/x86util.asm"
>> +
>> +SECTION .text
>> +
>> +;*******************************************************************************
>> +; void ff_exposure(float *ptr, int length, float black, float scale);
>> +;*******************************************************************************
>> +%macro EXPOSURE 0
>> +cglobal exposure, 2, 2, 4, ptr, length, black, scale
>> +    movsxdifnidn lengthq, lengthd
>> +%if WIN64
>> +    VBROADCASTSS m0, xmm2
>> +    VBROADCASTSS m1, xmm3
>> +%else
>> +    VBROADCASTSS m0, xmm0
>> +    VBROADCASTSS m1, xmm1
>> +%endif
>> +
>> +.loop:
>> +    movu        m2, [ptrq]
>> +    subps       m2, m2, m0
>> +    mulps       m2, m2, m1
>> +    movu    [ptrq], m2
>> +    add       ptrq, mmsize
>> +    sub    lengthq, mmsize/4
>> +
>> +    jg .loop
>> +
>> +    RET
>> +%endmacro
>> +
>> +%if ARCH_X86_64
> 
> Why x86_64 only?
> 
>> +INIT_XMM sse
>> +EXPOSURE
> 
> Is it not possible to add an AVX version to process eight floats per 
> loop? The function is already written in a way that you would only need 
> to do
> 
> %if HAVE_AVX_EXTERNAL
> INIT_YMM avx
> EXPOSURE
> %endif
> 
> For it. And ptr alignment is not a problem seeing you're using unaligned 
> movs.

Ignore this part. I need to remember to check entire patchsets before 
starting to send replies...
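
For readers following the thread, here is a rough scalar C sketch of what the quoted loop appears to compute per element: subtract black, then multiply by scale (the subps followed by mulps). The name exposure_ref is hypothetical and is not part of the patch. The quoted SIMD loop handles mmsize/4 floats per iteration (four with INIT_XMM), while this sketch walks one element at a time and stops exactly at length.

```c
/* Hypothetical scalar reference; illustrative only, not taken from the patch. */
static void exposure_ref(float *ptr, int length, float black, float scale)
{
    /* Same arithmetic as the quoted loop body: (x - black) * scale, in place. */
    for (int i = 0; i < length; i++)
        ptr[i] = (ptr[i] - black) * scale;
}
```

A reference like this could be used to compare results against the SIMD path on buffers whose length is a multiple of the vector width.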