[FFmpeg-devel] [PATCH v2 1/3] avfilter/x86/vf_exposure: add x86 SIMD optimization
James Almer
jamrial at gmail.com
Sat Nov 20 18:47:30 EET 2021
On 11/20/2021 1:46 PM, James Almer wrote:
> On 11/4/2021 1:18 AM, Wu Jianhua wrote:
>> diff --git a/libavfilter/x86/vf_exposure.asm
>> b/libavfilter/x86/vf_exposure.asm
>> new file mode 100644
>> index 0000000000..3351c6fb3b
>> --- /dev/null
>> +++ b/libavfilter/x86/vf_exposure.asm
>> @@ -0,0 +1,55 @@
>> +;*****************************************************************************
>> +;* x86-optimized functions for exposure filter
>> +;*
>> +;* This file is part of FFmpeg.
>> +;*
>> +;* FFmpeg is free software; you can redistribute it and/or
>> +;* modify it under the terms of the GNU Lesser General Public
>> +;* License as published by the Free Software Foundation; either
>> +;* version 2.1 of the License, or (at your option) any later version.
>> +;*
>> +;* FFmpeg is distributed in the hope that it will be useful,
>> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +;* Lesser General Public License for more details.
>> +;*
>> +;* You should have received a copy of the GNU Lesser General Public
>> +;* License along with FFmpeg; if not, write to the Free Software
>> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> +;******************************************************************************
>> +
>> +%include "libavutil/x86/x86util.asm"
>> +
>> +SECTION .text
>> +
>> +;*******************************************************************************
>> +; void ff_exposure(float *ptr, int length, float black, float scale);
>> +;*******************************************************************************
>> +%macro EXPOSURE 0
>> +cglobal exposure, 2, 2, 4, ptr, length, black, scale
>> + movsxdifnidn lengthq, lengthd
>> +%if WIN64
>> + VBROADCASTSS m0, xmm2
>> + VBROADCASTSS m1, xmm3
>> +%else
>> + VBROADCASTSS m0, xmm0
>> + VBROADCASTSS m1, xmm1
>> +%endif
>> +
>> +.loop:
>> + movu m2, [ptrq]
>> + subps m2, m2, m0
>> + mulps m2, m2, m1
>> + movu [ptrq], m2
>> + add ptrq, mmsize
>> + sub lengthq, mmsize/4
>> +
>> + jg .loop
>> +
>> + RET
>> +%endmacro
>> +
>> +%if ARCH_X86_64
>
> Why x86_64 only?
>
>> +INIT_XMM sse
>> +EXPOSURE
>
> Is it not possible to add an AVX version to process eight floats per
> loop? The function is already written in a way that you would only need
> to do
>
> %if HAVE_AVX_EXTERNAL
> INIT_YMM avx
> EXPOSURE
> %endif
>
> For it. And ptr alignment is not a problem seeing you're using unaligned
> movs.
Ignore this part. I need to remember to check entire patchsets before
starting to send replies...
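
For reference, the per-sample operation in the quoted loop (subps followed by mulps) can be written as a scalar C sketch. The name exposure_ref is hypothetical and does not come from the patch; the argument order follows the prototype in the quoted comment. The quoted loop handles mmsize/4 floats per iteration (four with XMM, eight with YMM) and always works in whole vectors, while this sketch stops exactly at length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not part of the patch: applies the same
 * per-sample operation as the SIMD loop body, one float at a time. */
static void exposure_ref(float *ptr, int length, float black, float scale)
{
    assert(ptr != NULL || length <= 0);

    for (int i = 0; i < length; i++) {
        /* subps m2, m2, m0 then mulps m2, m2, m1 */
        ptr[i] = (ptr[i] - black) * scale;
    }
}
```

A sketch like this could serve as the comparison target when checking a SIMD version against expected output on a buffer whose length is a multiple of the vector width.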