[FFmpeg-devel] [PATCH] swscale_unscaled: fix and speed up DITHER_COPY macro for x86 with SSE2

James Almer jamrial at gmail.com
Fri Sep 22 20:28:35 EEST 2017


On 9/22/2017 2:06 PM, Mateusz wrote:
> W dniu 2017-09-22 o 17:47, James Almer pisze:
>> On 9/22/2017 12:23 PM, Mateusz wrote:
>>> New version of the patch -- now it uses the same logic independent of the target bitdepth.
>>>
>>> For x86_64 it is much faster than the current code (with perfect quality). For x86_32 it is fast
>>> if you add this to configure: --extra-cflags="-msse2"
>>> (for x86_32 with the default configure options it is slower than the current code, but with better quality).
>>>
>>> Please review/test.
>>>
>>> Mateusz
>>
>> We don't accept intrinsics, or new arch specific code outside of arch
>> specific folders.
>>
>> Either write this in NASM syntax, or if it *really* needs to be inlined,
>> use __asm__() inline blocks. But whichever you use, it needs to go in
>> the x86/ folder.
> 
> Thank you for the information! I'm starting to learn NASM syntax (it could take months).

https://blogs.gnome.org/rbultje/2017/07/14/writing-x86-simd-using-x86inc-asm/

Give that a read. It's a tutorial for handwritten ASM written in NASM
syntax using the x86inc.asm helper we use in our codebase. It simplifies
the work considerably.
Of course, you can also take a look at existing asm functions in the
project.

