[FFmpeg-devel] [PATCH] avcodec/cfhd: add x86 SIMD

James Almer jamrial at gmail.com
Thu Aug 20 17:04:47 EEST 2020


On 8/20/2020 7:53 AM, Moritz Barsnick wrote:
> On Sun, Aug 16, 2020 at 18:25:12 +0200, Paul B Mahol wrote:
>> On 8/16/20, Paul B Mahol <onemda at gmail.com> wrote:
>>> Please help porting this to linux and 64bit calling convention.
>>
>> New patch attached.
>>
>> This one does not allocate stack on x32.
> 
> I wanted to benchmark on several machines (newest I have is a Haswell,
> I also have an "Intel(R) Atom(TM) CPU D525   @ 1.80GHz" x86_64, and the
> below is a Pentium 4 x86), but got stuck on the ancient x86.
> 
> Firstly, superficial benchmark result on the Pentium 4:
> $ time ffmpeg -i bigger_res.mov -map 0:v -f null -
> 
> Without patchset: speed=0.0331x (plus/minus a bit)
> With    patchset: speed=0.0577x (plus/minus a bit)
> 
> I'll add benchmarks with my other systems, if desired.
> 
> Alas, with the patchset, the following command quickly terminates with
> Illegal instruction in ff_cfhd_horiz_filter_clip10_sse2 ():
> 
> $ ffmpeg -i MT_BeartoothHighway_1min_Cineform.avi -map 0:v -f null -
> 
> (and obviously doesn't crash with "-cpuflags 0", or without the
> patchset).
> 
> See assembler dump below.
> Compiler: icc (ICC) 14.0.3 20140422
> Assembler: nasm-2.13.02
> 
> Assembly dump from gdb:
> Dump of assembler code from 0x919572f to 0x919576f:
>    0x0919572f <ff_cfhd_horiz_filter_clip10_sse2+47>:    movl   $0xbf0f03ff,(%ecx,%eax,8)
>    0x09195736 <ff_cfhd_horiz_filter_clip10_sse2+54>:    xor    (%ecx),%al
>    0x09195738 <ff_cfhd_horiz_filter_clip10_sse2+56>:    not    %ecx
>    0x0919573a <ff_cfhd_horiz_filter_clip10_sse2+58>:    jmp    *0xf(%esi)
>    0x0919573d <ff_cfhd_horiz_filter_clip10_sse2+61>:    outsb  %ds:(%esi),(%dx)
>    0x0919573e <ff_cfhd_horiz_filter_clip10_sse2+62>:    (bad)
>    0x0919573f <ff_cfhd_horiz_filter_clip10_sse2+63>:    pmaxsw 0x99e6da0,%xmm0
>    0x09195747 <ff_cfhd_horiz_filter_clip10_sse2+71>:    pminsw 0x99e6db0,%xmm0
> => 0x0919574f <ff_cfhd_horiz_filter_clip10_sse2+79>:    pextrw $0x0,%xmm0,(%eax)

Yeah, pextrw with a memory address as the destination operand is SSE4.1.
The SSE2 form only accepts a GPR as the destination.
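
As a rough illustration of the SSE2-safe pattern (extract the word into a
GPR first, then do an ordinary 16-bit store), here is a minimal sketch in C
intrinsics. It is not taken from the patch: the helper name
store_low_word_clip10_sse2 and the [0, 1023] clip range are assumptions
made for the example, and the exact instructions emitted depend on the
compiler and its target flags. In hand-written asm the equivalent would be
a pextrw into a register followed by a mov of its low 16 bits to memory.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: clamp eight signed 16-bit
 * lanes to [0, 1023] and store only lane 0, using SSE2 operations. */
static void store_low_word_clip10_sse2(uint16_t *dst, __m128i v)
{
    const __m128i lo = _mm_setzero_si128();
    const __m128i hi = _mm_set1_epi16(1023);
    uint16_t w;

    assert(dst != NULL);

    v = _mm_max_epi16(v, lo);   /* pmaxsw */
    v = _mm_min_epi16(v, hi);   /* pminsw */

    /* The SSE2 pextrw form writes to a general-purpose register ... */
    w = (uint16_t)_mm_extract_epi16(v, 0);

    /* ... so the write to memory is a separate plain 16-bit store. */
    memcpy(dst, &w, sizeof(w));
}
```

Splitting the extract and the store this way avoids relying on the
memory-destination encoding, at the cost of one extra instruction and a
scratch register.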

