[FFmpeg-devel] [PATCH] avcodec/cfhd: add x86 SIMD

Moritz Barsnick barsnick at gmx.net
Thu Aug 20 13:53:05 EEST 2020


On Sun, Aug 16, 2020 at 18:25:12 +0200, Paul B Mahol wrote:
> On 8/16/20, Paul B Mahol <onemda at gmail.com> wrote:
> > Please help porting this to linux and 64bit calling convention.
>
> New patch attached.
>
> This one does not allocate stack on x32.

I wanted to benchmark on several machines (newest I have is a Haswell,
I also have an "Intel(R) Atom(TM) CPU D525   @ 1.80GHz" x86_64, and the
below is a Pentium 4 x86), but got stuck on the ancient x86.

Firstly, superficial benchmark result on the Pentium 4:
$ time ffmpeg -i bigger_res.mov -map 0:v -f null -

Without patchset: speed=0.0331x (plus/minus a bit)
With    patchset: speed=0.0577x (plus/minus a bit), i.e. roughly 1.7x faster

I'll add benchmarks with my other systems, if desired.

Alas, with the patchset, the following command quickly terminates with
Illegal instruction in ff_cfhd_horiz_filter_clip10_sse2 ():

$ ffmpeg -i MT_BeartoothHighway_1min_Cineform.avi -map 0:v -f null -

(and obviously it does not crash with "-cpuflags 0", or without the
patchset).

See assembler dump below.
Compiler: icc (ICC) 14.0.3 20140422
Assembler: nasm-2.13.02

Assembly dump from gdb:
Dump of assembler code from 0x919572f to 0x919576f:
   0x0919572f <ff_cfhd_horiz_filter_clip10_sse2+47>:    movl   $0xbf0f03ff,(%ecx,%eax,8)
   0x09195736 <ff_cfhd_horiz_filter_clip10_sse2+54>:    xor    (%ecx),%al
   0x09195738 <ff_cfhd_horiz_filter_clip10_sse2+56>:    not    %ecx
   0x0919573a <ff_cfhd_horiz_filter_clip10_sse2+58>:    jmp    *0xf(%esi)
   0x0919573d <ff_cfhd_horiz_filter_clip10_sse2+61>:    outsb  %ds:(%esi),(%dx)
   0x0919573e <ff_cfhd_horiz_filter_clip10_sse2+62>:    (bad)
   0x0919573f <ff_cfhd_horiz_filter_clip10_sse2+63>:    pmaxsw 0x99e6da0,%xmm0
   0x09195747 <ff_cfhd_horiz_filter_clip10_sse2+71>:    pminsw 0x99e6db0,%xmm0
=> 0x0919574f <ff_cfhd_horiz_filter_clip10_sse2+79>:    pextrw $0x0,%xmm0,(%eax)
   0x09195755 <ff_cfhd_horiz_filter_clip10_sse2+85>:    movswl (%ecx),%esi
   0x09195758 <ff_cfhd_horiz_filter_clip10_sse2+88>:    imul   $0x5,%esi,%esi
   0x0919575b <ff_cfhd_horiz_filter_clip10_sse2+91>:    movswl 0x2(%ecx),%edi
   0x0919575f <ff_cfhd_horiz_filter_clip10_sse2+95>:    imul   $0x4,%edi,%edi
   0x09195762 <ff_cfhd_horiz_filter_clip10_sse2+98>:    add    %esi,%edi
   0x09195764 <ff_cfhd_horiz_filter_clip10_sse2+100>:   movswl 0x4(%ecx),%esi
   0x09195768 <ff_cfhd_horiz_filter_clip10_sse2+104>:   sub    %esi,%edi
   0x0919576a <ff_cfhd_horiz_filter_clip10_sse2+106>:   add    $0x4,%edi
   0x0919576d <ff_cfhd_horiz_filter_clip10_sse2+109>:   sar    $0x3,%edi
End of assembler dump.
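
For readers following the dump: the scalar instructions after the faulting
pextrw (movswl / imul / add / sub / sar) appear to compute one edge sample
as (5*a + 4*b - c + 4) >> 3 from three consecutive 16-bit inputs. Below is
a minimal C sketch of that arithmetic as read from the disassembly, not
taken from the patch itself. The names edge_tap and clip10 are
hypothetical, and the clamp bounds 0 and 1023 are an assumption based on
the "clip10" function name; the actual pmaxsw/pminsw constants are not
visible in the dump.

```c
#include <stdint.h>

/* Clamp v to an assumed 10-bit range [0, 1023]; this is a guess at what
 * the pmaxsw/pminsw pair in the dump does, based on the function name. */
int clip10(int v)
{
    if (v < 0)
        return 0;
    if (v > 1023)
        return 1023;
    return v;
}

/* Arithmetic read from the scalar instructions in the dump:
 *   esi = 5 * in[0]; edi = 4 * in[1] + esi; edi -= in[2];
 *   edi += 4; edi >>= 3 (sar, i.e. arithmetic shift).
 * Note: >> on a negative int is implementation-defined in C; common
 * compilers use an arithmetic shift, matching sar. */
int edge_tap(const int16_t *in)
{
    return (5 * in[0] + 4 * in[1] - in[2] + 4) >> 3;
}
```

For a flat input such as {8, 8, 8} the tap reproduces the input value,
since the weights 5 + 4 - 1 sum to 8 before the shift by 3.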

CPU info:
barsnick at sunshine:~ > hwinfo --cpu
01: None 00.0: 10103 CPU
  [Created at cpu.457]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: Intel
  Vendor: "GenuineIntel"
  Model: 15.2.9 "Intel(R) Pentium(R) 4 CPU 2.80GHz"
  Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr
  Clock: 2800 MHz
  BogoMips: 5597.27
  Cache: 512 kb
  Units/Processor: 2
  Config Status: cfg=new, avail=yes, need=no, active=unknown

02: None 01.0: 10103 CPU
  [Created at cpu.457]
  Unique ID: wkFv.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: Intel
  Vendor: "GenuineIntel"
  Model: 15.2.9 "Intel(R) Pentium(R) 4 CPU 2.80GHz"
  Features: fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,pebs,bts,cid,xtpr
  Clock: 2800 MHz
  BogoMips: 27198.67
  Cache: 512 kb
  Units/Processor: 2
  Config Status: cfg=new, avail=yes, need=no, active=unknown
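
A note on the faulting instruction, offered as a likely explanation rather
than a confirmed diagnosis: as far as I know, pextrw with a memory
destination is an SSE4.1 encoding, while the SSE2 form of pextrw can only
write to a general-purpose register. The Features line above lists sse and
sse2 but nothing newer. The sketch below checks a comma-separated feature
list for an exact token; has_feature is a hypothetical helper, and the
token spelling "sse4_1" is assumed to follow the Linux /proc/cpuinfo
convention.

```c
#include <string.h>

/* Return 1 if `name` occurs as a whole comma-separated token in `list`,
 * 0 otherwise. Whole-token matching keeps "sse" from matching "sse2". */
int has_feature(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == n && memcmp(p, name, n) == 0)
            return 1;
        if (!end)
            break;
        p = end + 1;   /* skip past the comma to the next token */
    }
    return 0;
}
```

Run against the Features string reported for this Pentium 4, such a check
finds "sse2" but not "sse4_1", which would be consistent with the Illegal
instruction at the pextrw store.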

Cheers,
Moritz

