[FFmpeg-devel] Mixed data type in SIMD code?

Zuxy Meng zuxy.meng
Wed Mar 5 06:08:37 CET 2008


Hi,

2008/3/5, Loren Merritt <lorenm at u.washington.edu>:
> On Tue, 4 Mar 2008, Michael Niedermayer wrote:
> > On Mon, Mar 03, 2008 at 04:30:08PM -0700, Loren Merritt wrote:
> >> On Mon, 3 Mar 2008, Michael Niedermayer wrote:
> >>>
> >>> Also I doubt we use or ever will use packed double.
> >>
> >> flac encoder does. Single isn't precise enough for a linear sum of up
> >> to 16k elements. Reordering the sum to a tree gave it half-way
> >> decent precision, but also made it as slow as double.
> >
> > What about something like:
> >
> > for(i=0; i<16000;){
> >    float sum=0;
> >    do{
> >        sum+= whatever[i++];
> >    }while(i&127);
> >    double_sum += sum;
> > }
>
> done.
>
> core2:
> 2039632 dezicycles in autocorr_double_c, 65536 runs, 0 skips
> 771026 dezicycles in autocorr_double_sse2, 65536 runs, 0 skips
> 524713 dezicycles in autocorr_float_sse1, 65536 runs, 0 skips
> 500609 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> 432458 dezicycles in autocorr_float_ssse3, 65535 runs, 1 skips
> overall: 4.8%
>
> k8:
> 1776170 dezicycles in autocorr_double_c, 65534 runs, 2 skips
> 1062022 dezicycles in autocorr_double_sse2, 65535 runs, 1 skips
> 932452 dezicycles in autocorr_float_sse1, 65533 runs, 3 skips
> 911259 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> overall: 2.5%
>
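
For reference, the blocked accumulation Michael suggests above can be sketched in portable C. This is only an illustration, not the code from the patch: the name blocked_sum, the BLOCK constant, and summing a plain array (rather than autocorrelation products) are assumptions made here.

```c
#include <stddef.h>

/* Hypothetical block length; a power of two, matching the (i & 127)
 * test in the quoted loop. */
#define BLOCK 128

/* Sum n floats: accumulate short runs in single precision, then fold
 * each run's partial sum into a double-precision total. */
double blocked_sum(const float *x, size_t n)
{
    double total = 0.0;
    size_t i = 0;

    while (i < n) {
        float partial = 0.0f;
        /* End of the current run, clamped to the array length. */
        size_t end = (n - i > BLOCK) ? i + BLOCK : n;

        for (; i < end; i++)
            partial += x[i];   /* cheap single-precision inner loop */

        total += partial;      /* one double add per block */
    }
    return total;
}
```

The rounding error of each float partial sum is bounded by the block length, while the inner loop stays in single precision and so remains easy to vectorize.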

It looks to me that

+        OP2(movhlps,  6,0, 7,1)\
+        OP2(addsd,    6,0, 7,1)\
+        "movsd   %%xmm0,    %2  \n\t"\
+        "movsd   %%xmm1,  8+%2  \n\t"\

can be optimized to

         "haddpd  %%xmm7, %%xmm6 \n\t"\
         "movapd  %%xmm6,    %2  \n\t"\

when SSE3 is available.
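
To make the equivalence concrete, here is a rough sketch of both variants using compiler intrinsics instead of inline asm. It assumes an x86 target and that OP2 applies the instruction to each register pair in turn (its definition is not shown here); the function names hsum2_sse2 and hsum2_sse3 are made up for illustration. Note that movapd, unlike the two movsd stores, needs the destination to be 16-byte aligned.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3: _mm_hadd_pd */
#endif

/* SSE2 variant: same result as movhlps + addsd + two movsd stores.
 * out[0] = a.lo + a.hi, out[1] = b.lo + b.hi. No alignment needed. */
void hsum2_sse2(double out[2], __m128d a, __m128d b)
{
    __m128d a_hi = _mm_unpackhi_pd(a, a);   /* high half of a in the low lane */
    __m128d b_hi = _mm_unpackhi_pd(b, b);   /* high half of b in the low lane */

    _mm_store_sd(&out[0], _mm_add_sd(a, a_hi));
    _mm_store_sd(&out[1], _mm_add_sd(b, b_hi));
}

#ifdef __SSE3__
/* SSE3 variant: a single haddpd yields {a.lo + a.hi, b.lo + b.hi},
 * and one aligned store (movapd) writes both sums.
 * out MUST be 16-byte aligned. */
void hsum2_sse3(double *out, __m128d a, __m128d b)
{
    _mm_store_pd(out, _mm_hadd_pd(a, b));
}
#endif
```

The SSE3 function is only compiled when the compiler is told SSE3 is available (for example with -msse3), which mirrors the runtime check the asm version would need.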

-- 
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6



