[FFmpeg-devel] Mixed data type in SIMD code?
Zuxy Meng
zuxy.meng
Wed Mar 5 06:08:37 CET 2008
Hi,
2008/3/5, Loren Merritt <lorenm at u.washington.edu>:
> On Tue, 4 Mar 2008, Michael Niedermayer wrote:
> > On Mon, Mar 03, 2008 at 04:30:08PM -0700, Loren Merritt wrote:
> >> On Mon, 3 Mar 2008, Michael Niedermayer wrote:
> >>>
> >>> Also i doubt we use or ever will use packed double.
> >>
> >> flac encoder does. Single isn't precise enough for a linear sum of up
> >> to 16k elements. Reordering the sum to a tree made it half-way
> >> decent precision, but also made it as slow as double.
> >
> > What about something like:
> >
> > for(i=0; i<16000;){
> >     float sum=0;
> >     do{
> >         sum+= whatever[i++];
> >     }while(i&127);
> >     double_sum += sum;
> > }
>
> done.
>
> core2:
> 2039632 dezicycles in autocorr_double_c, 65536 runs, 0 skips
> 771026 dezicycles in autocorr_double_sse2, 65536 runs, 0 skips
> 524713 dezicycles in autocorr_float_sse1, 65536 runs, 0 skips
> 500609 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> 432458 dezicycles in autocorr_float_ssse3, 65535 runs, 1 skips
> overall: 4.8%
>
> k8:
> 1776170 dezicycles in autocorr_double_c, 65534 runs, 2 skips
> 1062022 dezicycles in autocorr_double_sse2, 65535 runs, 1 skips
> 932452 dezicycles in autocorr_float_sse1, 65533 runs, 3 skips
> 911259 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> overall: 2.5%
>
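For reference, the blocked summation Michael suggested above can be sketched in plain C roughly as follows. This is only an illustration of the idea, not the code that was committed: blocked_sum, data and n are hypothetical names, and the extra i < n check is an assumption added so that lengths which are not a multiple of 128 also work.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the FFmpeg tree.
 * Accumulates blocks of up to 128 elements in single precision, then
 * adds each block total to a double-precision running sum, so the
 * float rounding error is confined to short partial sums. */
static double blocked_sum(const float *data, size_t n)
{
    double total = 0.0;
    size_t i = 0;

    while (i < n) {
        float block = 0.0f;
        do {
            block += data[i++];
        } while ((i & 127) && i < n); /* stop at a 128 boundary or at the end */
        total += block;               /* one double add per block */
    }
    return total;
}
```

The inner loop is the part that maps onto packed-single SIMD; only the one add per block needs double precision.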
It looks to me that
+ OP2(movhlps, 6,0, 7,1)\
+ OP2(addsd, 6,0, 7,1)\
+ "movsd %%xmm0, %2 \n\t"\
+ "movsd %%xmm1, 8+%2 \n\t"\
can be optimized to
    "haddpd %%xmm7, %%xmm6 \n\t"\
    "movapd %%xmm6, %2 \n\t"\
when SSE3 is available.
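
To make the equivalence easier to check, here is a small scalar model of the two sequences in plain C. It is a sketch under assumptions: xmm_pd and the model_* helpers are hypothetical names that only mimic the lane behaviour of the instructions, and I am assuming that OP2(movhlps, 6,0, 7,1) expands to "movhlps %%xmm6, %%xmm0" followed by "movhlps %%xmm7, %%xmm1" (likewise for addsd), which the quoted hunk does not show.

```c
/* Scalar model of an XMM register holding two doubles (illustration only). */
typedef struct { double lo, hi; } xmm_pd;

/* movhlps src, dst: high 64 bits of src go to the low 64 bits of dst. */
static xmm_pd model_movhlps(xmm_pd dst, xmm_pd src)
{
    dst.lo = src.hi;
    return dst;
}

/* addsd src, dst: add the low doubles, high half of dst is unchanged. */
static xmm_pd model_addsd(xmm_pd dst, xmm_pd src)
{
    dst.lo += src.lo;
    return dst;
}

/* haddpd src, dst: dst = { dst.lo + dst.hi, src.lo + src.hi }. */
static xmm_pd model_haddpd(xmm_pd dst, xmm_pd src)
{
    xmm_pd r = { dst.lo + dst.hi, src.lo + src.hi };
    return r;
}

/* Sequence from the patch: 2x movhlps, 2x addsd, 2x movsd store. */
static void store_sums_sse2(xmm_pd x6, xmm_pd x7, double out[2])
{
    xmm_pd x0 = { 0.0, 0.0 }, x1 = { 0.0, 0.0 };
    x0 = model_movhlps(x0, x6);
    x1 = model_movhlps(x1, x7);
    x0 = model_addsd(x0, x6);
    x1 = model_addsd(x1, x7);
    out[0] = x0.lo; /* movsd %%xmm0, %2   */
    out[1] = x1.lo; /* movsd %%xmm1, 8+%2 */
}

/* Suggested sequence: one haddpd, one 16-byte store. */
static void store_sums_sse3(xmm_pd x6, xmm_pd x7, double out[2])
{
    x6 = model_haddpd(x6, x7); /* haddpd %%xmm7, %%xmm6 */
    out[0] = x6.lo;            /* movapd %%xmm6, %2 writes both halves */
    out[1] = x6.hi;
}
```

Both functions write the horizontal sum of xmm6 to the first slot and that of xmm7 to the second. One caveat: movapd needs a 16-byte aligned memory operand while the two movsd stores do not, and the alignment of %2 is not visible from the hunk; if it is not guaranteed, movupd would be the safe choice.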
--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6