[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc

Mon May 4 05:39:05 CEST 2009

On Sat, 25 Apr 2009 03:03:30 +0000 (UTC)
Loren Merritt <lorenm at u.washington.edu> wrote:

> On Fri, 24 Apr 2009, Bobby Bingham wrote:
> 
> > Attached are patches to move flac_encode_residual_lpc to dsputils,
> > and to add SSE3 and SSE4 implementations.  I wrote the SSE3 first,
> > but since it doesn't have signed 32x32 multiplication AFAICT, I
> > ended up using double precision floats for it, and the result is
> > code that's slower than the C version.  Unless somebody has a
> > suggestion of how to fix this, I'll drop the SSE3 version.
> >
> > I tried an SSE4 version because it does have signed 32x32->32
> > multiplication, like the C version uses.  Unfortunately, I don't
> > have an SSE4-capable processor to test it with, so I can't check
> > its speed or even its correctness.  Benchmarks welcome.
> 
> fails regression test on my Penryn.
> 
> > +// TODO: look into palignr?
> 
> Yea, do that. It should be possible to load each sample just once 
> (aligned), and do all other manipulation in registers.
> There are no cpus with both lddqu and sse4, so you're paying the full 
> cost of unaligned loads.

I've changed the code to use palignr, and have hopefully fixed it so
that it now works correctly.  I've also removed the SSE3 code from this
patch, as I haven't yet managed to make it any faster using integer
arithmetic.
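
For illustration only (this is not the code in the attached patch): a
minimal sketch of the palignr idea described above. Each group of four
samples is loaded once with an aligned load, and the shifted windows the
predictor needs are built in registers with palignr. The function names,
the fixed predictor order of 4, and the requirement that both buffers be
16-byte aligned are assumptions made to keep the sketch short. When the
compiler is not targeting SSE4.1, the SIMD entry point simply falls back
to the scalar loop.

```c
#include <stdint.h>
#if defined(__SSE4_1__)
#include <tmmintrin.h>  /* SSSE3:  _mm_alignr_epi8 (palignr) */
#include <smmintrin.h>  /* SSE4.1: _mm_mullo_epi32 (pmulld)  */
#endif

/* Scalar reference with the order fixed at 4 (hypothetical simplification).
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static void residual_lpc4_c(int32_t *res, const int32_t *smp, int n,
                            const int32_t coefs[4], int shift)
{
    int i, j;
    for (i = 0; i < 4 && i < n; i++)
        res[i] = smp[i];                    /* warm-up samples, copied as-is */
    for (i = 4; i < n; i++) {
        int32_t pred = 0;
        for (j = 0; j < 4; j++)
            pred += coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (pred >> shift);
    }
}

/* Same computation, four residuals per iteration.
 * res and smp must be 16-byte aligned (assumption of this sketch). */
static void residual_lpc4_palignr(int32_t *res, const int32_t *smp, int n,
                                  const int32_t coefs[4], int shift)
{
#if defined(__SSE4_1__)
    const __m128i c0 = _mm_set1_epi32(coefs[0]);
    const __m128i c1 = _mm_set1_epi32(coefs[1]);
    const __m128i c2 = _mm_set1_epi32(coefs[2]);
    const __m128i c3 = _mm_set1_epi32(coefs[3]);
    const __m128i sh = _mm_cvtsi32_si128(shift);
    __m128i prev;
    int i, j;

    if (n < 4) {
        residual_lpc4_c(res, smp, n, coefs, shift);
        return;
    }
    prev = _mm_load_si128((const __m128i *)smp);
    _mm_store_si128((__m128i *)res, prev);  /* warm-up samples */

    for (i = 4; i + 4 <= n; i += 4) {
        /* The only load of smp[i..i+3]. */
        const __m128i cur = _mm_load_si128((const __m128i *)(smp + i));
        /* alignr(cur, prev, 4*k) yields smp[i-4+k .. i-1+k]. */
        __m128i pred = _mm_mullo_epi32(c0, _mm_alignr_epi8(cur, prev, 12));
        pred = _mm_add_epi32(pred,
               _mm_mullo_epi32(c1, _mm_alignr_epi8(cur, prev, 8)));
        pred = _mm_add_epi32(pred,
               _mm_mullo_epi32(c2, _mm_alignr_epi8(cur, prev, 4)));
        pred = _mm_add_epi32(pred, _mm_mullo_epi32(c3, prev));
        pred = _mm_sra_epi32(pred, sh);     /* arithmetic shift right */
        _mm_store_si128((__m128i *)(res + i), _mm_sub_epi32(cur, pred));
        prev = cur;                         /* reuse as history next time */
    }
    for (; i < n; i++) {                    /* scalar tail, fewer than 4 left */
        int32_t pred = 0;
        for (j = 0; j < 4; j++)
            pred += coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (pred >> shift);
    }
#else
    residual_lpc4_c(res, smp, n, coefs, shift);
#endif
}

/* Usage: returns 1 if both versions agree on a small test signal. */
static int residual_lpc4_check(void)
{
    _Alignas(16) int32_t smp[19];
    _Alignas(16) int32_t got[19];
    int32_t want[19];
    const int32_t coefs[4] = { 7, -3, 2, -1 };
    int i;

    for (i = 0; i < 19; i++)
        smp[i] = (i * 37) % 101 - 50;
    residual_lpc4_c(want, smp, 19, coefs, 2);
    residual_lpc4_palignr(got, smp, 19, coefs, 2);
    for (i = 0; i < 19; i++)
        if (got[i] != want[i])
            return 0;
    return 1;
}
```

A real implementation would have to handle arbitrary predictor orders
and buffers that are not guaranteed to be aligned; the sketch only shows
how the overlapping sample windows can come from registers instead of
from unaligned loads.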

-- 
Bobby Bingham
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_sse.patch
Type: text/x-patch
Size: 7993 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090503/582a994f/attachment.bin>