[FFmpeg-devel] [PATCH] SSE3/4 implementation of flac_encode_residual_lpc

Sun Jun 21 15:47:13 CEST 2009

On Thu, 18 Jun 2009 13:51:00 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:

> On Sat, May 30, 2009 at 09:30:28PM +0000, Loren Merritt wrote:
> > On Sat, 30 May 2009, Bobby Bingham wrote:
> >> On Fri, 29 May 2009, Loren Merritt wrote:
> >>
> >>> For the remainder, this logic should be doable
> >>> with just 1 paddd and 1 por per vector. Merge several vectors
> >>> before branching.
> >>
> >> I'm afraid I don't quite see what you mean by using 1 paddd and 1
> >> por. The attached patch does have a slight improvement in this
> >> piece of code, but I doubt it's what you meant.
> >
> > The C version is:
> > (unsigned)(x+0x8000) >= 0x10000
> > And to merge several entries before the branch:
> > (unsigned)((x[0]+0x8000) | (x[1]+0x8000) | ...) >= 0x10000
> > Or, since SSE doesn't have a uint32 compare:
> > (((x[0]+0x8000) | (x[1]+0x8000) | ...) >> 16) != 0
> >
> > This won't be much faster than yours, if at all, when testing one
> > vector at a time.
> 
> 
> What's the status of this patch?
> Waiting for changes?
> OK to commit?
> Want me to review it?
> 

I haven't had much time to work on it lately.  I want to try a couple
of variations on Loren's idea and compare them before I submit it for
review.
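
For reference, here is a minimal scalar sketch of the check Loren
describes, as I understand it. It is not the code from the patch. The
function names are made up for illustration, and the bias is done in
unsigned arithmetic to avoid signed overflow. In a SIMD version the add
and the OR in the loop would presumably map to paddd and por.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if x does not fit in a signed 16-bit
 * integer.  Adding 0x8000 maps [-32768, 32767] onto [0, 0xFFFF], so a
 * single unsigned compare covers both bounds. */
static int exceeds_int16(int32_t x)
{
    return ((uint32_t)x + 0x8000u) >= 0x10000u;
}

/* Hypothetical helper: returns 1 if any of the n values is outside the
 * int16 range.  The biased values are ORed together, so any value with
 * a bit set above bit 15 leaves that bit set in the accumulator, and
 * only one test (and one branch) is needed at the end. */
static int any_exceeds_int16(const int32_t *x, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= (uint32_t)x[i] + 0x8000u;  /* one add, one or per element */
    return (acc >> 16) != 0;              /* no unsigned compare needed */
}
```

The merged form only says that some value in the group is out of
range, not which one, which is enough to decide whether to branch.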

-- 
Bobby Bingham