[FFmpeg-devel] [PATCH] SSE3/4 implementation of flac_encode_residual_lpc

Bobby Bingham uhmmmm
Sat May 30 18:40:01 CEST 2009

On Fri, 29 May 2009 17:00:12 +0000 (UTC)
Loren Merritt <lorenm at u.washington.edu> wrote:

> On Thu, 28 May 2009, Bobby Bingham wrote:
> > Attached is a version I hope is about ready for inclusion.
> > Provides an overall encoding speedup of ~30% at
> > compression_level=12.
> > "movdqa     %%xmm3,  %%xmm6 \n\t" // verify that 16 bits is enough
> > "movdqa     %%xmm5,  %%xmm7 \n\t"
> > "pslld      $16,     %%xmm6 \n\t"
> > "pslld      $16,     %%xmm7 \n\t"
> > "psrad      $16,     %%xmm6 \n\t"
> > "psrad      $16,     %%xmm7 \n\t"
> > "pcmpeqd    %%xmm3,  %%xmm6 \n\t"
> > "pcmpeqd    %%xmm5,  %%xmm7 \n\t"
> > "pand       %%xmm6,  %%xmm7 \n\t"
> > "pmovmskb   %%xmm7,  %2     \n\t"
> > "cmp        $0xffff, %2     \n\t"
> > "jne        2f              \n\t"
> About half of the invocations to flac_encode_residual_lpc will know
> in advance that all of the samples fit in 16bit, so those shouldn't
> check this at all.

I've made this change in the attached patch.  But in my testing, any
speed difference is so small as to get lost in the noise, and I think
it makes it less readable, so I'm tempted to revert.

> For the remainder, this logic should be doable
> with just 1 paddd and 1 por per vector. Merge several vectors before
> branching.

I'm afraid I don't quite see what you mean by using 1 paddd and 1 por.
The attached patch does have a slight improvement in this piece of
code, but I doubt it's what you meant.
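
For reference, one possible reading of the paddd/por suggestion, written
as scalar C rather than SSE. This is an interpretation only and is not
confirmed anywhere in this thread. The idea: adding 0x8000 to a 32-bit
sample (the paddd step) maps the signed 16-bit range onto 0..0xffff, so a
sample fits exactly when the upper 16 bits of the biased value are zero.
The biased values from several vectors can be ORed together (the por
step) and tested once. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 * Returns 1 if every sample fits in a signed 16-bit value, else 0.
 * Each loop iteration models one 32-bit SIMD lane.
 */
static int all_fit_in_16_bits(const int32_t *samples, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++) {
        /* paddd: bias so that [-32768, 32767] becomes [0, 0xffff].
         * Unsigned arithmetic keeps the wraparound well defined. */
        uint32_t biased = (uint32_t)samples[i] + 0x8000u;

        /* por: accumulate; any out-of-range sample sets a high bit. */
        acc |= biased;
    }

    /* A single test after merging replaces one branch per vector. */
    return (acc & 0xffff0000u) == 0;
}
```

Compared with the pslld/psrad/pcmpeqd sequence quoted above, this needs
no copy of the input register and no per-vector compare; only the final
mask test has to feed a branch.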

> The double branch is inelegant. It could be removed if you either
> wrote the whole loop in asm, or split the asm block and branched in
> C. Especially if the 16bit checking is moved to a separate loop as 
> appropriate for not always needing to run it.

I've split the asm block and now branch in C.
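
A rough sketch of that structure in plain C, to show the control flow
only. It is not taken from the attached patch: all function names and
the known_16bit flag are hypothetical, the two kernels are scalar
stand-ins for the asm blocks, the range check runs over the whole buffer
rather than per vector, and the coefficients are assumed to fit in 16
bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Separate checking loop, so it can be skipped when not needed. */
static int samples_fit_16(const int32_t *smp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (smp[i] < INT16_MIN || smp[i] > INT16_MAX)
            return 0;
    return 1;
}

/* Stand-in for the 16-bit asm block: operands are narrowed to int16_t,
 * which is only valid once the caller knows they fit. */
static void residual_narrow(int32_t *res, const int32_t *smp, size_t n,
                            int order, const int32_t *coefs, int shift)
{
    for (size_t i = (size_t)order; i < n; i++) {
        int64_t sum = 0;
        for (int j = 0; j < order; j++)
            sum += (int16_t)coefs[j] * (int16_t)smp[i - j - 1];
        /* >> on a negative value is assumed to be an arithmetic shift,
         * as it is on mainstream compilers. */
        res[i] = smp[i] - (int32_t)(sum >> shift);
    }
}

/* Stand-in for the generic fallback using full-width multiplies. */
static void residual_wide(int32_t *res, const int32_t *smp, size_t n,
                          int order, const int32_t *coefs, int shift)
{
    for (size_t i = (size_t)order; i < n; i++) {
        int64_t sum = 0;
        for (int j = 0; j < order; j++)
            sum += (int64_t)coefs[j] * smp[i - j - 1];
        res[i] = smp[i] - (int32_t)(sum >> shift);
    }
}

/* The branch lives in C, so neither kernel needs an internal jump. */
static void encode_residual_sketch(int32_t *res, const int32_t *smp,
                                   size_t n, int order,
                                   const int32_t *coefs, int shift,
                                   int known_16bit)
{
    /* The first `order` samples are copied through as warm-up. */
    for (size_t i = 0; i < (size_t)order && i < n; i++)
        res[i] = smp[i];

    if (known_16bit || samples_fit_16(smp, n))
        residual_narrow(res, smp, n, order, coefs, shift);
    else
        residual_wide(res, smp, n, order, coefs, shift);
}
```

A caller that already knows its samples are 16-bit passes a nonzero
known_16bit and skips the checking loop entirely.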

> With 6 "r" constraints, you need #if HAVE_6REGS.

Splitting the asm also means that I'm down to 5 "r" constraints.

Bobby Bingham
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_sse2b.patch
Type: text/x-patch
Size: 12321 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090530/88175a42/attachment.bin>