[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Pascal Massimino pascal.massimino
Sun Apr 6 21:39:57 CEST 2008


  Hi,

On Sun, Apr 6, 2008 at 6:14 PM, Michael Niedermayer <michaelni at gmx.at>
wrote:

>
> > skal agreed it could be under LGPL in the last thread.
>
 yep


>
> [...]
> > #define SKIP_ROW_CHECK(src)                 \
> >     "movq     "src", %%mm0            \n\t" \
> >     "por    8+"src", %%mm0            \n\t" \
> >     "packssdw %%mm0, %%mm0            \n\t" \
> >     "movd     %%mm0, %%eax            \n\t" \
> >     "testl    %%eax, %%eax            \n\t" \
> >     "jz 1f                            \n\t"
>
> You could try to check pairs of rows, this might be faster for some rows.
> Also the code should be interleaved, not form such nasty dependency chains;
> you do have enough mmx registers.


 just a quick note: you can try doing the same with
 some 'pmovmskb mmreg, eax' instructions.
 However, this is a complex instruction and the speed gain
 is not necessarily obvious.
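 A minimal sketch of what such a check could look like, written with SSE2
 intrinsics rather than the MMX inline asm of the patch. This is only one
 reading of the suggestion: pmovmskb collects the sign bit of each byte, so
 the sketch assumes a pcmpeqb against zero comes first. The helper name
 row_is_zero and the scalar fallback are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns 1 when all 8 coefficients of a row are zero. */
static int row_is_zero(const int16_t *row)
{
    assert(row != NULL);
#if defined(__SSE2__)
    /* Load the 16 bytes of the row (unaligned load, to stay safe). */
    __m128i v  = _mm_loadu_si128((const __m128i *)row);
    /* pcmpeqb: each byte becomes 0xFF where the source byte is zero. */
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    /* pmovmskb: gather the 16 byte sign bits; all set means all zero. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
#else
    /* Scalar fallback: OR both 64-bit halves, like the movq/por pair. */
    uint64_t lo, hi;
    memcpy(&lo, row, 8);
    memcpy(&hi, row + 4, 8);
    return (lo | hi) == 0;
#endif
}
```

 Whether this beats the por/packssdw/movd/test sequence would have to be
 measured; the sketch only shows that the test itself is equivalent.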


>
> [...]
> >     "movdqa   %%xmm2, ("dct")         \n\t" \
> >     "movdqa   %%xmm3, %%xmm2          \n\t" \
> >     "psubsw   %%xmm6, %%xmm3          \n\t" \
> >     "paddsw   %%xmm2, %%xmm6          \n\t" \
> >     "movdqa   %%xmm6, %%xmm2          \n\t" \
> >     "psubsw   %%xmm7, %%xmm6          \n\t" \
> >     "paddsw   %%xmm2, %%xmm7          \n\t" \
> >     "movdqa   %%xmm3, %%xmm2          \n\t" \
> >     "psubsw   %%xmm5, %%xmm3          \n\t" \
> >     "paddsw   %%xmm2, %%xmm5          \n\t" \
> >     "movdqa   %%xmm5, %%xmm2          \n\t" \
> >     "psubsw   %%xmm0, %%xmm5          \n\t" \
> >     "paddsw   %%xmm2, %%xmm0          \n\t" \
> >     "movdqa   %%xmm3, %%xmm2          \n\t" \
> >     "psubsw   %%xmm4, %%xmm3          \n\t" \
> >     "paddsw   %%xmm2, %%xmm4          \n\t" \
> >     "movdqa  ("dct"), %%xmm2          \n\t" \
>
> i suspect this can be written without the load/store by using
> add,add,sub butterflies (of course only if it is faster)


  iirc, i tried that and it's the same tick count using the add,add,sub
 butterfly. Plus, i may be wrong, but i recall that the saturation used
 with the 'regular' mov,add,sub butterfly helps in nasty corner cases of
 overflow.
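
 To make the saturation point concrete, here is a scalar sketch of the two
 butterflies on a single 16-bit lane. It assumes one common form of the
 add,add,sub variant (a += b; b += b; b = a - b), which only works with
 wrapping arithmetic because the intermediate 2b must not be clamped. The
 function names and the sat16/wrap16 helpers are hypothetical stand-ins for
 paddsw/psubsw and paddw/psubw.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the int16 range, as paddsw/psubsw do per lane. */
static int16_t sat16(int x)
{
    if (x >  32767) return  32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

/* Reduce modulo 2^16, as paddw/psubw do per lane. */
static int16_t wrap16(int x)
{
    uint16_t u = (uint16_t)x;
    return (int16_t)(u >= 0x8000u ? (int)u - 0x10000 : (int)u);
}

/* 'Regular' butterfly: needs a temporary copy, but can saturate. */
static void butterfly_mov_add_sub(int16_t a, int16_t b,
                                  int16_t *sum, int16_t *diff)
{
    int16_t tmp = a;              /* movdqa: keep a copy of a  */
    assert(sum && diff);
    *diff = sat16(a - b);         /* psubsw: a = a - b         */
    *sum  = sat16(b + tmp);       /* paddsw: b = b + old a     */
}

/* add,add,sub butterfly: no temporary, but wrapping arithmetic only. */
static void butterfly_add_add_sub(int16_t a, int16_t b,
                                  int16_t *sum, int16_t *diff)
{
    assert(sum && diff);
    a = wrap16(a + b);            /* paddw: a = a + b          */
    b = wrap16(b + b);            /* paddw: b = 2b             */
    *sum  = a;
    *diff = wrap16(a - b);        /* psubw: (a + b) - 2b       */
}
```

 For small inputs both give the same result. For a = 30000, b = 10000 the
 saturating version yields a sum of 32767, while the wrapping version yields
 -25536; the difference is 20000 in both cases.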

 I'll try and save some cycles to review the rest asap

skal



