[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions

Michael Niedermayer michaelni
Sun Nov 18 18:04:20 CET 2007


On Sun, Nov 18, 2007 at 05:20:35PM +0100, Christophe GISQUET wrote:
> Michael Niedermayer wrote:
> > On Sat, Nov 17, 2007 at 12:33:31PM +0100, Christophe GISQUET wrote:
> >> +#define SHIFT2_16B_END_LINE(R)                  \
> >> +    "psraw     %5, %%mm"#R"           \n\t"     \
> >> +    "movq      %%mm"#R", (%2)         \n\t"     \
> >> +    "add       %3, %1                 \n\t"     \
> >> +    "add       $24, %2                \n\t"
> > 
> > the $24 add can be avoided by using an offset for the movq above
> 
> Applied. It also made me notice that I didn't use the SHIFT2_8B_END_LINE macro.

if there's just one left, then the code can be simplified by not passing
SHIFT2_16B_END_LINE as an argument
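
The offset suggestion quoted above can be modelled in plain C. This is only a
scalar sketch, not the patch's MMX code: ROW_BYTES, ROWS and both function
names are hypothetical, and the 24-byte step is taken from the "add $24, %2"
line. In the asm, the fixed offset would be a displacement on the movq
destination (e.g. 24(%2)) instead of a separate add.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROW_BYTES = 24, ROWS = 4 };   /* hypothetical sizes for illustration */

/* Variant A: store 8 bytes, then advance the pointer (the "add $24" form). */
static void store_rows_bump(uint8_t *dst, const uint64_t *rows)
{
    for (int i = 0; i < ROWS; i++) {
        memcpy(dst, &rows[i], sizeof rows[i]);
        dst += ROW_BYTES;
    }
}

/* Variant B: leave the pointer alone and fold a constant offset into each
 * store, as an addressing-mode displacement would do in the asm. */
static void store_rows_offset(uint8_t *dst, const uint64_t *rows)
{
    for (int i = 0; i < ROWS; i++)
        memcpy(dst + i * ROW_BYTES, &rows[i], sizeof rows[i]);
}

/* Both variants must leave identical bytes in the destination buffer. */
static void check_store_variants(void)
{
    const uint64_t rows[ROWS] = { 1, 2, 3, 4 };
    uint8_t a[ROWS * ROW_BYTES] = { 0 };
    uint8_t b[ROWS * ROW_BYTES] = { 0 };

    store_rows_bump(a, rows);
    store_rows_offset(b, rows);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

When the row stores are unrolled, the offsets in variant B become
compile-time constants, so no pointer update is needed between stores.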


[...]
> >> +     "movq      %%mm1, %%mm3    \n\t"                      \
> >> +     "movq      %%mm2, %%mm4    \n\t"                      \
> >> +     "paddw     %%mm1, %%mm1    \n\t"                      \
> >> +     "paddw     %%mm2, %%mm2    \n\t"                      \
> >> +     "paddw     %%mm3, %%mm1    \n\t" /* 3* */             \
> >> +     "paddw     %%mm4, %%mm2    \n\t" /* 3* */             \
> > 
> > have you checked that pmullw with 3 is not faster?
> 
> It only improves the horizontal pass (2550 vs 2700 dezicycles, i.e. about 5%).
> The others seem improved too, but by less than 1%.
> 
> There are 2 reasons why I wanted to avoid pmullw as much as possible:
> - here, I couldn't load the factor into a register (this seems less
> speed-critical than I remembered)
> - I have a Core 2 and an Athlon computer; both have a pmullw latency
> of 3; I think some P4s have a latency of 6.

to be honest, IMHO the P4 is a failure design-wise, and it might be better
not to give too much weight to the P4 in optimization decisions

though of course P4 benchmarks would still be interesting; maybe it's
not slower at all
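
As a sanity check on the two forms discussed above, here is a scalar C model
of one 16-bit lane. It is only a sketch: the function names are hypothetical,
it says nothing about speed, and it assumes the only difference that matters
for correctness is wraparound in a 16-bit word (paddw wraps, pmullw keeps the
low 16 bits of the product).

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the movq/paddw/paddw sequence: t = x; x += x; x += t. */
static uint16_t triple_by_adds(uint16_t x)
{
    uint16_t t = x;            /* movq:  keep a copy of x  */
    x = (uint16_t)(x + x);     /* paddw: x = 2*x (wraps)   */
    x = (uint16_t)(x + t);     /* paddw: x = 3*x (wraps)   */
    return x;
}

/* One lane of pmullw with a factor of 3: low 16 bits of the product. */
static uint16_t triple_by_mul(uint16_t x)
{
    return (uint16_t)(x * 3u);
}

/* Exhaustively compare both forms over every possible 16-bit lane value. */
static void check_triple_equivalence(void)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++)
        assert(triple_by_adds((uint16_t)v) == triple_by_mul((uint16_t)v));
}
```

So either form can replace the other without changing results; which one is
faster still has to be measured per CPU.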

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Observe your enemies, for they first find out your faults. -- Antisthenes