[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms
Christophe GISQUET
christophe.gisquet
Sun Jan 20 13:27:04 CET 2008
Michael Niedermayer wrote:
>> + asm volatile (
>> + "movd %2, %%mm0 \n\t"
>> + "movd %3, %%mm1 \n\t"
>> + "punpcklwd %%mm0, %%mm0 \n\t"
>> + "punpcklwd %%mm1, %%mm1 \n\t"
>> + "punpckldq %%mm0, %%mm0 \n\t"
>> + "punpckldq %%mm1, %%mm1 \n\t"
>> + "movq %%mm0, %0 \n\t"
>> + "movq %%mm1, %1 \n\t"
>> + : "+m"(mm_rnd1), "+m"(mm_rnd2)
>> + : "m"(rnd1), "m"(rnd2)
>> + );
>
> as rnd1 and 2 as well as shift are constants, building these in the inner
> loops is completely unacceptable, you should pass int64_t arguments
Will do.
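
A minimal sketch of what precomputing such an argument could look like in plain C, assuming the rounding value fits in 16 bits. The helper name splat4x16 is hypothetical and not part of the patch; it produces the same bit pattern that the movd/punpcklwd/punpckldq sequence above builds, so the caller can compute it once outside the loop and pass it in.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): replicate one 16-bit value
 * into all four 16-bit lanes of a 64-bit word. The four shifted copies
 * of v never overlap, so a single multiply is enough. */
static uint64_t splat4x16(uint16_t v)
{
    return (uint64_t)v * 0x0001000100010001ULL;
}
```

For example, splat4x16(4) yields 0x0004000400040004, which can then be handed to the asm block as a ready-made "m" operand.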
> you should at least do
>> + "movq (%0,"OFF"), %%mm0 \n\t" \
>> + "psubw %%mm0, %%mm1 \n\t"
>> + "psubw %%mm0, %%mm4 \n\t"
>> + "psllw $2, %%mm0 \n\t"
>> + "psubw %%mm0, %%mm2 \n\t"
>> + "paddw %%mm0, %%mm0 \n\t"
>> + "psubw %%mm0, %%mm4 \n\t"
>> + "paddw %%mm0, %%mm0 \n\t"
>> + "psubw %%mm0, %%mm3 \n\t"
>> + "paddw %%mm0, %%mm1 \n\t"
>
> 2 instructions less, 3 registers less, no multiply, no constants read
Merging with the needed preshift, it's akin to writing (for instance):
t1 = 8 * src[1] + 8 * src[3] + 4 * src[5] + 2 * src[7]
     + ((src[5] - src[3]) >> 1);
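
To make the arithmetic explicit, here is a small sketch comparing that form against the full-scale term, assuming the reference odd-part term uses the VC-1 coefficients 16, 15, 9, 4. The function names are made up for illustration. The shift-and-add form is the full term at half scale; the two agree exactly when src[5] - src[3] is even, and otherwise the shift drops half a unit.

```c
#include <stdint.h>

/* Assumed reference: odd-part term with coefficients 16, 15, 9, 4. */
static int t1_full(const int16_t *src)
{
    return 16 * src[1] + 15 * src[3] + 9 * src[5] + 4 * src[7];
}

/* Same term at half scale, using only shifts and adds:
 * 8*s1 + (8 - 1/2)*s3 + (4 + 1/2)*s5 + 2*s7.
 * Note: >> on a negative int is implementation-defined in C;
 * common compilers implement it as an arithmetic shift. */
static int t1_half(const int16_t *src)
{
    return 8 * src[1] + 8 * src[3] + 4 * src[5] + 2 * src[7]
         + ((src[5] - src[3]) >> 1);
}
```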
>> + : "r"(off), "r"(3*off), "r"(5*off), "r"(7*off),
>
> unneeded wasting of 4 registers to load a constant
> and resulting more complex and slower addressing
This I'm not sure how to handle. My goal was to make a function of the
1d dct8, and 'off' depends on what transform (8x8, 8x4, 4x8) uses that
function.
It is indeed well known what the value of 'off' is, but to really use
it, I would have to change the 1d dct8 to a macro, potentially
increasing object size if the function wasn't already inlined.
Is this what you want, or am I missing an intermediate solution?
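
One possible shape for such an intermediate solution, sketched in plain C rather than inline asm: keep a single inline helper that takes 'off', and call it only from thin wrappers that pass a literal. Everything here is an assumption for illustration: the names are hypothetical, the strides 8 and 4 are placeholders, and the body is a toy butterfly, not the VC-1 transform. Whether the helper really gets inlined is up to the compiler unless a force-inline attribute is used.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 1-D pass: it only sums and differences
 * two elements 'off' apart, which is NOT the VC-1 transform. */
static inline void pass_1d_sketch(int16_t *blk, const int off)
{
    int16_t a = blk[0];
    int16_t b = blk[off];
    blk[0]   = (int16_t)(a + b);
    blk[off] = (int16_t)(a - b);
}

/* Thin wrappers: each passes a literal stride, so once the helper is
 * inlined the compiler can fold 'off' into the addressing. */
static void pass_stride8(int16_t *blk) { pass_1d_sketch(blk, 8); }
static void pass_stride4(int16_t *blk) { pass_1d_sketch(blk, 4); }
```

The trade-off is the one raised above: each wrapper carries its own copy of the body, so object size grows with the number of distinct strides.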
Best regards,
--
Christophe GISQUET