[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Michael Niedermayer michaelni
Sun Apr 13 23:39:43 CEST 2008


On Sun, Apr 13, 2008 at 05:25:26PM -0400, Alexander Strange wrote:
>
> On Apr 13, 2008, at 6:26 AM, Michael Niedermayer wrote:
>> On Sun, Apr 13, 2008 at 05:35:01AM -0400, Alexander Strange wrote:
>>>
>>> On Apr 12, 2008, at 8:15 AM, Michael Niedermayer wrote:
>> [...]
>>>>>   "psubsw   %%xmm6, %%xmm5          \n\t" \
>>>>>   "movdqa   "ROW0", %%xmm4          \n\t" \
>>>>>   "movdqa   "ROW4", %%xmm6          \n\t" \
>>>>>   "movdqa   %%xmm2, "spill"         \n\t" \
>>>>>   "movdqa   %%xmm4, %%xmm2          \n\t" \
>>>>>   "psubsw   %%xmm6, %%xmm4          \n\t" \
>>>>>   "paddsw   %%xmm2, %%xmm6          \n\t" \
>>>>>   "movdqa   %%xmm6, %%xmm2          \n\t" \
>>>>>   "psubsw   %%xmm7, %%xmm6          \n\t" \
>>>>>   "paddsw   %%xmm2, %%xmm7          \n\t" \
>>>>>   "movdqa   %%xmm4, %%xmm2          \n\t" \
>>>>>   "psubsw   %%xmm5, %%xmm4          \n\t" \
>>>>>   "paddsw   %%xmm2, %%xmm5          \n\t" \
>>>>>   "movdqa   %%xmm5, %%xmm2          \n\t" \
>>>>>   "psubsw   %%xmm0, %%xmm5          \n\t" \
>>>>>   "paddsw   %%xmm2, %%xmm0          \n\t" \
>>>>>   "movdqa   %%xmm4, %%xmm2          \n\t" \
>>>>>   "psubsw   %%xmm3, %%xmm4          \n\t" \
>>>>>   "paddsw   %%xmm2, %%xmm3          \n\t" \
>>>>>   "movdqa  "spill", %%xmm2          \n\t" \
>>>>
>>>> #ifdef ARCH_X86_64
>>>> # define XMMS   "%%xmm12"
>>>> #else
>>>> # define XMMS   "%%xmm2"
>>>> #endif
>>>> s/%%xmm2/XMMS/
>>>>
>>>> #ifndef ARCH_X86_64
>>>> "movdqa   %%xmm2, "spill"         \n\t" \
>>>> #endif
>>>> ...
>>>> #ifndef ARCH_X86_64
>>>> "movdqa  "spill", %%xmm2          \n\t" \
>>>> #endif
>>>>
>>>> or a
>>>> MOV_ONLY_ON32" %%xmm2, ...
>>>>
>>>>
>>>> And I think something similar can be done with ROW*
>>>
>>> Done. The row part is already optimal on 64 since pshufhw handles it.
>>
>> I meant the
>>>    "movdqa   "ROW2", %%xmm4          \n\t" \
>>>    "movdqa   "ROW6", %%xmm6          \n\t" \
>> [...]
>>>    "movdqa   "ROW0", %%xmm4          \n\t" \
>>>    "movdqa   "ROW4", %%xmm6          \n\t" \
>>
>> they are unneeded on 64.
>
> Oh, that. Done:


[...]
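
For reference, the XMMS / MOV_ONLY_ON32 idea quoted above could be sketched
roughly as below. This is only an illustration, not the code from the patch:
__x86_64__ stands in for ARCH_X86_64, SPILL and USES_SPILL are made-up names,
and the asm text is built as a plain C string (so "%%" stays as two characters)
to let the result be inspected on any machine.

```c
#include <string.h>

/* Assumption: __x86_64__ stands in for the tree's ARCH_X86_64 define. */
#if defined(__x86_64__)
#  define XMMS                    "%%xmm12"  /* spare register, nothing to spill */
#  define MOV_ONLY_ON32(src, dst) ""         /* the move disappears on 64 bit */
#  define USES_SPILL              0
#else
#  define XMMS                    "%%xmm2"   /* only xmm0-7 exist, reuse xmm2 */
#  define MOV_ONLY_ON32(src, dst) "movdqa " src ", " dst " \n\t"
#  define USES_SPILL              1
#endif

/* Hypothetical operand for the spill slot; the patch passes its own. */
#define SPILL "%1"

/* One butterfly step, with the temporary register chosen per architecture. */
static const char butterfly_text[] =
    MOV_ONLY_ON32("%%xmm2", SPILL)          /* save xmm2 (32 bit only)    */
    "movdqa   %%xmm4, " XMMS " \n\t"
    "psubsw   %%xmm6, %%xmm4 \n\t"
    "paddsw   " XMMS ", %%xmm6 \n\t"
    MOV_ONLY_ON32(SPILL, "%%xmm2");         /* restore xmm2 (32 bit only) */

/* Counts the movdqa instructions in a block of asm text. */
static int count_movdqa(const char *s)
{
    int n = 0;
    while ((s = strstr(s, "movdqa")) != NULL) {
        n++;
        s += 6;   /* skip past the match just counted */
    }
    return n;
}
```

With this layout the 64-bit build keeps a single movdqa per step, while the
32-bit build pays for the extra save and restore around it.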
> ///IDCT pass on columns, assuming rows 4-6 are zero.
                                           ^
typo


[...]
>     iLLM_HEAD
>     ASMALIGN(4)
>     JNZ("%%ecx", "2f")
>     JNZ("%%eax", "3f")
>     JNZ("%%edx", "4f")
>     JNZ("%%ebx", "5f")
>     iLLM_PASS_SPARSE("%0")
>     "jmp 6f                                                      \n\t"
>     "2:                                                          \n\t"
>     iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
>     "3:                                                          \n\t"
>     iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
>     JZ("%%edx", "1f")
>     "4:                                                          \n\t"
>     iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
>     JZ("%%ebx", "1f")
>     "5:                                                          \n\t"
>     iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
>     iLLM_HEAD

iLLM_HEAD is executed twice here


>     iLLM_PASS("%0")
>     "6:                                                          \n\t"
>     : "+r"(block)
>     :
>     : "%eax", "%ecx", "%edx", "%ebx", "memory");

ebx + gcc + PIC -> problems (on x86-32, gcc reserves ebx as the PIC register,
so it cannot simply be listed as a clobber)
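
One possible way around this, sketched under assumptions: do not name ebx at
all and let gcc pick the scratch register through a constraint. any_nonzero()
is a hypothetical helper, not something from the patch; it just ORs four
32-bit words together, which is the kind of zero test the JNZ/JZ macros seem
to rely on. Non-x86 builds fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if any of the four 32-bit words at p is
 * nonzero. No register is hard-coded, so nothing conflicts with PIC. */
static int any_nonzero(const int32_t *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int32_t tmp;
    __asm__ (
        "movl    (%1), %0 \n\t"
        "orl    4(%1), %0 \n\t"
        "orl    8(%1), %0 \n\t"
        "orl   12(%1), %0 \n\t"
        : "=&r"(tmp)        /* early clobber: written before %1 is last read */
        : "r"(p)
        : "memory", "cc");  /* reads *p and changes the flags */
    return tmp != 0;
#else
    /* Portable fallback for other compilers and architectures. */
    return (p[0] | p[1] | p[2] | p[3]) != 0;
#endif
}
```

The other common option is to keep using ebx but save and restore it by hand
inside the asm block, at the cost of two extra instructions.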

Also the changes to existing code are missing this time ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who are best at talking realize last or never when they are wrong.