[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Michael Niedermayer michaelni
Tue Apr 15 00:47:50 CEST 2008


On Mon, Apr 14, 2008 at 06:12:05PM -0400, Alexander Strange wrote:
>
> On Apr 13, 2008, at 10:26 PM, Michael Niedermayer wrote:
>> On Sun, Apr 13, 2008 at 10:10:21PM -0400, Alexander Strange wrote:
>>> On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>> [..]
>>>>>>>>
>>>>>>>> #ifdef ARCH_X86_64
>>>>>>>> # define XMMS   "%%xmm12"
>>>>>>>> #else
>>>>>>>> # define XMMS   "%%xmm2"
>>>>>>>> #endif
>>>>>>>> s/%%xmm2/XMMS/
>>>>>>>>
>>>>>>>> #ifndef ARCH_X86_64
>>>>>>>> "movdqa   %%xmm2, "spill"         \n\t" \
>>>>>>>> #endif
>>>>>>>> ...
>>>>>>>> #ifndef ARCH_X86_64
>>>>>>>> "movdqa  "spill", %%xmm2          \n\t" \
>>>>>>>> #endif
>>>>>>>>
>>>>>>>> or a
>>>>>>>> MOV_ONLY_ON32" %%xmm2, ...
>>>>>>>>
>>>>>>>>
>>>>>>>> And I think something similar can be done with ROW*
>>>>>>>
>>>>>>> Done. The row part is already optimal on 64 since pshufhw handles it.
>>>>>>
>>>>>> I meant the
>>>>>>>   "movdqa   "ROW2", %%xmm4          \n\t" \
>>>>>>>   "movdqa   "ROW6", %%xmm6          \n\t" \
>>>>>> [...]
>>>>>>>   "movdqa   "ROW0", %%xmm4          \n\t" \
>>>>>>>   "movdqa   "ROW4", %%xmm6          \n\t" \
>>>>>>
>>>>>> they are unneeded on 64.
>>>>>
>>>>> Oh, that. Done:
>>>>
>>>>
>>>> [...]
>>>>> ///IDCT pass on columns, assuming rows 4-6 are zero.
>>>>                                           ^
>>>> typo
>>>
>>> Fixed.
>>>
>>>> [...]
>>>>>    iLLM_HEAD
>>>>>    ASMALIGN(4)
>>>>>    JNZ("%%ecx", "2f")
>>>>>    JNZ("%%eax", "3f")
>>>>>    JNZ("%%edx", "4f")
>>>>>    JNZ("%%ebx", "5f")
>>>>>    iLLM_PASS_SPARSE("%0")
>>>>>    "jmp 6f                                                      \n\t"
>>>>>    "2:                                                          \n\t"
>>>>>    iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
>>>>>    "3:                                                          \n\t"
>>>>>    iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
>>>>>    JZ("%%edx", "1f")
>>>>>    "4:                                                          \n\t"
>>>>>    iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
>>>>>    JZ("%%ebx", "1f")
>>>>>    "5:                                                          \n\t"
>>>>>    iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
>>>>>    iLLM_HEAD
>>>>
>>>> iLLM_HEAD is executed twice here
>>>
>>> That's intentional, it turned out to be the best way to handle it on
>>> 32-bit. (call it a speculative prefetch)
>>> But we can get rid of it for x86-64, so I did.
>>>
>>>>>    iLLM_PASS("%0")
>>>>>    "6:                                                          \n\t"
>>>>>    : "+r"(block)
>>>>>    :
>>>>>    : "%eax", "%ecx", "%edx", "%ebx", "memory");
>>>>
>>>> ebx + gcc + PIC -> problems
>>>>
>>>> Also the changes to existing code are missing this time ...
>>>
>>> changed to esi
>>> The others hadn't changed and I didn't want to repost them every time...
>>
>> looks ok
>
> Thanks. Here's all the patches again, could someone apply them?

Send a username & password to Diego and apply them yourself :)
(of course only if you agree to our policy/coding/svn rules)

PS: you should do something about the MIME type of your attached files.

[...]
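
For reference, the register-selection idea quoted near the top of this thread (use a spare register on x86-64, spill and reload %%xmm2 on 32-bit) can be sketched with string-literal concatenation. This is only an illustration and not the code from the patch: XMMS and the word "spill" come from the thread, while HAVE_16_XMM, SPILL_SAVE, SPILL_RESTORE, SPILL_SLOT, the paddsw line, and the two helper functions are hypothetical names made up for the example. It builds the asm template as a plain string so that it compiles on any target.

```c
#include <string.h>

/* Assumption: __x86_64__ stands in for the ARCH_X86_64 test used in the thread. */
#if defined(__x86_64__)
# define HAVE_16_XMM 1
# define XMMS "%%xmm12"                 /* spare register, no spill needed */
# define SPILL_SAVE(slot)    ""
# define SPILL_RESTORE(slot) ""
#else
# define HAVE_16_XMM 0
# define XMMS "%%xmm2"                  /* reuse xmm2, so save it first */
# define SPILL_SAVE(slot)    "movdqa   %%xmm2, " slot "\n\t"
# define SPILL_RESTORE(slot) "movdqa   " slot ", %%xmm2\n\t"
#endif

/* Hypothetical spill location; the real patch defines its own. */
#define SPILL_SLOT "7*16(%0)"

/* Adjacent string literals are concatenated by the compiler, so the
 * 32-bit-only moves simply vanish from the template on x86-64. */
static const char demo_template[] =
    SPILL_SAVE(SPILL_SLOT)
    "paddsw   %%xmm0, " XMMS "\n\t"     /* placeholder instruction */
    SPILL_RESTORE(SPILL_SLOT);

/* Returns the assembled template text for inspection. */
const char *demo_asm_template(void)
{
    return demo_template;
}

/* Returns 1 if the template contains the spill/reload moves, else 0. */
int demo_uses_spill(void)
{
    return strstr(demo_template, "movdqa") != NULL;
}
```

Printing demo_asm_template() on each target shows which instructions survive; this replaces the scattered #ifndef blocks with two macros that expand to nothing on x86-64.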
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator