[FFmpeg-devel] Fix VP3 IDCT on Win64

Vitor Sessak vitor1001
Thu Aug 26 02:01:45 CEST 2010


Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Aug 25, 2010 at 6:38 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Wed, Aug 25, 2010 at 09:59:42PM +0200, Reimar Döffinger wrote:
>>> On Wed, Aug 25, 2010 at 07:46:35PM +0100, Måns Rullgård wrote:
>>>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>>>> After this patch, it's still broken, because of put_h264_chroma_mc8
>>>>> ssse3 not properly marking clobbers, but this patch 1/3rd-fixes
>>>>> VP6/Win64/FATE failures (similar to how my previous patch fixed a
>>>>> third of it).
>>>>>
>>>>> I can't fix put_h264_chroma_mc8_ssse3 because it's split over multiple
>>>>> asm() statements which rely on maintaining register values between, so
>>>>> I will likely rewrite that (also for VC-1/H.264/RV40) in yasm, but
>>>>> that'll take a little longer... After that, fate-ea-vp60 passes
>>>>> (haven't tested the others yet, because Win64 hates me).
>>>>>
>>>>> Ronald
>>>>>
>>>>> Index: libavcodec/x86/vp3dsp_sse2.c
>>>>> ===================================================================
>>>>> --- libavcodec/x86/vp3dsp_sse2.c  (revision 24909)
>>>>> +++ libavcodec/x86/vp3dsp_sse2.c  (working copy)
>>>>> @@ -171,6 +171,7 @@
>>>>>          VP3_1D_IDCT_SSE2(ADD8, SHIFT4)
>>>>>          PUT_BLOCK(%%xmm0, %%xmm1, %%xmm2, %%xmm3, %%xmm4, %%xmm5, %%xmm6, %%xmm7)
>>>>>          :: "r"(input_data), "r"(ff_vp3_idct_data), "m"(ff_pw_8)
>>>>> +        : "%xmm6", "%xmm7"
>>>>>      );
>>>>>  }
>>>> Looks good to my untrained eye.
>>> Assuming it works with all compilers, which would simplify the patch I proposed some
>>> time ago (though listing all XMM registers 0-7 IMO would be better).
>> if clobbers of sse registers are added then they should be complete and
>> correct and not just the used registers among those win64 declares as
>> callee saved
> 
> OK. Since Win32 and some BSDs hated this patch, I retract it and VP3
> IDCT will have to be done in yasm as well. Blegh...

And what would you do with code that cannot be translated to yasm (like
asm that is supposed to be inlined in the middle of a C function)? Or is
there no such code that causes crashes on Win64? Would using a macro to
clobber the XMM regs only on the configurations that actually need it be
a solution?
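
To make the macro idea concrete, here is a minimal sketch, not a tested
patch. NEED_XMM_CLOBBERS, XMM_CLOBBERS, NOINLINE and add8_sat_u8 are
hypothetical names made up for illustration; a real build would set the
switch from configure rather than from compiler predefines. The clobber
list is emitted only on GCC-style Win64 builds, where xmm6/xmm7 are
callee-saved, and expands to nothing elsewhere, so compilers that reject
XMM clobbers never see them.

```c
#include <stdint.h>

/* Hypothetical switch; a real build would set it from configure. */
#if defined(_WIN64) && defined(__GNUC__)
#  define NEED_XMM_CLOBBERS 1
#else
#  define NEED_XMM_CLOBBERS 0
#endif

/* Expands to the clobber list where needed, to nothing elsewhere. */
#if NEED_XMM_CLOBBERS
#  define XMM_CLOBBERS(...) __VA_ARGS__
#else
#  define XMM_CLOBBERS(...)
#endif

#if defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE
#endif

/* dst[i] = saturating dst[i] + src[i] for 8 unsigned bytes.
 * Kept out of line so that, where the clobbers are dropped, the normal
 * calling convention already treats xmm6/xmm7 as caller-saved. */
NOINLINE void add8_sat_u8(uint8_t *dst, const uint8_t *src)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "movq    (%1), %%xmm6    \n\t"  /* load 8 bytes of src        */
        "movq    (%0), %%xmm7    \n\t"  /* load 8 bytes of dst        */
        "paddusb %%xmm6, %%xmm7  \n\t"  /* unsigned saturating add    */
        "movq    %%xmm7, (%0)    \n\t"  /* store the result to dst    */
        :: "r"(dst), "r"(src)
        : XMM_CLOBBERS("%xmm6", "%xmm7",) "memory"
    );
#else
    /* Portable fallback for targets without this asm path. */
    for (int i = 0; i < 8; i++) {
        unsigned sum = dst[i] + src[i];
        dst[i] = sum > 255 ? 255 : (uint8_t)sum;
    }
#endif
}
```

The trailing comma inside XMM_CLOBBERS(...) lets the list sit in front of
"memory" and vanish cleanly when the macro is empty. This only covers the
callee-saved problem; it does not answer the objection that clobber lists
should be complete, and asm that really is inlined in the middle of a C
function would still want every register it touches listed.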

-Vitor



