[FFmpeg-devel] Fix VP3 IDCT on Win64
Ronald S. Bultje
Thu Aug 26 00:50:15 CEST 2010
On Wed, Aug 25, 2010 at 6:38 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Aug 25, 2010 at 09:59:42PM +0200, Reimar Döffinger wrote:
>> On Wed, Aug 25, 2010 at 07:46:35PM +0100, Måns Rullgård wrote:
>> > "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>> > > After this patch, it's still broken, because of put_h264_chroma_mc8
>> > > ssse3 not properly marking clobbers, but this patch 1/3rd-fixes
>> > > VP6/Win64/FATE failures (similar to how my previous patch fixed a
>> > > third of it).
>> > >
>> > > I can't fix put_h264_chroma_mc8_ssse3 because it's split over multiple
>> > > asm() statements which rely on maintaining register values between, so
>> > > I will likely rewrite that (also for VC-1/H.264/RV40) in yasm, but
>> > > that'll take a little longer... After that, fate-ea-vp60 passes
>> > > (haven't tested the others yet, because Win64 hates me).
>> > >
>> > > Ronald
>> > >
>> > > Index: libavcodec/x86/vp3dsp_sse2.c
>> > > ===================================================================
>> > > --- libavcodec/x86/vp3dsp_sse2.c  (revision 24909)
>> > > +++ libavcodec/x86/vp3dsp_sse2.c  (working copy)
>> > > @@ -171,6 +171,7 @@
>> > >          VP3_1D_IDCT_SSE2(ADD8, SHIFT4)
>> > >          PUT_BLOCK(%%xmm0, %%xmm1, %%xmm2, %%xmm3, %%xmm4, %%xmm5, %%xmm6, %%xmm7)
>> > >          :: "r"(input_data), "r"(ff_vp3_idct_data), "m"(ff_pw_8)
>> > > +        : "%xmm6", "%xmm7"
>> > >      );
>> > >  }
>> > Looks good to my untrained eye.
>> Assuming it works with all compilers, which would simplify the patch I proposed some
>> time ago (though listing all XMM registers 0-7 IMO would be better).
> if clobbers of sse registers are added then they should be complete and
> correct and not just the used registers amongst what win64 declares as
> callee saved
OK. Since Win32 and some BSDs hated this patch, I retract it and VP3
IDCT will have to be done in yasm as well. Blegh...
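
For illustration, a minimal sketch of what a clobber list on a GCC-style
asm() statement looks like. The function add8_sse2 and the table pw_8 are
hypothetical and are not FFmpeg code; the __SSE2__ guard and the plain C
fallback are assumptions added so the sketch still builds where SSE2 is
not available. The asm touches only xmm6 and xmm7, so listing those two
(plus "memory", since it stores through dst) is a complete clobber list
for this particular statement.

```c
#include <stdint.h>

/* Hypothetical helper (not FFmpeg code): dst[i] = src[i] + 8 for 8 words. */
static void add8_sse2(int16_t *dst, const int16_t *src)
{
#if defined(__GNUC__) && defined(__SSE2__)
    /* Hypothetical constant table, loosely modelled on a "words of 8" vector. */
    static const int16_t pw_8[8] = { 8, 8, 8, 8, 8, 8, 8, 8 };

    __asm__ volatile(
        "movdqu (%1), %%xmm6    \n\t"  /* load 8 words from src          */
        "movdqu (%2), %%xmm7    \n\t"  /* load the constant vector       */
        "paddw  %%xmm7, %%xmm6  \n\t"  /* xmm6 += xmm7, per 16-bit lane  */
        "movdqu %%xmm6, (%0)    \n\t"  /* store 8 words to dst           */
        :: "r"(dst), "r"(src), "r"(pw_8)
        /* Tell the compiler every register the asm overwrites, and that
         * it writes memory the compiler cannot see from the operands. */
        : "%xmm6", "%xmm7", "memory"
    );
#else
    /* Assumed fallback so the sketch compiles without SSE2. */
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(src[i] + 8);
#endif
}
```

Calling add8_sse2(out, in) with in = {0, 1, ..., 7} should leave
out = {8, 9, ..., 15} on either path. Whether a given compiler accepts
XMM register names in the clobber list depends on its target flags,
which is why the sketch guards the asm.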