[FFmpeg-devel] [PATCH] move H264 IDCT to yasm
Ronald S. Bultje
Tue Sep 7 14:16:09 CEST 2010
On Mon, Sep 6, 2010 at 11:31 PM, Alexander Strange
<astrange at ithinksw.com> wrote:
> On Sep 6, 2010, at 5:00 PM, Ronald S. Bultje wrote:
>> this patch moves H264 IDCT (the LGPL part) to yasm. Performance for
>> most loopy parts is improved quite a bit because gcc is completely
>> retarded when it comes to setting up loops (I'm not joking here), some
>> up to 50%. Performance for one particular function (intra16_mmx2) is
>> mildly worse (a few cycles) and I don't quite understand why, the code
>> is identical. This might be related to alignment (gcc aligns the parts
>> that it jmps to using nops, I don't yet know how to do that in yasm),
>> otherwise I don't really know. Let me know if you want detailed
>> performance statistics for each function.
>> +cglobal h264_idct_add16intra_mmx2, 5, 7, 0
>> +    xor          r5, r5
>> +%ifdef PIC;f660-f7f9=199=256+144+9=409 (mine), theirs=1e70-2034=
Oops, removed. That's me debugging a function and forgetting to remove it.
I'll send a new patch that also merges h264_idct_sse2.asm in there
(i.e. ff_h264_idct_add8x4_sse2 plus the loop functions around it);
Loren and Jason (on IRC yesterday) OK'ed relicensing it to LGPL.
Thanks Loren and Jason!