[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm
Ronald S. Bultje
Wed Sep 29 16:06:13 CEST 2010
On Tue, Sep 28, 2010 at 7:12 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Sep 27, 2010 at 12:15:16PM -0400, Ronald S. Bultje wrote:
>> On Fri, Sep 24, 2010 at 9:40 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Fri, Sep 24, 2010 at 07:33:11PM -0400, Ronald S. Bultje wrote:
>> >> Yeah, I over-enthusiastically screwed up here, sorry. First patch should still be ok,
>> >> I'll ask on gcc-list how to write a constant without the $. Without that, it'll be hard to
>> >> get the last 10 cycles off, I'm afraid...
>> > try %a0 and %c0 with "i", it produces a constant without $
>> > %n0 will produce a negated one
>> 827 dezicycles in lf-strength, 4194155 runs, 149 skips
>> \o/ (on x86-64 above, so this is in fact 3 cycles faster than what I
>> got using yasm).
>> Patches attached, passes make fate-h264 on x86-64 and x86-32. Needs
>> testing on icc and clang.
>> - inlines the dir loop in h264_loop_filter_strength_mmx2() - same as
>>   what I sent earlier. 60-70% of the speed increase comes from here.
>> - unrolls the bidir loop inside h264_loop_filter_strength_mmx2() (the
>>   only part which changes d_idx) - this is required to make d_idx a
>>   constant offset rather than calculating it in-code
>> - removes d_idx, makes all offsets constant - preparation for the patches below
>> - removes mask_dir - minor speed increase, not really related to anything else
>> - removes the edge and b_idx variable duplication, and merges all
>>   expressions using these to use direct constant offsets in asm:
>>   off(addr,idx,size) / [addr+idx*size+off]. This removes all lea
>>   instructions in the code, which accounts for the remainder of the
>>   speed increase.
>> I might not have marked all memory-clobbers correctly ("r" everywhere
>> seems to work), so review would be good here.
> all the patches look good, great work
Thanks, all applied. This needs some reindenting which I'll do in a
separate patch (or you can do it too if you beat me to it).
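
For readers following the operand-modifier hint quoted above (%c0 and %n0
with an "i" constraint), here is a minimal sketch in GCC-style inline asm
for x86. It is not code from the applied patches: the helper names
load_const_disp() and load_before() are made up for illustration, and the
plain-C fallback is only there so the snippet builds on other targets.
%c prints the immediate without the $ prefix, so it can serve as an address
displacement in off(addr,idx,size) form, and %n prints it negated.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): reads table[idx + 2].
 * The constant part of the address is passed as an "i" operand and
 * printed with %c, so it becomes a plain displacement ("8", not "$8")
 * and no separate lea/add is needed to form the address. */
static inline int32_t load_const_disp(const int32_t *table, intptr_t idx)
{
    int32_t out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %c3(%1,%2,4), %0"
             : "=r"(out)
             : "r"(table), "r"(idx), "i"(2 * sizeof(int32_t))
             : "memory"); /* asm reads memory the compiler cannot see */
#else
    out = table[idx + 2]; /* portable fallback, same result */
#endif
    return out;
}

/* Hypothetical helper (not from the patch): reads p[-1].
 * %n prints the immediate negated, so "i"(4) is emitted as -4. */
static inline int32_t load_before(const int32_t *p)
{
    int32_t out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("movl %n2(%1), %0"
             : "=r"(out)
             : "r"(p), "i"(4)
             : "memory");
#else
    out = p[-1];
#endif
    return out;
}
```

With table = {10, 11, 12, 13, 14, 15}, load_const_disp(table, 0) reads
table[2] and load_before(table + 4) reads table[3]. Because the asm reads
memory through a register operand, the "memory" clobber is what tells the
compiler that the array contents must be up to date; a precise "m" input
operand would be a tighter alternative.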