[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Måns Rullgård mans at mansr.com
Wed Sep 29 21:53:37 CEST 2010


"Ronald S. Bultje" <rsbultje at gmail.com> writes:

> Hi,
>
> On Tue, Sep 28, 2010 at 7:12 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Mon, Sep 27, 2010 at 12:15:16PM -0400, Ronald S. Bultje wrote:
>>> On Fri, Sep 24, 2010 at 9:40 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> > On Fri, Sep 24, 2010 at 07:33:11PM -0400, Ronald S. Bultje wrote:
>>> >> Yeah, I over-enthusiastically screwed up here, sorry. First patch should still be ok,
>>> >> I'll ask on gcc-list how to write a constant without the $. Without that, it'll be hard to
>>> >> get the last 10 cycles off, I'm afraid...
>>> >
>>> > try %a0 and %c0 with "i"; it produces a constant without $
>>> > %n0 will produce a negated one
>>>
>>> 827 dezicycles in lf-strength, 4194155 runs, 149 skips
>>>
>>> \o/ (on x86-64 above, so this is in fact 3 cycles faster than what I
>>> got using yasm).
>>>
>>> Patches attached, passes make fate-h264 on x86-64 and x86-32. Needs
>>> testing on icc and clang.
>>>
>>> fix-lfstrength-inline-asm.patch
>>> inlines the dir loop in h264_loop_filter_strength_mmx2() - same as
>>> what I sent earlier. 60-70% of the speed increase comes from here.
>>>
>>> fix-lfstrength-unrollloop.patch
>>> unrolls the bidir loop inside h264_loop_filter_strength_mmx2() (the
>>> only part which changes d_idx) - this is required to make d_idx a
>>> constant offset rather than calculating it in-code
>>>
>>> fix-lfstrength-removevars.patch
>>> removes d_idx, makes all offsets constant - preparation for the below patches
>>>
>>> fix-lfstrength-removemask.patch
>>> removes mask_dir - minor speed increase, not really related to anything else
>>>
>>> fix-lfstrength-remove-edgevar.patch
>>> removes the edge and b_idx variable duplication, and merges all
>>> expressions using these to use direct constant offsets in asm:
>>> off(addr,idx,size) / [addr+idx*size+off]. This removes all leas in the
>>> code, which contributes to the remainder of the speed increase.
>>>
>>> I might not have marked all memory-clobbers correctly ("r" everywhere
>>> seems to work), so review would be good here.
>>
>> all the patches look good, great work
>
> Thanks, all applied. This needs some reindenting which I'll do in a
> separate patch (or you can do it too if you beat me to it).

One or more of them broke suncc.
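
For anyone following along, the %c0 hint quoted above can be sketched
roughly as below. This is only an illustration, not code from the
patches: load_with_const_offset() and ROW_STRIDE are hypothetical names,
and the "m" input is one possible way to tell the compiler which memory
the asm reads. The %c modifier is a GCC operand modifier (clang accepts
it too), so other compilers may not understand it; the sketch falls back
to plain C there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use inline asm only where the GCC-style x86 syntax is available. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_INLINE_ASM 1
#else
#define HAVE_X86_INLINE_ASM 0
#endif

#define ROW_STRIDE 8 /* hypothetical element offset, for illustration only */

/* Loads base[ROW_STRIDE] using a compile-time constant displacement. */
static int32_t load_with_const_offset(const int32_t *base)
{
    int32_t out;

    assert(base != NULL);
#if HAVE_X86_INLINE_ASM
    /*
     * "i" passes an immediate; %c2 prints it as a bare number (32, not $32),
     * so it can serve as the displacement in off(base): movl 32(%reg), %reg.
     * The unused "m" operand declares that 16 elements at base are read.
     */
    __asm__ ("movl %c2(%1), %0"
             : "=r"(out)
             : "r"(base),
               "i"(ROW_STRIDE * (int)sizeof(int32_t)),
               "m"(*(const int32_t (*)[16])base));
#else
    out = base[ROW_STRIDE]; /* portable fallback, same result */
#endif
    return out;
}
```

Because the displacement is folded into the addressing mode at compile
time, no separate lea or add is needed to form the address.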

-- 
Måns Rullgård
mans at mansr.com


