[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm
Wed Sep 29 01:12:58 CEST 2010
On Mon, Sep 27, 2010 at 12:15:16PM -0400, Ronald S. Bultje wrote:
> On Fri, Sep 24, 2010 at 9:40 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Fri, Sep 24, 2010 at 07:33:11PM -0400, Ronald S. Bultje wrote:
> >> Yeah, I over-enthusiastically screwed up here, sorry. First patch should still be ok,
> >> I'll ask on gcc-list how to write a constant without the $. Without that, it'll be hard to
> >> get the last 10 cycles off, I'm afraid...
> > try %a0 and %c0 with "i"; it produces a constant without $
> > %n0 will produce a negated one
> 827 dezicycles in lf-strength, 4194155 runs, 149 skips
> \o/ (on x86-64 above, so this is in fact 3 cycles faster than what I
> got using yasm).
> Patches attached, passes make fate-h264 on x86-64 and x86-32. Needs
> testing on icc and clang.
> - inlines the dir loop in h264_loop_filter_strength_mmx2() - same as
>   what I sent earlier. 60-70% of the speed increase comes from here.
> - unrolls the bidir loop inside h264_loop_filter_strength_mmx2() (the
>   only part which changes d_idx) - this is required to make d_idx a
>   constant offset rather than calculating it in-code
> - removes d_idx, makes all offsets constant - preparation for the below patches
> - removes mask_dir - minor speed increase, not really related to anything else
> - removes the edge and b_idx variable duplication, and merges all
>   expressions using these to use direct constant offsets in asm
>   off(addr,idx,size) / [addr+idx*size+off]. This removes all leas in the
>   code, which contributes to the remainder of the speed increase.
> I might not have marked all memory-clobbers correctly ("r" everywhere
> seems to work), so review would be good here.
all the patches look good, great work
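
For anyone following the %c0 / %n0 suggestion quoted above, here is a
minimal sketch of how an "i" operand can be printed without the $ so it
serves as a constant displacement in a memory operand. It is not taken
from the patches: the helper names load_plus8() and load_minus4() are
hypothetical, the offsets are arbitrary, and the asm path assumes a
GCC-compatible compiler targeting x86 (other targets use the plain C
fallback).

```c
#include <stdint.h>

/* Hypothetical helper: read the int32_t located 8 bytes after tab. */
static int32_t load_plus8(const int32_t *tab)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int32_t v;
    /* %c2 prints the "i" operand as a bare 8 (no $), so the template
     * expands to something like "movl 8(%rdi), %eax".
     * The unused "m" operand tells the compiler that tab[0..3] is read. */
    __asm__ ("movl %c2(%1), %0"
             : "=r"(v)
             : "r"(tab), "i"(8), "m"(*(const int32_t (*)[4])tab));
    return v;
#else
    return tab[2];          /* portable fallback, same result */
#endif
}

/* Hypothetical helper: read the int32_t located 4 bytes before end. */
static int32_t load_minus4(const int32_t *end)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int32_t v;
    /* %n2 prints the negated constant, i.e. -4, again without $. */
    __asm__ ("movl %n2(%1), %0"
             : "=r"(v)
             : "r"(end), "i"(4), "m"(*(const int32_t (*)[4])(end - 4)));
    return v;
#else
    return end[-1];         /* portable fallback, same result */
#endif
}
```

With int32_t tab[4] = {10, 20, 30, 40}, load_plus8(tab) reads tab[2] and
load_minus4(tab + 4) reads tab[3]. Because the displacement is folded
into the addressing mode, no separate lea is needed to form the address.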
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
If you really think that XML is the answer, then you definitly missunderstood
the question -- Attila Kinali