[FFmpeg-devel] [PATCH 1/4] avcodec/x86: add put_pixels16_x2_sse2
Michael Niedermayer
michaelni at gmx.at
Sun Feb 3 21:54:39 CET 2013
On Sun, Feb 03, 2013 at 11:30:56AM -0800, Ronald S. Bultje wrote:
> Hi,
>
> On Sun, Feb 3, 2013 at 7:31 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > about 1% faster P frame motion compensation for matrixbench on i7
> >
> > Signed-off-by: Michael Niedermayer <michaelni at gmx.at>
> > ---
> > libavcodec/x86/dsputil_mmx.c | 4 ++++
> > libavcodec/x86/hpeldsp.asm | 31 ++++++++++++++++++++++++++++++-
> > 2 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/x86/dsputil_mmx.c b/libavcodec/x86/dsputil_mmx.c
> > index 2e8300a..29d87a1 100644
> > --- a/libavcodec/x86/dsputil_mmx.c
> > +++ b/libavcodec/x86/dsputil_mmx.c
> > @@ -1523,6 +1523,8 @@ static void gmc_mmx(uint8_t *dst, uint8_t *src,
> >
> > void ff_put_pixels16_sse2(uint8_t *block, const uint8_t *pixels,
> > int line_size, int h);
> > +void ff_put_pixels16_x2_sse2(uint8_t *block, const uint8_t *pixels,
> > + int line_size, int h);
> > void ff_avg_pixels16_sse2(uint8_t *block, const uint8_t *pixels,
> > int line_size, int h);
> >
> > @@ -2034,6 +2036,8 @@ static void dsputil_init_sse2(DSPContext *c, AVCodecContext *avctx,
> > // these functions are slower than mmx on AMD, but faster on Intel
> > if (!high_bit_depth) {
> > c->put_pixels_tab[0][0] = ff_put_pixels16_sse2;
> > + c->put_pixels_tab[0][1] = ff_put_pixels16_x2_sse2;
> > +
> > c->put_no_rnd_pixels_tab[0][0] = ff_put_pixels16_sse2;
> > c->avg_pixels_tab[0][0] = ff_avg_pixels16_sse2;
> > }
> > diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
> > index 7f0c285..81b6901 100644
> > --- a/libavcodec/x86/hpeldsp.asm
> > +++ b/libavcodec/x86/hpeldsp.asm
> > @@ -2,7 +2,7 @@
> > ;*
> > ;* Copyright (c) 2000-2001 Fabrice Bellard <fabrice at bellard.org>
> > ;* Copyright (c) Nick Kurshev <nickols_k at mail.ru>
> > -;* Copyright (c) 2002 Michael Niedermayer <michaelni at gmx.at>
> > +;* Copyright (c) 2002-2013 Michael Niedermayer <michaelni at gmx.at>
> > ;* Copyright (c) 2002 Zdenek Kabelac <kabi at informatics.muni.cz>
> > ;* Copyright (c) 2013 Daniel Kang
> > ;*
> > @@ -513,3 +513,32 @@ cglobal avg_pixels16, 4,5,4
> > lea r0, [r0+r2*4]
> > jnz .loop
> > REP_RET
> > +
> > +; put_pixels16_x2(uint8_t *block, const uint8_t *pixels, int line_size, int h)
> > +cglobal put_pixels16_x2, 4, 5, 4
> > + movsxdifnidn r2, r2d
> > + lea r4, [r2*2]
> > +.loop:
> > + movu m0, [r1]
> > + movu m1, [r1+r2]
> > + movu m2, [r1+1]
> > + movu m3, [r1+r2+1]
> > + pavgb m0, m2
> > + pavgb m1, m3
> > + mova [r0], m0
> > + mova [r0+r2], m1
> > + add r1, r4
> > + add r0, r4
> > + movu m0, [r1]
> > + movu m1, [r1+r2]
> > + movu m2, [r1+1]
> > + movu m3, [r1+r2+1]
> > + pavgb m0, m2
> > + pavgb m1, m3
> > + add r1, r4
> > + mova [r0], m0
> > + mova [r0+r2], m1
> > + add r0, r4
> > + sub r3d, 4
> > + jne .loop
> > + REP_RET
>
> I bet that this code is identical to the 8-pixel mmx/mmx2 version. How
> about you extend that version to use %+ mmsize in the cglobal line,
> and then INIT_MMX mmx callmacro INIT_XMM sse2 callmacro so you use the
> same macro for both versions = smaller and more maintainable source
> code size?
It differs in pavgb: the mmx2 version uses pavgb directly on memory, so
it does unaligned reads. mmx2/3dnow supports that; sse2 requires aligned
addresses for pavgb memory operands, so it requires different code.
Combining the two functions would require either a bunch of if/else
or a macro for an unaligned pavgb; the latter would restrict the
instruction order, though.
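
For readers following the alignment point, here is a minimal C sketch of
what the routine computes, assuming 8-bit pixels and the prototype from
the patch. The names put_pixels16_x2_ref and put_pixels16_x2_sse2_sketch
are hypothetical and not part of FFmpeg; the intrinsics variant only
illustrates the "load unaligned first, then average in registers" split
and processes one row per iteration rather than four as the asm does.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference (hypothetical helper): each output byte is the
 * rounded-up average of a pixel and its right-hand neighbour, which is
 * the per-byte result of pavgb. */
static void put_pixels16_x2_ref(uint8_t *block, const uint8_t *pixels,
                                int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            block[x] = (uint8_t)((pixels[x] + pixels[x + 1] + 1) >> 1);
        block  += line_size;
        pixels += line_size;
    }
}

#if defined(__SSE2__)
/* SSE2 sketch (hypothetical helper): both source rows are fetched with
 * explicit unaligned loads (movu in the patch), and the averaging is
 * done register to register, so pavgb never sees an unaligned memory
 * operand. */
static void put_pixels16_x2_sse2_sketch(uint8_t *block, const uint8_t *pixels,
                                        int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        __m128i a = _mm_loadu_si128((const __m128i *)pixels);
        __m128i b = _mm_loadu_si128((const __m128i *)(pixels + 1));
        /* The patch stores with mova (aligned); an unaligned store is
         * used here so the sketch works with any destination buffer. */
        _mm_storeu_si128((__m128i *)block, _mm_avg_epu8(a, b));
        block  += line_size;
        pixels += line_size;
    }
}
#endif
```

The scalar loop can serve as a reference when checking any SIMD
variant, since pavgb and _mm_avg_epu8 both round halves upward.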
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Answering whether a program halts or runs forever is,
on a Turing machine, in general impossible (Turing's halting problem).
On any real computer, it is always possible, as a real computer has a finite
number of states N and will either halt in fewer than N cycles or never halt.