[FFmpeg-devel] [PATCH 1/5] x264asm: extend SBUTTERFLY to support SSE1
Michael Niedermayer
michaelni at gmx.at
Mon Apr 8 14:16:07 CEST 2013
On Mon, Apr 08, 2013 at 10:04:33AM +0200, Christophe Gisquet wrote:
> 2013/4/7 Christophe Gisquet <christophe.gisquet at gmail.com>:
> > The other solution is probably to have 2 paths depending on %ifidn %2, %3
>
> Here's a proposal for this. I haven't tested it thoroughly (I only ran
> fate-aac, without my other patches) because, in short, I can't at the
> moment.
>
> As for the wd unpacks, there is probably too much shuffling and
> shifting involved to add SSE1 support, and doing so would kill
> opportunities for better scheduling.
>
> --
> Christophe
> x86util.asm | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
> 4cc1ffd78cd5800397de610dc00dfbd3cdee5026 0001-x264asm-SBUTTERFLY-SSE1-and-identical-args.patch
> From a768a9352ddde88e99f6b729b70fdddc20297f5c Mon Sep 17 00:00:00 2001
> From: Christophe Gisquet <christophe.gisquet at gmail.com>
> Date: Mon, 8 Apr 2013 09:42:26 +0200
> Subject: [PATCH] x264asm: SBUTTERFLY: SSE1 and identical args
>
> SSE1 now supports dq and qdq types of unpacking.
> Also, the output when %2 == %3 is now correct, and %3 == %4 generates an
> error.
> ---
> libavutil/x86/x86util.asm | 32 +++++++++++++++++++++++++++++++-
> 1 files changed, 31 insertions(+), 1 deletions(-)
>
> diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> index 79a023f..5adb80c 100644
> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -30,10 +30,40 @@
> %include "libavutil/x86/x86inc.asm"
>
> %macro SBUTTERFLY 4
> -%if avx_enabled == 0
> +%ifidn %3, %4
> + %error Third and fourth arguments must be different
> +%endif
> +%if notcpuflag(sse2) && mmsize == 16
> + %ifidn %1, dq
> + mova m%4, m%2
> + %ifidn %2, %3
> + unpcklps m%2, m%3
> + unpckhps m%4, m%3
> + %else
> + unpckhps m%4, m%3
this looks flipped
> + unpcklps m%2, m%3
> + %endif
> + %elifdn %1, qdq
> mova m%4, m%2
> + %ifidn %2, %3
> + shufps m%2, m%3, q1010
> + shufps m%4, m%3, q3232
> + %else
> + shufps m%4, m%3, q1010
this too looks like the two alternatives are flipped
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB