[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
Michael Niedermayer
michaelni at gmx.at
Wed Aug 31 04:06:07 CEST 2011
On Sun, Aug 28, 2011 at 10:46:59AM +0200, Vitor Sessak wrote:
> On Sun, Aug 28, 2011 at 2:37 AM, Loren Merritt <lorenm at u.washington.edu> wrote:
> > On Sat, 27 Aug 2011, Vitor Sessak wrote:
> >
> >> %macro PSHUFD_AVX 3
> >> shufps %1, %2, %2, %3
> >> %endmacro
> >
> > This can serve as sse1 too.
>
> Fixed.
>
> >>>> %macro SWAP_64BITS 2
> >>>> %ifdef ARCH_X86_64
> >>>> SWAP %1, %2
> >>>> %endif
> >>>> %endmacro
> >>>
> >>> What good is this doing? There's no %else, so the code must also work
> >>> (with no extra instructions) if you don't swap...?
> >>
> >> I was hoping that swapping the temp variable in code like
> >>
> >> mova m5, m0
> >> addps m5, m1
> >> mulps m2, m5
> >>
> >> SWAP_64BITS m5, m10
> >>
> >> mova m5, m3
> >> addps m5, m6
> >> mulps m7, m5
> >>
> >> would allow an x86_64 CPU to use out-of-order execution to interleave
> >> the two blocks of instructions in any order.
> >
> > Unnecessary. Every x86 cpu that supports out of order execution also
> > supports register renaming.
> > Equivalently, the x86 pipeline really uses static-single-assignment, with
> > the output value of every instruction remaining available even if some
> > later instruction overwrites the same variable name.
>
> Ok, removed it.
>
> -Vitor
> libavcodec/x86/Makefile | 1
> libavcodec/x86/imdct36_sse.asm | 363 ++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/mpegaudiodec_mmx.c | 12 +
> libavutil/x86/x86inc.asm | 2
> 4 files changed, 378 insertions(+)
> 969de5b59e5dfba7cfda2b080e41b72c478982d7 0002-mpegaudiodec-add-SSE-optimized-imdct36.patch
> From 0d7fb2081b572e89521e480407c86d6768f23eb8 Mon Sep 17 00:00:00 2001
> From: Vitor Sessak <vitor1001 at gmail.com>
> Date: Mon, 22 Aug 2011 07:59:46 +0200
> Subject: [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
Patch LGTM, feel free to push it to ffmpeg git.
Further improvements are very welcome too!
And thanks a lot for the work,
and thanks to Loren for the review.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesn't understand is a sign that one is less smart
than the original author, trying to rewrite it will not make it better.