[FFmpeg-devel] [PATCH] x86/swr: make int32_to_int32 un/pack_2ch functions SSE
Ronald S. Bultje
rsbultje at gmail.com
Thu Jan 15 12:53:22 CET 2015
On Wed, Jan 14, 2015 at 4:23 PM, James Almer <jamrial at gmail.com> wrote:
> On 14/01/15 1:59 PM, Michael Niedermayer wrote:
> > On Wed, Jan 14, 2015 at 01:53:48AM -0300, James Almer wrote:
> >> unpack_2ch is already using sse float ops only, and pack_2ch is a
> >> trivial change.
> >> Rename both to float_to_float for consistency.
> >> Signed-off-by: James Almer <jamrial at gmail.com>
> >> ---
> >> libswresample/x86/audio_convert.asm | 14 ++++++++------
> >> libswresample/x86/audio_convert_init.c | 11 +++++++----
> >> 2 files changed, 15 insertions(+), 10 deletions(-)
> >> diff --git a/libswresample/x86/audio_convert.asm
> >> index 1617e0b..c13c26f 100644
> >> --- a/libswresample/x86/audio_convert.asm
> >> +++ b/libswresample/x86/audio_convert.asm
> >> @@ -60,8 +60,8 @@ pack_2ch_%2_to_%1_u_int %+ SUFFIX
> >> punpcklwd m0, m2
> >> punpckhwd m1, m2
> >> %else
> >> - punpckldq m0, m2
> >> - punpckhdq m1, m2
> >> + unpcklps m0, m2
> >> + unpckhps m1, m2
> >> %endif
> >> %6 m0,m1,m2,m3,m4,m5
> >> %else
> > Did you benchmark this?
> > I've just checked, and on Pentium M, Core Solo and Core Duo these are
> > listed as having only 1/5 the throughput.
> > On Sandy Bridge they are still listed with half the throughput of
> > their integer counterparts.
> > I didn't benchmark it, though.
> No, I didn't benchmark. And you're right, even on recent CPUs they seem to
> have half the throughput of the integer counterparts.
> Do you think it will mean a considerable performance hit? These functions
> aren't even that important in audio processing anyway (perf shows they
> represent less than 1% of total CPU time when doing pcm -> pcm).
> Nonetheless, considering this, maybe the other functions should be changed
> to not use SBUTTERFLYPS.
Well, you can have SSE and SSE2 versions, right? That way it works for
people with super-old CPUs (if you care about them), but still gives the
better performance for 99.9% of the (x86) world with SSE2 or higher CPUs.
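
To make the suggestion concrete, here is a rough sketch of keeping both
variants and picking one at init time. It is plain C so it stands on its
own. The DEMO_CPU_* bits, select_pack_2ch() and the scalar loops are
hypothetical stand-ins for the real CPU flags, the init code in
audio_convert_init.c and the asm bodies, so treat it as an illustration of
the idea and not as the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define DEMO_CPU_SSE  (1 << 0)
#define DEMO_CPU_SSE2 (1 << 1)

typedef void (*pack_2ch_fn)(uint32_t *dst, const uint32_t *left,
                            const uint32_t *right, size_t len);

/* Stand-in for the SSE build (the one that would use unpcklps/unpckhps). */
static void pack_2ch_sse(uint32_t *dst, const uint32_t *left,
                         const uint32_t *right, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = left[i];
        dst[2 * i + 1] = right[i];
    }
}

/* Stand-in for the SSE2 build (the one that would use punpckldq/punpckhdq).
 * Same output, only the instructions would differ in the real asm. */
static void pack_2ch_sse2(uint32_t *dst, const uint32_t *left,
                          const uint32_t *right, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[2 * i]     = left[i];
        dst[2 * i + 1] = right[i];
    }
}

/* Later checks override earlier ones, so the most capable version wins. */
static pack_2ch_fn select_pack_2ch(int cpu_flags)
{
    pack_2ch_fn fn = NULL;
    if (cpu_flags & DEMO_CPU_SSE)
        fn = pack_2ch_sse;
    if (cpu_flags & DEMO_CPU_SSE2)
        fn = pack_2ch_sse2;
    return fn;
}

/* Convenience wrapper: interleave two planar channels using the best
 * version available for the given flags. */
static void pack_2ch(int cpu_flags, uint32_t *dst, const uint32_t *left,
                     const uint32_t *right, size_t len)
{
    pack_2ch_fn fn = select_pack_2ch(cpu_flags);
    assert(fn != NULL); /* a C fallback would go here in real code */
    fn(dst, left, right, len);
}
```

With this layout an SSE-only CPU gets the float-unpack version, and anything
with SSE2 gets the integer-unpack version with the better throughput.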