[FFmpeg-devel] [PATCH] dsputil: add bswap16_buf()
Frank Barchard
fbarchard
Mon Jun 14 21:48:52 CEST 2010
On Mon, Jun 14, 2010 at 5:15 AM, Måns Rullgård <mans at mansr.com> wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
>
> > On Sun, Jun 13, 2010 at 05:59:20PM +0100, Mans Rullgard wrote:
> >> ---
> >> libavcodec/dsputil.c | 7 +++++++
> >> libavcodec/dsputil.h | 1 +
> >> 2 files changed, 8 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
> >> index 0701324..1ecd73f 100644
> >> --- a/libavcodec/dsputil.c
> >> +++ b/libavcodec/dsputil.c
> >> @@ -260,6 +260,12 @@ static void bswap_buf(uint32_t *dst, const uint32_t *src, int w){
> >> }
> >> }
> >>
> >> +static void bswap16_buf(uint16_t *dst, const uint16_t *src, int len)
> >> +{
> >> +    while (len--)
> >> +        *dst++ = bswap_16(*src++);
> >> +}
> >
> > on 64bit arch this is likely faster:
> >
> > uint64_t u= ((uint64_t*)src)[i];
> > ((uint64_t*)dst)[i]= ((u>>8)&0x...) + ((u<<8)&0x...)
>
> That depends entirely on the specifics of the machine, but point taken.
> It would also in general require 8-byte alignment. Not sure if that
> can be guaranteed.
>
> The purpose of having this in dsputil is to allow optimised
> implementations for specific machines. A 32-bit block bswap function
> already exists, and some patch posted was doing a 16-bit block swap.
> If one makes sense, so does the other.
>
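For reference, the 64-bit idea quoted above can be written without the 8-byte alignment requirement by going through memcpy, which compilers typically reduce to a plain load/store on targets that allow unaligned access. This is only a minimal sketch and not part of the patch: the name bswap16_buf_swar64 and the mask constant are my own assumptions, since the quoted snippet elides the masks.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: swaps the two bytes of each
 * 16-bit element, four elements per 64-bit word. */
static void bswap16_buf_swar64(uint16_t *dst, const uint16_t *src, int len)
{
    /* Assumed mask: selects the low byte of every 16-bit lane. */
    const uint64_t lo = 0x00FF00FF00FF00FFULL;

    while (len >= 4) {
        uint64_t u;
        memcpy(&u, src, sizeof(u));            /* alignment-safe load  */
        u = ((u >> 8) & lo) | ((u & lo) << 8); /* swap bytes per lane  */
        memcpy(dst, &u, sizeof(u));            /* alignment-safe store */
        src += 4;
        dst += 4;
        len -= 4;
    }

    /* Scalar tail for the remaining 0..3 elements. */
    while (len-- > 0) {
        uint16_t v = *src++;
        *dst++ = (uint16_t)((v >> 8) | (v << 8));
    }
}
```

The swap is done within each 16-bit lane, so the result does not depend on host endianness. Whether it beats the scalar loop still depends on the machine, as noted above.
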
With SSE2 you could swap 8 pairs at a time with psllw/psrlw/por, which may be
faster than pshufb on Atom.
And NEON can get the equivalent with a vector load and a vector store.
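
To make the SSE2 suggestion concrete, here is a rough sketch using intrinsics rather than assembly: shift each 16-bit lane left by 8, shift a copy right by 8, and OR the two. The function name bswap16_buf_sse2_sketch is hypothetical and not from the patch; unaligned loads and stores are used because I am assuming nothing about buffer alignment, and the SSE2 path is compiled only when __SSE2__ is defined. It is untuned, so treat it as an illustration and not as a measured optimisation.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not from the patch. */
static void bswap16_buf_sse2_sketch(uint16_t *dst, const uint16_t *src, int len)
{
#if defined(__SSE2__)
    /* Eight 16-bit elements per 128-bit register. */
    while (len >= 8) {
        __m128i v  = _mm_loadu_si128((const __m128i *)src);
        __m128i hi = _mm_slli_epi16(v, 8);   /* psllw: low byte -> high byte */
        __m128i lo = _mm_srli_epi16(v, 8);   /* psrlw: high byte -> low byte */
        _mm_storeu_si128((__m128i *)dst, _mm_or_si128(hi, lo)); /* por */
        src += 8;
        dst += 8;
        len -= 8;
    }
#endif

    /* Scalar tail, and the whole buffer when SSE2 is unavailable. */
    while (len-- > 0) {
        uint16_t v = *src++;
        *dst++ = (uint16_t)((v >> 8) | (v << 8));
    }
}
```

If the buffers can be guaranteed 16-byte aligned, the aligned load/store intrinsics could be used instead.
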