[Ffmpeg-devel] Re: [PATCH] Machine endian bytestream functions
Michael Niedermayer
michaelni
Sun Mar 11 21:36:58 CET 2007
Hi
On Sat, Mar 10, 2007 at 05:15:44PM -0300, Ramiro Polla wrote:
> Hello,
>
> Reimar D?ffinger escreveu:
> >Hello,
> >On Sat, Mar 10, 2007 at 11:06:41PM -0300, ramiro at lisha.ufsc.br wrote:
> >
> >>Attached patch makes the AV_{R,W}{L,B}xx macros have a machine endian for
> >>the simple 16 and 32 bit types. Those macros are then #ifdef'd for the
> >>correct endianess. 24 bit remains the same, as it would be more complex.
> >>
> >
> >They completely ignore alignment issues...
> >
> >
> You're right.
>
> Attached patch makes use of machine endianess where unaligned data
> accesses are possible, and faster than what gcc is currently doing.
>
> I have only tested this on a p4, but the following program should detect
> this on any architecture. Compile bytes.c and main.c with the same
> options FFmpeg gives to libavcodec files, link them, and test the speed
> both for patched and unpatched FFmpeg. bytes.c should be changed to 'be'
> on big-endian architectures.
>
> Regression tests pass.
>
> Ramiro Polla
> Index: configure
> ===================================================================
> --- configure (revis?o 8316)
> +++ configure (c?pia de trabalho)
> @@ -602,6 +602,7 @@
> dlopen
> fast_64bit
> fast_cmov
> + fast_unaligned
> freetype2
> imlib2
> inet_aton
> @@ -737,6 +738,7 @@
> mmx="default"
> cmov="no"
> fast_cmov="no"
> +fast_unaligned="no"
> armv5te="default"
> armv6="default"
> iwmmxt="default"
> @@ -951,9 +953,11 @@
> case "$arch" in
> i386|i486|i586|i686|i86pc|BePC)
> arch="x86_32"
> + enable fast_unaligned
> ;;
> x86_64|amd64)
> arch="x86_32"
> + enable fast_unaligned
> canon_arch="`$cc -dumpmachine | sed -e 's,\([^-]*\)-.*,\1,'`"
> if [ x"$canon_arch" = x"x86_64" -o x"$canon_arch" = x"amd64" ]; then
> if [ -z "`echo $CFLAGS | grep -- -m32`" ]; then
maybe configure should rather have a generic test which checks which version
is faster? (it would be much easier to maintain instead of keeping track what
is faster for which cpu ...)
> Index: libavutil/intreadwrite.h
> ===================================================================
> --- libavutil/intreadwrite.h (revis?o 8316)
> +++ libavutil/intreadwrite.h (c?pia de trabalho)
> @@ -47,8 +47,8 @@
> #define AV_RB8(x) (((uint8_t*)(x))[0])
> #define AV_WB8(p, d) { ((uint8_t*)(p))[0] = (d); }
>
> -#define AV_RB16(x) ((((uint8_t*)(x))[0] << 8) | ((uint8_t*)(x))[1])
> -#define AV_WB16(p, d) { \
> +#define AV_RBE16(x) ((((uint8_t*)(x))[0] << 8) | ((uint8_t*)(x))[1])
> +#define AV_WBE16(p, d) { \
cosmetic, rejected
[...]
>
> +#ifdef HAVE_FAST_UNALIGNED
> +# ifdef WORDS_BIGENDIAN
> +# define AV_RL16(x) AV_RLE16(x)
> +# define AV_RL32(x) AV_RLE32(x)
> +# define AV_RB16(x) LD16(x)
> +# define AV_RB32(x) LD32(x)
> +# define AV_WL16(p, d) AV_WLE16(p, d)
> +# define AV_WL32(p, d) AV_WLE32(p, d)
> +# define AV_WB16(p, d) ST16(p, d)
> +# define AV_WB32(p, d) ST32(p, d)
> +# else
> +# define AV_RL16(x) LD16(x)
> +# define AV_RL32(x) LD32(x)
> +# define AV_RB16(x) AV_RBE16(x)
> +# define AV_RB32(x) AV_RBE32(x)
> +# define AV_WL16(p, d) ST16(p, d)
> +# define AV_WL32(p, d) ST32(p, d)
> +# define AV_WB16(p, d) AV_WBE16(p, d)
> +# define AV_WB32(p, d) AV_WBE32(p, d)
> +# endif
> +#else
> +# define AV_RL16(x) AV_RLE16(x)
> +# define AV_RL32(x) AV_RLE32(x)
> +# define AV_RB16(x) AV_RBE16(x)
> +# define AV_RB32(x) AV_RBE32(x)
> +# define AV_WL16(p, d) AV_WLE16(p, d)
> +# define AV_WL32(p, d) AV_WLE32(p, d)
> +# define AV_WB16(p, d) AV_WBE16(p, d)
> +# define AV_WB32(p, d) AV_WBE32(p, d)
> +#endif
#ifdef HAVE_FAST_UNALIGNED
# ifdef WORDS_BIGENDIAN
define AV_RB16(x) LD16(x)
define AV_RB32(x) LD32(x)
define AV_WB16(p, d) ST16(p, d)
define AV_WB32(p, d) ST32(p, d)
# else
define AV_RL16(x) LD16(x)
define AV_RL32(x) LD32(x)
define AV_WL16(p, d) ST16(p, d)
define AV_WL32(p, d) ST32(p, d)
# endif
#endif
also it might be worth to check bswap(LD32(x)) style too ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070311/08d0027a/attachment.pgp>
More information about the ffmpeg-devel
mailing list