[Ffmpeg-devel] patch: altivec optimizations for h264 decoder
Michael Niedermayer
michaelni
Mon Feb 6 13:39:33 CET 2006
Hi
On Mon, Feb 06, 2006 at 11:24:14AM +0100, Mauricio Alvarez wrote:
> Hi all
>
> As a part of my academic research on architectures for video
> decoding I am doing some optimizations to the h.264 decoder for the ppc
> architecture using altivec, and I want to submit them back to the ffmpeg
> project.
>
> I have implemented the following functions:
> - luma motion compensation for 8x8 and 4x4 pixels blocks
> - chroma motion compensation for 4x4 pixel blocks
> - inverse transforms: 8x8 and 4x4
>
> i) for the 4x4 inverse transform I have implemented two versions: the
> first one, called ff_h264_idct_add_altivec, implements the transform
> with the same algorithm as the C version. The second one is
> ff_h264_idct_add_altivec_mat, which implements an optimized matrix
> multiply algorithm described in the paper by Chen et al. [1]. In the altivec
> implementation the second (matrix) algorithm has a speed-up of 2.95 with
> respect to the C version while the first version has 1.55.
>
> ii) The 8x8 luma motion compensation implementation with altivec has a
> 2.12 speed-up compared with the C version and the 4x4 has 1.30.
>
> iii) The chroma 4x4 motion compensation has a speed-up of 1.85, again
> compared with the C version.
>
> iv) I have performed a regression test and the new optimizations passed
> it. Also I have decoded some videos [2] coded with the JM and x264
> encoders at HD resolution and all of them decode well.
> The speed-ups for the sequences used are shown in the following table:
>
> Coding options:
> - resolution: 1920x1088p25,
> - profile: main, level: 5.0
> - qp for I,P slices: 22
> - qp for B slices: 24
> - coded sequence: I-P-B-B-P-B-B
> - direct mode: temporal
> - Weighted prediction
>
>
> sequence     ffmpeg-cvs   ffmpeg-patch   speed-up
>              time [s]     time [s]
> pedestrian   11,89        10,15          17,14 %
> riverbed     19,11        17,73           7,78 %
> blue sky     11,33        10,13          11,85 %
> rush hour    12,34        11,24           9,79 %
> AVG                                      11,64 %
>
> I hope the patch is OK for FFMPEG developers. Any comments or suggestion
> to improve the patch are welcome.
>
> Mauricio Alvarez
> Department of Computer Architecture
> Universitat Politècnica de Catalunya
> Barcelona-Spain.
>
> [1] Yen-Kuang Chen, Eric Q. Li, Xiaosong Zhou, and Steven Ge.
> Implementation of H.264 Encoder and Decoder on Personal Computers.
> Journal of Visual Communication and Image Representation, July 2005.
>
> [2] Mpeg test sequences at HD resolution
> http://www.ldv.ei.tum.de/liquid.php?page=70
>
[...]
> + } break;
> + }
> +
> + vector unsigned char vdst_mask = vec_lvsl(0, dst);
This mixes declarations and statements. Romain, is this an issue for ppc-asm,
or do all compilers which support ppc-asm support this too?
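
For reference, the concern is about C dialects: C89 requires all declarations
at the top of a block, while C99 and GNU C allow a declaration after a
statement. Below is a minimal sketch of the two styles, assuming plain ints in
place of the vector types; the function names are hypothetical and are not
taken from the patch.

```c
#include <assert.h>

/* C99 / GNU C style: a declaration follows a statement in the same block. */
static int last_plus_one_c99(const int *buf, int n)
{
    assert(n > 0);             /* statement */
    int v = buf[n - 1];        /* declaration after a statement */
    return v + 1;
}

/* C89-compatible style: every declaration is hoisted to the top of the block. */
static int last_plus_one_c89(const int *buf, int n)
{
    int v;                     /* declaration first */
    assert(n > 0);             /* statements afterwards */
    v = buf[n - 1];
    return v + 1;
}
```

Both functions compute the same value; only the second one is accepted by a
strict C89 compiler.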
[...]
the patch is also full of tabs and trailing whitespace, whoever applies it
will have to run it through clean_diff ...
the mixed indentation style is ugly too, but the files are already in this
messed-up mix so it's ok, fixing the indentation of the whole ppc/* should be
separate if we do it ...
[...]
> + if ( (unsigned long)dst & 0xF){ /* unaligned access to dst for add */
> +
> + switch ((unsigned long)dst % 16){
hmm why not &0xF in both?
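
To illustrate the point: for an unsigned value, x & 0xF and x % 16 give the
same result, so one form can be used in both places. A small sketch, assuming
uintptr_t instead of the patch's unsigned long cast; the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a pointer within its 16-byte block, using a bit mask. */
static unsigned offset_mask(const void *p)
{
    return (unsigned)((uintptr_t)p & 0xF);
}

/* The same offset computed with a modulo, as in the switch in the patch. */
static unsigned offset_mod(const void *p)
{
    return (unsigned)((uintptr_t)p % 16);
}

/* Nonzero when p is not 16-byte aligned. */
static int is_unaligned16(const void *p)
{
    return offset_mask(p) != 0;
}

/* Checks that both forms agree for n consecutive byte addresses. */
static void check_offsets_agree(const unsigned char *base, int n)
{
    int i;
    for (i = 0; i < n; i++)
        assert(offset_mask(base + i) == offset_mod(base + i));
}
```

Since 16 is a power of two, the mask form is the usual choice and keeps the
if and the switch visibly consistent.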
[...]
> @@ -264,3 +816,5 @@ void dsputil_h264_init_ppc(DSPContext* c
> // ... pending ...
> }
> }
> +
> +
hmm
[...]
> + signed int ABCD[4] __attribute__((aligned(16)));
please use the new DECLARE_ALIGNED macros
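
A rough sketch of what such a macro buys: the compiler-specific alignment
syntax lives in one place instead of at every declaration. The exact name and
argument order of the FFmpeg macros are not shown in this mail, so
EXAMPLE_DECLARE_ALIGNED_16 below is a hypothetical stand-in and not the real
definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an alignment macro, NOT FFmpeg's definition. */
#if defined(__GNUC__)
#define EXAMPLE_DECLARE_ALIGNED_16(t, v) t v __attribute__((aligned(16)))
#elif defined(_MSC_VER)
#define EXAMPLE_DECLARE_ALIGNED_16(t, v) __declspec(align(16)) t v
#else
#error "no known 16-byte alignment syntax for this compiler"
#endif

/* Declares the ABCD array through the macro and sums its elements. */
static int sum_aligned_abcd(int a, int b, int c, int d)
{
    EXAMPLE_DECLARE_ALIGNED_16(signed int, ABCD[4]);

    ABCD[0] = a;
    ABCD[1] = b;
    ABCD[2] = c;
    ABCD[3] = d;

    /* The array must start on a 16-byte boundary for aligned vector loads. */
    assert(((uintptr_t)ABCD & 0xF) == 0);

    return ABCD[0] + ABCD[1] + ABCD[2] + ABCD[3];
}
```

With a macro like this, the declaration site no longer hard-codes the
GCC-only __attribute__ syntax.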
[...]
Romain, please review and test, you are the ppc maintainer
--
Michael