[FFmpeg-devel] [PATCH] h264 luma interpolation 8x8 for altivec

Mon Jun 18 19:32:20 CEST 2007

Mauricio Alvarez wrote:
> Hi All,
> 
> Here I'm sending a patch that adds support for luma interpolation of 8x8
> blocks using Altivec. I have tested it on a G5 machine (Linux
> 2.6.15-1.2054_FC5, gcc 4.1.1) using some videos that I use for h264
> research [1]. The resulting video files are md5 identical to those
> generated by the original ffmpeg.

First, thank you for your effort.

> I made an analysis of this alignment and found that the destination result
> is always aligned; based on that, it is possible to remove the re-alignment
> code at each store. I am sending a separate patch for this. It has also
> been tested, and the md5 check passed OK.

Good

> Additionally I'm working on Altivec functions for doing the luma
> interpolation for non-square blocks: 16x8, 8x16,  8x4 and 4x8. The
> implementation of the functions is very easy. My question is how to
> integrate them with DSPContext structure. An option could be to add a
> position to the XXX_pixels_tab[][] structure, like this
> index | size
> 0: 16x16
> 1: 8x8
> 2: 4x4
> 3: 16x8
> 4: 8x16
> 5: 8x4
> 6: 4x8

I'd like to know the opinion of the other people involved (x86 hackers,
I'm speaking to you ^^).
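
To make the proposal concrete for whoever answers, here is a minimal
sketch of the extended size index. All names in it (BLK_*, blk_dim,
blk_index, hyp_put_qpel_tab, hyp_lookup) are hypothetical and are not the
real DSPContext members; the second dimension of 16 assumes one entry per
quarter-pel position.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function type for one motion-compensation routine. */
typedef void (*qpel_mc_func)(uint8_t *dst, uint8_t *src, int stride);

/* Size indices in the order proposed above: square blocks first. */
enum {
    BLK_16x16, BLK_8x8, BLK_4x4,          /* existing entries 0..2 */
    BLK_16x8, BLK_8x16, BLK_8x4, BLK_4x8, /* proposed entries 3..6 */
    BLK_NB
};

/* Width and height for each index, so callers can map a block to a slot. */
const struct { int w, h; } blk_dim[BLK_NB] = {
    {16, 16}, {8, 8}, {4, 4}, {16, 8}, {8, 16}, {8, 4}, {4, 8}
};

/* Hypothetical table: one row per block size, one column per position. */
qpel_mc_func hyp_put_qpel_tab[BLK_NB][16];

/* Return the table row for a w x h block, or -1 if the size is unknown. */
int blk_index(int w, int h)
{
    for (int i = 0; i < BLK_NB; i++)
        if (blk_dim[i].w == w && blk_dim[i].h == h)
            return i;
    return -1;
}

/* Fetch the routine for a block size and quarter-pel position (0..15). */
qpel_mc_func hyp_lookup(int w, int h, int pos)
{
    int i = blk_index(w, h);
    if (i < 0 || pos < 0 || pos > 15)
        return NULL;
    return hyp_put_qpel_tab[i][pos];
}
```

With a layout like this the existing indices 0..2 keep their meaning, so
code that only knows about square blocks would not need to change.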

> Suggestions on this issue are welcome.

Hm, I'd like to comment on your patches inline, but it looks like my
more often than not idiotic client doesn't understand that text/* should
be put inline... (hi thunderbird)

First, the patch is about 700 lines, which is a bit big, so I'll be slow
commenting on it; maybe you should try to split it into pieces.

> +static void PREFIX_h264_qpel8_h_lowpass_altivec(uint8_t * dst,
> uint8_t * src, int dstStride, int srcStride) {
> +  POWERPC_PERF_DECLARE(PREFIX_h264_qpel8_h_lowpass_num, 1);

Do you really use this? I've been actively deprecating it since last year
and will probably remove it soon if nobody screams. I think dtrace on
Mac OS X and oprofile on Linux cover all our performance counting needs.

> \
> static void OPNAME ## h264_qpel ## SIZE ## _mc10_ ## CODETYPE(uint8_t
>*dst, uint8_t *src, int stride){ \
>-    DECLARE_ALIGNED_16(uint8_t, half[SIZE*SIZE]);\
>-    put_h264_qpel ## SIZE ## _h_lowpass_ ## CODETYPE(half, src, SIZE,
>stride);\
>+    DECLARE_ALIGNED_16(uint8_t, half[16*16]);\
>+    put_h264_qpel ## SIZE ## _h_lowpass_ ## CODETYPE(half, src, 16,
>stride);\
>     OPNAME ## pixels ## SIZE ## _l2_ ## CODETYPE(dst, src, half,
>stride, stride, SIZE);\

doesn't look right

>-  if ( (unsigned long) dst & 0x0f) {
...
>+  if (((unsigned long)dst) % 16 == 0) {

hm..
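
To spell out what makes me pause, here is a small sketch with
hypothetical helper names. The removed test is true when dst is NOT
16-byte aligned, while the added one is true when it IS aligned, so the
two conditions are opposites. Whether the branch bodies were swapped to
match cannot be seen from the lines quoted here.

```c
#include <stdint.h>

/* Old-style test: non-zero when any of the low four address bits is set,
 * i.e. the pointer is NOT on a 16-byte boundary. */
int is_unaligned16(const void *p)
{
    return ((uintptr_t)p & 0x0f) != 0;
}

/* New-style test: non-zero when the address is a multiple of 16,
 * i.e. the pointer IS on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

For any pointer exactly one of the two helpers returns non-zero, so
replacing one condition with the other without also swapping the branches
would invert the behaviour.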

I guess that's all for now...

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero