[FFmpeg-devel] [PATCH] VC1: merge put/add_pixels with IDCT8x8.

Ronald S. Bultje rsbultje
Sat Feb 19 18:27:16 CET 2011


Just quick note here:

On Sat, Feb 19, 2011 at 12:20 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> -static void vc1_put_block(VC1Context *v, DCTELEM block[6][64])
[..]
> - ? ?if(v->rangeredfrm) {
> - ? ? ? ?int i, j, k;
> - ? ? ? ?for(k = 0; k < 6; k++)
> - ? ? ? ? ? ?for(j = 0; j < 8; j++)
> - ? ? ? ? ? ? ? ?for(i = 0; i < 8; i++)
> - ? ? ? ? ? ? ? ? ? ?block[k][i + j*8] = (block[k][i + j*8] - 64) << 1;
> -
> - ? ?}
[..]
> ?static void vc1_decode_i_blocks(VC1Context *v)
[..]
> + ? ?idct8x8_fn = v->vc1dsp.vc1_inv_trans_8x8_put[(!!v->rangeredfrm) | ((v->pq < 9 || !v->overlap) << 1)];
[..]
[..]
> - ? ? ? ? ? ? ? ?v->vc1dsp.vc1_inv_trans_8x8(s->block[k]);
> - ? ? ? ? ? ? ? ?if(v->pq >= 9 && v->overlap) {
> - ? ? ? ? ? ? ? ? ? ?for(j = 0; j < 64; j++) s->block[k][j] += 128;
> - ? ? ? ? ? ? ? ?}
> + ? ? ? ? ? ? ? ?if(k > 3 && (v->s.flags & CODEC_FLAG_GRAY)) continue;
> + ? ? ? ? ? ? ? ?idct8x8_fn(dst[k],
> + ? ? ? ? ? ? ? ? ? ? ? ? ? k & 4 ? s->uvlinesize : s->linesize,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? s->block[k]);
> ? ? ? ? ? ? }
>
> - ? ? ? ? ? ?vc1_put_block(v, s->block);

I know this isn't the same thing. The old code does this:

read coeffs
do idct
if (v->pq >= 9 && v->overlap) add 128 to each of them
if (rangered) sub 64 from each of them, <<= 1
put_pixels_clamped();

the new code does this, essentially:
read coeffs
do idct
if (v->pq >= 9 && v->overlap) add 128 to each of them
if (rangered) { if (v->pq >= 9 && v->overlap) { sub 64 from each} <<= 1 }
put_pixels_clamped();

and then the add 128 is done through using
put_signed_pixels_clamped(), which does that internally, instead of
adding it manually and using put_pixels_clamped(). I'm almost certain
this is a bug in the old code, or maybe a bug in the spec, or
something along those lines, but Kostya, could you please confirm this
change makes sense when you review this?

Also note that this change likely slows the code down in some way,
because we're temporarily not using
put_{,signed_}pixels_clamped_mmx(). However, this work allows
integrating several loops together which will make the final work,
after MMX/SSE optimizations, a lot faster than without this. Thanks to
Jason for the suggestions.

Ronald



More information about the ffmpeg-devel mailing list