[FFmpeg-devel] FW: [PATCH] 10-bit DNxHD decoding and encoding

Wed Mar 30 12:11:40 CEST 2011

Baptiste, could you give feedback on my previous message in this thread (reproduced below)?
Others are welcome to contribute to this discussion as well.

Joseph Artsimovich
Software Developer
MirriAd Ltd

-----Original Message-----
From: ffmpeg-devel-bounces at ffmpeg.org [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf Of Joseph Artsimovich
Sent: Thursday, March 24, 2011 9:51 AM
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH] 10-bit DNxHD decoding and encoding

> > -static const uint16_t dnxhd_1238_run_codes[62] = {
> > +static const uint16_t dnxhd_1235_1238_1241_run_codes[62] = {
> >      0, 4, 10, 11, 24, 25, 26, 27, 56, 57, 58, 59, 120, 242, 486, 
> > 487, 488, 489, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 
> > 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 
> > 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 
> > 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023,
> 
> Please rename in a separate patch.
Done.  The patch is attached.

> > @@ -40,6 +43,7 @@ typedef struct {
> >      VLC ac_vlc, dc_vlc, run_vlc;
> >      int last_dc[3];
> >      DSPContext dsp;
> > +    int dsp_bits; // 8, 10 or 0 if not initialized.
> 
> bit_depth is a better name IMHO.
Well, this field was meant to indicate how the DSPContext was initialized (if at all).
For other purposes we have cid_table->bit_depth

> > +static av_always_inline void dnxhd_10bit_get_pixels_8x8(DCTELEM
> > +*restrict block, const uint16_t *pixels, int line_size) {
> > +    for (int i = 0; i < 8; ++i) {
> > +        for (int j = 0; j < 8; ++j) {
> > +            block[j] = (DCTELEM)floor((float)pixels[j] * (1023.0f /
> > +65535.0f) + 0.5f);
> 
> Btw, why are you using floats here ?
Hoping a compiler might vectorize this code.

> > -        dnxhd_get_blocks(ctx, mb_x, mb_y);
> > +        if (ctx->cid_table->bit_depth == 10) {
> > +            dnxhd_10bit_get_blocks(ctx, mb_x, mb_y);
> > +        } else {
> > +            dnxhd_get_blocks(ctx, mb_x, mb_y);
> > +        }
> 
> I think a function pointer in context would be better.
OK.

> >          for (i = 0; i < 8; i++) {
> >              DCTELEM *src_block = ctx->blocks[i]; @@ -439,6 +549,8 
> > @@ static int dnxhd_calc_bits_thread(AVCodecContext *avctx, void 
> > *arg, int
> jobnr, i
> >              diff = block[0] - ctx->m.last_dc[n];
> >              if (diff < 0) nbits = av_log2_16bit(-2*diff);
> >              else          nbits = av_log2_16bit( 2*diff);
> > +
> > +            assert(nbits < ctx->cid_table->bit_depth + 4);
> 
> assert would kill the program.
That would indicate a serious bug.  The assert merely checks we are not reading past the end of a table.
BTW, I've come by the following comment in mpegvideo_enc.c:
//non c quantize code returns incorrect block_last_index FIXME If still true, that could trigger this assert.

> > +// To get DCT coefficients multiplied by 4.
> 
> Shouldn't we avoid returning scaled output, could be usefull for 11,12 
> bits, etc..
Well, I guess it's done to get more precise quantization.  I would be fine with having an unscaled version though.  After all, the range of DCT coefficients is already 3 bits larger than the range of input samples.

> > [...]
> >
> > diff --git a/libavcodec/mpegvideo.h b/libavcodec/mpegvideo.h index 
> > 3e447ab..eb2352f 100644
> > --- a/libavcodec/mpegvideo.h
> > +++ b/libavcodec/mpegvideo.h
> > @@ -49,7 +49,19 @@ enum OutputFormat {  #define MPEG_BUF_SIZE (16
> *
> > 1024)
> >
> >  #define QMAT_SHIFT_MMX 16
> > -#define QMAT_SHIFT 22
> > +
> > +/*
> > + * QMAT_SHIFT should be as high as possible, but not so high to 
> > +cause
> > + * overflow in the following expression:
> > + * scaled_dct[i] * ((p / s) * ((1 << QMAT_SHIFT) / (qscale *
> > +weight_table[i]))))
> > + * where s is a scale factor applied to DCT coefficients while
> > + * p and weight_table are codec-specific.
> > + * For example, the worst case for 10-bit DNxHD would be:
> > + * p == 32, s == 4, qscale == 1, weight_table[i] == 32, 
> > +scaled_dct[i] == (1 << 13) * 4
> > + * which gives (1 << 15) * (((1 << QMAT_SHIFT) >> 5) << 3)
> > + * which gives 18 as the largest safe QMAT_SHIFT.
> > + */
> > +#define QMAT_SHIFT 18
> >
> >  #define MAX_FCODE 7
> >  #define MAX_MV 2048
> > diff --git a/libavcodec/mpegvideo_enc.c b/libavcodec/mpegvideo_enc.c 
> > index 9255fa8..3f46033 100644
> > --- a/libavcodec/mpegvideo_enc.c
> > +++ b/libavcodec/mpegvideo_enc.c
> > @@ -3725,9 +3725,14 @@ int dct_quantize_c(MpegEncContext *s,
> >              else
> >                  q = s->c_dc_scale;
> >              q = q << 3;
> > -        } else
> > +        } else {
> >              /* For AIC we skip quant/dequant of INTRADC */
> > -            q = 1 << 3;
> > +            if (s->dsp.fdct == ff_faandct_10bit_safe) {
> > +                q = 1 << 2;
> > +            } else {
> > +                q = 1 << 3; // The post-scale applied to a DCT output.
> > +            }
> > +        }
> >
> >          /* note: block[0] is assumed to be positive */
> >          block[0] = (block[0] + (q >> 1)) / q;
> 
> dct_quantize_mmx/sse2/etc .. is used if present, see QMAX_SHIFT_MMX.
I specifically request FF_DCT_10BIT, so I won't get any of those.

> I think it's time to separate the dnxhd quantization fonction from the 
> mpeg one and change QMAT_SHIFT only there.
> Remove MpegEncContext from DNxHDEncContext, add dnxhd_template_mmx.c 
> in
> x86 and clean it up (intra only, out format, aic only etc..)
I believe we only use dct_quantize() from MpegEncContext, and there exists quite a few assembly versions of that.  I don't think it's a good idea to duplicate that code.

Joseph Artsimovich
Software Developer
MirriAd Ltd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dnxhd_rename_tables.patch
Type: application/octet-stream
Size: 10315 bytes
Desc: dnxhd_rename_tables.patch
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20110330/26c40d09/attachment.obj>