[FFmpeg-devel] [PATCH] DNxHD/VC-3 encoder
Baptiste Coudurier
baptiste.coudurier
Wed Oct 3 22:45:04 CEST 2007
Hi Michael, thanks for the review,
Michael Niedermayer wrote:
> On Mon, Sep 24, 2007 at 01:00:12PM +0200, Baptiste Coudurier wrote:
>
>>Hi Michael,
>>
>>Michael Niedermayer wrote:
>>
>>>Hi
>>>
>>>On Fri, Sep 07, 2007 at 08:05:28PM +0200, Michael Niedermayer wrote:
>>>
>>>>Hi
>>>>
>>>>On Fri, Sep 07, 2007 at 09:17:49AM -0600, Loren Merritt wrote:
>>>>
>>>>>On Fri, 7 Sep 2007, Michael Niedermayer wrote:
>>>>>
>>>>>
>>>>>>when people start using qsort you know theres something wrong in their
>>>>>>encoder (or at least not optimal)
>>>>>>
>>>>>>so, lets first check if i understand the constraints
>>>>>>* its possible to change qscale per macroblock and it doesnt cost anything
>>>>>>* theres a fixed number of bits available for each frame (pre mpeg1 design)
>>>>>>
>>>>>>in that case the optimal solution (though not fastest) is to do a
>>>>>>binary search over lambda (using the number of bits and the ones available
>>>>>>to find the best lambda which results in a frame which is not too large)
>>>>>>
>>>>>>then for each such lambda, find the best qscale for each MB, again by
>>>>>>binary search minimizing SSD + bits*lambda (that way we know the best
>>>>>>qscales and how many bits the frame would need)
>>>>>
>>>>>Your algorithm produces the same decisions as the one with qsort. I don't
>>>>>know which is faster.
>>>>>
>>>>>Consider the list of possible block encodings sorted by
>>>>>delta_ssd/delta_bits. Optimizing for qps at any given lambda will produce
>>>>>some prefix of that list. i.e. for any lambda, there is some
>>>>>position in the list such that all preceding entries have ssd/bits
>>>>>better than lambda, and all following entries have ssd/bits worse
>>>>>than lambda. So instead of evaluating bits(lambda) as a black box and
>>>>>searching for a specified values of bits, construct the function
>>>>>lambda(bits) and evaluate it at the specified value of bits.
>>>>
>>>>yes, but as far as i can see thats not what the current code does, it just
>>>>tries qscale and qscale+1 if it would try all qscale then yes they would
>>>>be identical
>>>
>>>just to clarify my original review ...
>>>i dont mind if the code uses qsort, what i was complaining about was that
>>>it isnt finding the optimal (in the RD sense) solution
>>>
>>>if its changed to find the optimal one or its demonstrated that the
>>>PSNR/bitrate difference is negligible then iam happy with that part
>>>
>>
>>I tried lagrange. I keep the qsort rc because it is useable (> real time
>>on quad core cpu with variance, -5 fps for ssd), rdo one is encoding at
>>4 fps :/
>>
>>Im wondering if I did screwed up the implementation because I have an
>>IMHO negligeable gain with rdo.
>>
>>I reused existing quant functions.
>>
>>Threaded functions must return int else you get a warning.
>>
>>About encode_dc, it will be complicated to reuse existing ones because:
>>mpeg12 use some static tables and generation code.
>>mjpeg use uint16_t for dc_codes.
>>
>>About mb_var_thread, existing function use current_picture and set
>>variables I don't need, mb_mean, and use me field too ...
>>
>>Attached are some psnr results with one HD uncompressed sample I have.
>>
>>Any other optimization because what Loren suggested earlier ?
>
>
> [...]
>
>>+static int dnxhd_init_qmat(DNXHDEncContext *ctx, int lbias, int cbias)
>>+{
>>+ int qscale, i;
>>+ int (*qmat_luma )[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_luma));
>>+ int (*qmat_chroma)[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_chroma));
>>+ uint16_t (*qmat16_luma )[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_luma));
>>+ uint16_t (*qmat16_chroma)[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_chroma));
>>+
>>+ for (qscale = 1; qscale < ctx->avctx->qmax; qscale++) {
>>+ for (i = 1; i < 64; i++) {
>>+ int j = ff_zigzag_direct[i];
>>+ qmat_luma [qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table-> luma_weigth[i]));
>>+ qmat_chroma[qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table->chroma_weigth[i]));
>>+
>>+ qmat16_luma [qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table-> luma_weigth[i]);
>>+ qmat16_chroma[qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table->chroma_weigth[i]);
>>+
>>+ if (qmat16_luma [qscale][0][j]==0 || qmat16_luma [qscale][0][j]==128*256) qmat16_luma [qscale][0][j]=128*256-1;
>>+ if (qmat16_chroma[qscale][0][j]==0 || qmat16_chroma[qscale][0][j]==128*256) qmat16_chroma[qscale][0][j]=128*256-1;
>>+
>>+ qmat16_luma [qscale][1][j] = ROUNDED_DIV(lbias<<(16-QUANT_BIAS_SHIFT), qmat16_luma [qscale][0][j]);
>>+ qmat16_chroma[qscale][1][j] = ROUNDED_DIV(cbias<<(16-QUANT_BIAS_SHIFT), qmat16_chroma[qscale][0][j]);
>>+ }
>>+ }
>>+
>>+ ctx->q_scale_matrixl = qmat_luma;
>>+ ctx->q_scale_matrixc = qmat_chroma;
>>+ ctx->q_scale_matrixl16 = qmat16_luma;
>>+ ctx->q_scale_matrixc16 = qmat16_chroma;
>>+ return 0;
>>+}
>
>
> duplicate of convert_matrix()
Yes, there are some tiny differences, and I need more code to readjust
tables, and its seems bigger than duplicating convert_matrix. What
should I do ?
> [...]
>
>>+ dsputil_init(&ctx->m.dsp, avctx);
>>+ ff_init_scantable(ctx->m.dsp.idct_permutation, &ctx->m.intra_scantable, ff_zigzag_direct);
>>+
>>+ ctx->m.avctx = avctx;
>>+ ctx->m.dct_quantize = dct_quantize_c;
>
>
> does this disable the optimized quantize funcions?
Not really, since MPV_common_init_mmx will reassign optimized function
later. I changed it nonetheless.
> what about quantize_trelis? (no its no reason to delay the patch, making
> trellis work could very well be in a later patch, iam just asking)
Well I'd like the encoder to be commited so I have real history on the
work. I find easier to work when code is in svn (diffs).
I had a quick look at trellis but I don't understand it completely yet
(ac_esc_length not set by mpeg2 encoder for example, and levels taken
into account seems different than dnxhd levels, once I get it I could as
well do it for mjpeg)
> [...]
>
>
>>+ if (ctx->slice_size[mb_y]&31)
>>+ ctx->slice_size[mb_y] += 32-(ctx->slice_size[mb_y]&31);
>
>
> +31)&~31;
>
Done.
> [...]
>
>>+static int dnxhd_mb_var_thread(AVCodecContext *avctx, void *arg)
>>+{
>>+ DNXHDEncContext *ctx = arg;
>>+ int mb_y, mb_x;
>>+ for (mb_y = ctx->m.start_mb_y; mb_y < ctx->m.end_mb_y; mb_y++) {
>>+ for (mb_x = 0; mb_x < ctx->m.mb_width; mb_x++) {
>>+ unsigned mb = mb_y * ctx->m.mb_width + mb_x;
>>+ uint8_t *pix = ctx->thread[0]->src_y + ((mb_y<<4) * ctx->m.linesize) + (mb_x<<4);
>>+ int sum = ctx->m.dsp.pix_sum(pix, ctx->m.linesize);
>>+ int varc = (ctx->m.dsp.pix_norm1(pix, ctx->m.linesize) - (((unsigned)(sum*sum))>>8)+500+128)>>8;
>
>
> could you explain what that +500 does?
I think not, I assumed it was some black magic, I tried without
and found no differences. I removed it then.
> [...]
>
>>+ if (bits&31) //padding
>>+ bits += 32-(bits&31);
>
>
> +31)&~31;
>
Done. Another one was changed too.
> [...]
>
>>+ if (ctx->interlaced && ctx->cur_field) {
>>+ ctx->src_y = ctx->picture->data[0] + ctx->picture->linesize[0];
>>+ ctx->src_u = ctx->picture->data[1] + ctx->picture->linesize[1];
>>+ ctx->src_v = ctx->picture->data[2] + ctx->picture->linesize[2];
>>+ } else {
>>+ ctx->src_y = ctx->picture->data[0];
>>+ ctx->src_u = ctx->picture->data[1];
>>+ ctx->src_v = ctx->picture->data[2];
>>+ }
>
>
> id write
>
> for(i=0; i<3; i++){
> ctx->src[i]= ctx->picture->data[i];
> if(ctx->interlaced && ctx->cur_field)
> ctx->src[i] +=ctx->picture->linesize[i];
> }
>
Ok, changed.
How does it look like ?
--
Baptiste COUDURIER GnuPG Key Id: 0x5C1ABAAA
SMARTJOG S.A. http://www.smartjog.com
Key fingerprint 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
Phone: +33 1 49966312
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dnxhd_encoder6.patch
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071003/7167764c/attachment.asc>
More information about the ffmpeg-devel
mailing list