[FFmpeg-devel] [PATCH] DNxHD/VC-3 encoder
Michael Niedermayer
michaelni
Mon Oct 1 16:03:11 CEST 2007
On Mon, Sep 24, 2007 at 01:00:12PM +0200, Baptiste Coudurier wrote:
> Hi Michael,
>
> Michael Niedermayer wrote:
> > Hi
> >
> > On Fri, Sep 07, 2007 at 08:05:28PM +0200, Michael Niedermayer wrote:
> >> Hi
> >>
> >> On Fri, Sep 07, 2007 at 09:17:49AM -0600, Loren Merritt wrote:
> >>> On Fri, 7 Sep 2007, Michael Niedermayer wrote:
> >>>
> >>>> when people start using qsort you know theres something wrong in their
> >>>> encoder (or at least not optimal)
> >>>>
> >>>> so, lets first check if i understand the constraints
> >>>> * its possible to change qscale per macroblock and it doesnt cost anything
> >>>> * theres a fixed number of bits available for each frame (pre mpeg1 design)
> >>>>
> >>>> in that case the optimal solution (though not fastest) is to do a
> >>>> binary search over lambda (using the number of bits and the ones available
> >>>> to find the best lambda which results in a frame which is not too large)
> >>>>
> >>>> then for each such lambda, find the best qscale for each MB, again by
> >>>> binary search minimizing SSD + bits*lambda (that way we know the best
> >>>> qscales and how many bits the frame would need)
> >>> Your algorithm produces the same decisions as the one with qsort. I don't
> >>> know which is faster.
> >>>
> >>> Consider the list of possible block encodings sorted by
> >>> delta_ssd/delta_bits. Optimizing for qps at any given lambda will produce
> >>> some prefix of that list. i.e. for any lambda, there is some
> >>> position in the list such that all preceding entries have ssd/bits
> >>> better than lambda, and all following entries have ssd/bits worse
> >>> than lambda. So instead of evaluating bits(lambda) as a black box and
> >>> searching for a specified values of bits, construct the function
> >>> lambda(bits) and evaluate it at the specified value of bits.
> >> yes, but as far as i can see thats not what the current code does, it just
> >> tries qscale and qscale+1 if it would try all qscale then yes they would
> >> be identical
> >
> > just to clarify my original review ...
> > i dont mind if the code uses qsort, what i was complaining about was that
> > it isnt finding the optimal (in the RD sense) solution
> >
> > if its changed to find the optimal one or its demonstrated that the
> > PSNR/bitrate difference is negligible then iam happy with that part
> >
>
> I tried lagrange. I keep the qsort rc because it is useable (> real time
> on quad core cpu with variance, -5 fps for ssd), rdo one is encoding at
> 4 fps :/
>
> Im wondering if I did screwed up the implementation because I have an
> IMHO negligeable gain with rdo.
>
> I reused existing quant functions.
>
> Threaded functions must return int else you get a warning.
>
> About encode_dc, it will be complicated to reuse existing ones because:
> mpeg12 use some static tables and generation code.
> mjpeg use uint16_t for dc_codes.
>
> About mb_var_thread, existing function use current_picture and set
> variables I don't need, mb_mean, and use me field too ...
>
> Attached are some psnr results with one HD uncompressed sample I have.
>
> Any other optimization because what Loren suggested earlier ?
[...]
> +static int dnxhd_init_qmat(DNXHDEncContext *ctx, int lbias, int cbias)
> +{
> + int qscale, i;
> + int (*qmat_luma )[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_luma));
> + int (*qmat_chroma)[64] = av_mallocz(ctx->avctx->qmax * 64 * sizeof(*qmat_chroma));
> + uint16_t (*qmat16_luma )[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_luma));
> + uint16_t (*qmat16_chroma)[2][64] = av_mallocz(ctx->avctx->qmax * 64 * 2 * sizeof(*qmat16_chroma));
> +
> + for (qscale = 1; qscale < ctx->avctx->qmax; qscale++) {
> + for (i = 1; i < 64; i++) {
> + int j = ff_zigzag_direct[i];
> + qmat_luma [qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table-> luma_weigth[i]));
> + qmat_chroma[qscale][j] = (int)((UINT64_C(4) << QMAT_SHIFT) / (qscale * ctx->cid_table->chroma_weigth[i]));
> +
> + qmat16_luma [qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table-> luma_weigth[i]);
> + qmat16_chroma[qscale][0][j] = (4 << QMAT_SHIFT_MMX) / (qscale * ctx->cid_table->chroma_weigth[i]);
> +
> + if (qmat16_luma [qscale][0][j]==0 || qmat16_luma [qscale][0][j]==128*256) qmat16_luma [qscale][0][j]=128*256-1;
> + if (qmat16_chroma[qscale][0][j]==0 || qmat16_chroma[qscale][0][j]==128*256) qmat16_chroma[qscale][0][j]=128*256-1;
> +
> + qmat16_luma [qscale][1][j] = ROUNDED_DIV(lbias<<(16-QUANT_BIAS_SHIFT), qmat16_luma [qscale][0][j]);
> + qmat16_chroma[qscale][1][j] = ROUNDED_DIV(cbias<<(16-QUANT_BIAS_SHIFT), qmat16_chroma[qscale][0][j]);
> + }
> + }
> +
> + ctx->q_scale_matrixl = qmat_luma;
> + ctx->q_scale_matrixc = qmat_chroma;
> + ctx->q_scale_matrixl16 = qmat16_luma;
> + ctx->q_scale_matrixc16 = qmat16_chroma;
> + return 0;
> +}
duplicate of convert_matrix()
[...]
> + dsputil_init(&ctx->m.dsp, avctx);
> + ff_init_scantable(ctx->m.dsp.idct_permutation, &ctx->m.intra_scantable, ff_zigzag_direct);
> +
> + ctx->m.avctx = avctx;
> + ctx->m.dct_quantize = dct_quantize_c;
does this disable the optimized quantize funcions?
what about quantize_trelis? (no its no reason to delay the patch, making
trellis work could very well be in a later patch, iam just asking)
[...]
> + if (ctx->slice_size[mb_y]&31)
> + ctx->slice_size[mb_y] += 32-(ctx->slice_size[mb_y]&31);
+31)&~31;
[...]
> +static int dnxhd_mb_var_thread(AVCodecContext *avctx, void *arg)
> +{
> + DNXHDEncContext *ctx = arg;
> + int mb_y, mb_x;
> + for (mb_y = ctx->m.start_mb_y; mb_y < ctx->m.end_mb_y; mb_y++) {
> + for (mb_x = 0; mb_x < ctx->m.mb_width; mb_x++) {
> + unsigned mb = mb_y * ctx->m.mb_width + mb_x;
> + uint8_t *pix = ctx->thread[0]->src_y + ((mb_y<<4) * ctx->m.linesize) + (mb_x<<4);
> + int sum = ctx->m.dsp.pix_sum(pix, ctx->m.linesize);
> + int varc = (ctx->m.dsp.pix_norm1(pix, ctx->m.linesize) - (((unsigned)(sum*sum))>>8)+500+128)>>8;
could you explain what that +500 does?
[...]
> + if (bits&31) //padding
> + bits += 32-(bits&31);
+31)&~31;
[...]
> + if (ctx->interlaced && ctx->cur_field) {
> + ctx->src_y = ctx->picture->data[0] + ctx->picture->linesize[0];
> + ctx->src_u = ctx->picture->data[1] + ctx->picture->linesize[1];
> + ctx->src_v = ctx->picture->data[2] + ctx->picture->linesize[2];
> + } else {
> + ctx->src_y = ctx->picture->data[0];
> + ctx->src_u = ctx->picture->data[1];
> + ctx->src_v = ctx->picture->data[2];
> + }
id write
for(i=0; i<3; i++){
ctx->src[i]= ctx->picture->data[i];
if(ctx->interlaced && ctx->cur_field)
ctx->src[i] +=ctx->picture->linesize[i];
}
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Democracy is the form of government in which you can choose your dictator
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20071001/f0a721a0/attachment.pgp>
More information about the ffmpeg-devel
mailing list