[FFmpeg-devel] [PATCH] DNxHD/VC-3 encoder

Mon Oct 8 13:28:37 CEST 2007

Hi Michael,

Michael Niedermayer wrote:
> Hi
> 
> On Sat, Oct 06, 2007 at 03:15:55PM +0200, Baptiste Coudurier wrote:
>> Michael Niedermayer wrote:
>>> Hi
>>>
>>> On Sat, Oct 06, 2007 at 02:10:52AM +0200, Baptiste Coudurier wrote:
>>>
>>>> Michael Niedermayer wrote:
>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>>> +    ctx->m.avctx = avctx;
>>>>>>>> +    ctx->m.mb_intra = 1;
>>>>>>>> +    ctx->m.h263_aic = 1;
>>>>>>>> +
>>>>>>>> +    MPV_common_init_mmx(&ctx->m);
>>>>>>>> +    if (!ctx->m.dct_quantize)
>>>>>>>> +        ctx->m.dct_quantize = dct_quantize_c;
>>>>>>> iam an idiot, i was staring at this for maybe 2 minutes, somehow i knew
>>>>>>> it was wrong i dont understand why i didnt realize it immedeatly ...
>>>>>>>
>>>>>>> _mmx()
>>>>>>>
>>>>>>> you dont even know if the CPU is a x86 ...
>>>>>> Added ifdef
>>>>>
>>>>> NO!, a *codec *muxer has no business calling any *_mmx
>>>>> think about it, what will happen with people on ppc, on arm, on sparc, ...
>>>>> also grep, no other codec does that
>>>> Hummm, in DCT_common_init in mpegvideo.c:
>>>> #if defined(HAVE_MMX)
>>>>    MPV_common_init_mmx(s);
>>>> #elif defined(ARCH_ALPHA)
>>>>    MPV_common_init_axp(s);
>>>> #elif defined(HAVE_MLIB)
>>>>    MPV_common_init_mlib(s);
>>>> #elif defined(HAVE_MMI)
>>>>    MPV_common_init_mmi(s);
>>>> #elif defined(ARCH_ARMV4L)
>>>>    MPV_common_init_armv4l(s);
>>>> #elif defined(HAVE_ALTIVEC)
>>>>    MPV_common_init_altivec(s);
>>>> #elif defined(ARCH_BFIN)
>>>>    MPV_common_init_bfin(s);
>>>> #endif
>>>>
>>>>
>>>>> find out which init function is the correct one and call it dont call
>>>>> architecture specific functions directly
>>>>> if theres no way to do what you want currently fix the code so it is
>>>>> possible cleanly
>>>> Yes, but it seems Im doing exactly what is done in mpegvideo.c, no ?
>>>
>>> that code is not specific to a single codec, or do you see it in msmpeg4.c
>>> in h263.c, ... ?
>>> also you copied only the _mmx (and if you would copy all it would be code
>>> duplication)
>>>
>> Yes I saw that after, using DCT_common_init now.
>>
>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>>> +static int dnxhd_encode_picture(AVCodecContext *avctx, unsigned char *buf, int buf_size, void *data)
>>>>>>>> +{
>>>>>>>> +    DNXHDEncContext *ctx = avctx->priv_data;
>>>>>>>> +    AVFrame *input_frame = data;
>>>>>>>> +    int first_field = 1;
>>>>>>>> +    int offset, i, ret;
>>>>>>>> +
>>>>>>>> +    if (buf_size < ctx->cid_table->frame_size) {
>>>>>>>> +        av_log(avctx, AV_LOG_ERROR, "output buffer is too small to compress picture\n");
>>>>>>>> +        return -1;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    av_picture_copy((AVPicture *)ctx->picture, data, PIX_FMT_YUV422P, avctx->width, avctx->height);
>>>>>>> why do you copy the input image?
>>>>>> If picture is not allocated as 1920x1088 I need to copy it.
>>>>>
>>>>> why?
>>>> It segv in dsp->get_pixels, since last row height is not 16 but 8 (for
>>>> example when encoding raw yuv422p stream).
>>>>
>>>> I tried reading last row differently but:
>>>> - it's not faster than copying and IMHO clutter the code.
>>>> - for progressive its possible to skip 4 bottom blocks (8 lines) but
>>>> interlaced mode is even more complicated since 4 lines are in field 1
>>>> and 4 are in field 2.
>>>
>>> so there are X lines at th bottom and you read more than X and that segfaults
>>> and as "solution" you copy the whole 1920x1080 image
>>> i really do think you should not read lines after the image
>>>
>> I changed get_blocks to read only what's needed though it's not faster,
>> please don't say dnxhd_get_pixels_4x8 is duplicated...
>>
>> I'll submit patches to rename functions.
> 
> [...]
>> +static int dnxhd_init_qmat(DNXHDEncContext *ctx, int lbias, int cbias)
>> +{
>> +    // init first elem to 1 to avoid div by 0 in convert_matrix
>> +    uint16_t weight_matrix[64] = {1,}; // convert_matrix needs uint16_t*
>> +    int qscale, i;
>> +
>> +    CHECKED_ALLOCZ(ctx->qmatrix_l,   (ctx->m.avctx->qmax+1) * 64 * sizeof(int));
>> +    CHECKED_ALLOCZ(ctx->qmatrix_c,   (ctx->m.avctx->qmax+1) * 64 * sizeof(int));
>> +    CHECKED_ALLOCZ(ctx->qmatrix_l16, (ctx->m.avctx->qmax+1) * 64 * 2 * sizeof(uint16_t));
>> +    CHECKED_ALLOCZ(ctx->qmatrix_c16, (ctx->m.avctx->qmax+1) * 64 * 2 * sizeof(uint16_t));
>> +
>> +    for (i = 1; i < 64; i++) {
>> +        int j = ctx->m.dsp.idct_permutation[ff_zigzag_direct[i]];
>> +        weight_matrix[j] = ctx->cid_table->luma_weigth[i];
>> +    }
>> +    ff_convert_matrix(&ctx->m.dsp, ctx->qmatrix_l, ctx->qmatrix_l16, weight_matrix,
>> +                      ctx->m.intra_quant_bias, 1, ctx->m.avctx->qmax, 1);
>> +    for (i = 1; i < 64; i++) {
>> +        int j = ctx->m.dsp.idct_permutation[ff_zigzag_direct[i]];
>> +        weight_matrix[j] = ctx->cid_table->chroma_weigth[i];
>> +    }
> 
> maybe it would be simpler to store these tables permutated in dnxhddata.c
> though that would need the decoder to be changed as well and is something
> which could be done later, after this is applied

Ok, I'll look at it.

> also it seems the i > 63 check in dnxhd_decode_dct_block() is done after i
> is used ...

I'll fix.

> [...]
>> +static av_always_inline void dnxhd_get_pixels_4x8(DCTELEM *restrict block, const uint8_t *pixels, int line_size)
>> +{
>> +    int i;
>> +    for (i = 0; i < 4; i++) {
>> +        block[0] = pixels[0];
>> +        block[1] = pixels[1];
>> +        block[2] = pixels[2];
>> +        block[3] = pixels[3];
>> +        block[4] = pixels[4];
>> +        block[5] = pixels[5];
>> +        block[6] = pixels[6];
>> +        block[7] = pixels[7];
>> +        pixels += line_size;
>> +        block += 8;
>> +    }
>> +    for (i = 0; i < 32; i++)
>> +        block[i] = 128; // 128 seems better than 0, +0.07 psnr
> 
> if this is what i think it is (a 8x8 block which is then encoded per 8x8dct)
> id suggest that you try reflection that is
> 
> memcpy(block   , block- 8, sizeof(*block)*8);
> memcpy(block+ 8, block-16, sizeof(*block)*8);
> memcpy(block+16, block-24, sizeof(*block)*8);
> memcpy(block+24, block-32, sizeof(*block)*8);

Yeah, better, changed.

> and patch ok

Wee, thanks a lot for your time Michael.

Now applied.

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
SMARTJOG S.A.                                    http://www.smartjog.com
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
Phone: +33 1 49966312