[FFmpeg-devel] [PATCH] Mimic decoder
Ramiro Polla
ramiro
Sun Mar 16 03:48:00 CET 2008
Ramiro Polla wrote:
> Hello,
>
> [...]
>>>>> /*
>>>>> * vlc_decode_block
>>>>> *
>>>>> * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>>> */
>>>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int
>>>>> num_coeffs)
>>>>> {
>>>>> unsigned int pos;
>>>>>
>>>>> memset(block, 0, 64 * sizeof(DCTELEM));
>>>> clear_blocks() in dsp
>>> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only
>>> 64*sizeof(DCTELEM). Is there another function that will clear only
>>> what I need?
>>
>> Hmm, not that i remember, anyway you can move the memset to after the
>> idct
>> that way it will be in the cache when its cleared.
>
> Unless I misunderstood you, the blocks are on the stack, so there's no
> reason to clear them after the IDCT. If it's ok I'll leave that memset
> there.
>
> [...]
>>>>> value = get_bits(&ctx->gb, num_bits);
>>>>>
>>>>> block[ctx->scantable.permutated[pos]] =
>>>>> ctx->vlcdec_lookup[num_bits][value];
>>>> you are missing a check for pos being < 64 here
>>> Ok. It slowed down a bit though.
>>> MSN Messenger sets coefs to 28. The max ammount of pos+= is 14, so
>>> pos will never be higher than 41 for all MSN Messenger created
>>> videos. Is there something that can be done to make the check know it
>>> will most likely always be < 64? (besides profiling)
>>
>> If the check really annoys you that much then theres a way to get rid
>> of it.
>> Simply make block 256 entries large, so the uint8_t scantable cannot
>> cause
>> out of array writes. But this must be carefully documented so noone
>> simplifies it away thinking the size being unneeded.
>>
>> or, make the End of block code vlc=256 instead of =0
>> and use
>>
>> pos += vlc & (256+15);
>>
>> if(pos >= 64)
>> return vlc==256;
>>
>> (yes this also avoids the -1 check)
>
> 4.7% slower.
>
>>>> and
>>>> value= 1 - (1<<num_bits);
>>>> if(num_bits>1)
>>>> value+= get_bits(num_bits-1);
>>>> if(get_bits1())
>>>> value -= value;
>>>> block[ctx->scantable.permutated[pos]] = value;
>>>> Or if you think the table matters speedwise then make it static dont
>>>> duplicate it in every decoder instance.
>>> ref calc static
>>> mimic.o : 51868 51304 52192
>>> avg time: 1.692 1.712 1.728
>>>
>>> That's very odd... Shouldn't the static table be at least as fast as the
>>> current code?
>>
>> Yes
>> If you want to debug it the asm output from gcc would be a start but i
>> doubt its worth the time spend.
>
> With a bigger sample, I got
> ref calc static
> avg time: 6.688 6.932 6.744
>
>>> Attached are the 2 patches I used to test. (lut_calc.diff and
>>> lut_static.diff). Anyways, most FFmpeg stuff introduced a
>>> considerable speedup over the original code, so I'm tempted to apply
>>> one of these two.
>>
>>> Could I use the hardcoded tables ifdef there?
>>
>> I dont see much sense in it. I mean the calc code doesnt need the
>> table at
>> all and needs around 100 byte while the table needs 1024.
>
> If it's ok I'd prefer to keep the current code with vlcdec_lookup in
> MimicContext.
>
>> [...]
>>> if(vlc == -1)
>>> return 0;
>>> if(!vlc) /* end-of-block code */
>>> return 1;
>>
>> maybe
>> if(vlc<=0)
>> return !vlc;
>>
>> would be faster (no iam not suggesting this if its not faster)
>
> It's also slower (1.7%), so I'll just leave it as is.
>
> [...]
>>> static int decode(MimicContext *ctx, int quality, int num_coeffs,
>>> int is_pframe)
> [...]
>>> for(plane = 0; plane < 3; plane++) {
> [...]
>>> for(y = 0 ; y < ctx->num_vblocks[plane] ; y++) {
> [...]
>>> for(x = 0; x < ctx->num_hblocks[plane]; x++) {
> [...]
>>> if(!is_pframe || get_bits1(&ctx->gb) == is_chroma) {
> [...]
>>> if(is_chroma || !is_pframe ||
>>> !get_bits1(&ctx->gb)) {
> [...]
>>> } else {
>>> unsigned int backref;
>>> AVPicture back_frame;
>>> /* Yes: read the backreference (4 bits) and
>>> copy. */
>>> backref = get_bits(&ctx->gb, 4);
>>
>>> prepare_avpic(ctx, &back_frame, (AVPicture*)
>>> &ctx->buf_ptrs[(ctx->ptr_index +
>>> backref) & 15]);
>>
>> It would be faster if this wouldnt be done per block
>
> Now the backreference frames are also kept flipped in MimicContext.
>
>> [...]
>>> /*
>>> * initialize_vlcdec_lookup
>>> *
>>> * Internal helper-function used to initialize
>>> * the lookup-table used by the VLC-decoder.
>>> */
>>
>> not doxygen compatible
>
> Fixed.
>
>> [...]
>>
>>> if(!(is_pframe && !ctx->prev_frame.data[0])) {
>>> int swap_buf_size = buf_size - MIMIC_HEADER_SIZE;
>>> ctx->swap_buf = av_fast_realloc(ctx->swap_buf,
>>> &ctx->swap_buf_size,
>>> swap_buf_size +
>>> FF_INPUT_BUFFER_PADDING_SIZE);
>>> if(!ctx->swap_buf)
>>> return AVERROR_NOMEM;
>>>
>>> ctx->dsp.bswap_buf((uint32_t*)ctx->swap_buf,
>>> (const uint32_t*) (buf + MIMIC_HEADER_SIZE),
>>> swap_buf_size>>2);
>>> init_get_bits(&ctx->gb, ctx->swap_buf, swap_buf_size << 3);
>>>
>>> if(!decode(ctx, quality, num_coeffs, is_pframe))
>>> goto fail;
>>> } else {
>>> av_log(avctx, AV_LOG_ERROR, "decoding must start with
>>> keyframe\n");
>>> goto fail;
>>> }
>>
>> Maybe get_buffer() could be called for the previous frame if its missing?
>> Not sure if this makes much sense, that is if frames can be lost ...
>
> This only fails if the stream doesn't start decoding at the beginning,
> so there are still no backreferences.
New file has a few changes to reflect the removal of extradata from the
demuxer.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080315/6a4cc748/attachment.txt>
More information about the ffmpeg-devel
mailing list