[FFmpeg-devel] [PATCH] Mimic decoder

Ramiro Polla ramiro
Sun Mar 16 03:48:00 CET 2008


Ramiro Polla wrote:
> Hello,
> 
> [...]
>>>>> /*
>>>>>  * vlc_decode_block
>>>>>  *
>>>>>  * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>>>  */
>>>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int 
>>>>> num_coeffs)
>>>>> {
>>>>>     unsigned int pos;
>>>>>
>>>>>     memset(block, 0, 64 * sizeof(DCTELEM));
>>>> clear_blocks() in dsp
>>> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only 
>>> 64*sizeof(DCTELEM). Is there another function that will clear only 
>>> what I need?
>>
>> Hmm, not that i remember, anyway you can move the memset to after the 
>> idct
>> that way it will be in the cache when its cleared.
> 
> Unless I misunderstood you, the blocks are on the stack, so there's no 
> reason to clear them after the IDCT. If it's ok I'll leave that memset 
> there.
> 
> [...]
>>>>>         value = get_bits(&ctx->gb, num_bits);
>>>>>
>>>>>         block[ctx->scantable.permutated[pos]] = 
>>>>> ctx->vlcdec_lookup[num_bits][value];
>>>> you are missing a check for pos being < 64 here
>>> Ok. It slowed down a bit though.
>>> MSN Messenger sets coefs to 28. The max ammount of pos+= is 14, so 
>>> pos will never be higher than 41 for all MSN Messenger created 
>>> videos. Is there something that can be done to make the check know it 
>>> will most likely always be < 64? (besides profiling)
>>
>> If the check really annoys you that much then theres a way to get rid 
>> of it.
>> Simply make block 256 entries large, so the uint8_t scantable cannot 
>> cause
>> out of array writes. But this must be carefully documented so noone
>> simplifies it away thinking the size being unneeded.
>>
>> or, make the End of block code vlc=256 instead of =0
>> and use
>>
>> pos += vlc & (256+15);
>>
>> if(pos >= 64)
>>     return vlc==256;
>>
>> (yes this also avoids the -1 check)
> 
> 4.7% slower.
> 
>>>> and
>>>> value= 1 - (1<<num_bits);
>>>> if(num_bits>1)
>>>>     value+= get_bits(num_bits-1);
>>>> if(get_bits1())
>>>>     value -= value;
>>>> block[ctx->scantable.permutated[pos]] = value;
>>>> Or if you think the table matters speedwise then make it static dont
>>>> duplicate it in every decoder instance.
>>>           ref   calc  static
>>> mimic.o : 51868 51304 52192
>>> avg time: 1.692 1.712 1.728
>>>
>>> That's very odd... Shouldn't the static table be at least as fast as the
>>> current code? 
>>
>> Yes
>> If you want to debug it the asm output from gcc would be a start but i
>> doubt its worth the time spend.
> 
> With a bigger sample, I got
>           ref   calc  static
> avg time: 6.688 6.932 6.744
> 
>>> Attached are the 2 patches I used to test. (lut_calc.diff and 
>>> lut_static.diff). Anyways, most FFmpeg stuff introduced a 
>>> considerable speedup over the original code, so I'm tempted to apply 
>>> one of these two. 
>>
>>> Could I use the hardcoded tables ifdef there?
>>
>> I dont see much sense in it. I mean the calc code doesnt need the 
>> table at
>> all and needs around 100 byte while the table needs 1024.
> 
> If it's ok I'd prefer to keep the current code with vlcdec_lookup in 
> MimicContext.
> 
>> [...]
>>>         if(vlc == -1)
>>>             return 0;
>>>         if(!vlc) /* end-of-block code */
>>>             return 1;
>>
>> maybe
>> if(vlc<=0)
>>     return !vlc;
>>
>> would be faster (no iam not suggesting this if its not faster)
> 
> It's also slower (1.7%), so I'll just leave it as is.
> 
> [...]
>>> static int decode(MimicContext *ctx, int quality, int num_coeffs,
>>>                   int is_pframe)
> [...]
>>>     for(plane = 0; plane < 3; plane++) {
> [...]
>>>         for(y = 0 ; y < ctx->num_vblocks[plane] ; y++) {
> [...]
>>>             for(x = 0; x < ctx->num_hblocks[plane]; x++) {
> [...]
>>>                 if(!is_pframe || get_bits1(&ctx->gb) == is_chroma) {
> [...]
>>>                     if(is_chroma || !is_pframe || 
>>> !get_bits1(&ctx->gb)) {
> [...]
>>>                     } else {
>>>                         unsigned int backref;
>>>                         AVPicture back_frame;
>>>                         /* Yes: read the backreference (4 bits) and 
>>> copy. */
>>>                         backref = get_bits(&ctx->gb, 4);
>>
>>>                         prepare_avpic(ctx, &back_frame, (AVPicture*)
>>>                               &ctx->buf_ptrs[(ctx->ptr_index + 
>>> backref) & 15]);
>>
>> It would be faster if this wouldnt be done per block
> 
> Now the backreference frames are also kept flipped in MimicContext.
> 
>> [...]
>>> /*
>>>  * initialize_vlcdec_lookup
>>>  *
>>>  * Internal helper-function used to initialize
>>>  * the lookup-table used by the VLC-decoder.
>>>  */
>>
>> not doxygen compatible
> 
> Fixed.
> 
>> [...]
>>
>>>     if(!(is_pframe && !ctx->prev_frame.data[0])) {
>>>         int swap_buf_size = buf_size - MIMIC_HEADER_SIZE;
>>>         ctx->swap_buf = av_fast_realloc(ctx->swap_buf, 
>>> &ctx->swap_buf_size,
>>>                                  swap_buf_size + 
>>> FF_INPUT_BUFFER_PADDING_SIZE);
>>>         if(!ctx->swap_buf)
>>>             return AVERROR_NOMEM;
>>>
>>>         ctx->dsp.bswap_buf((uint32_t*)ctx->swap_buf,
>>>                            (const uint32_t*) (buf + MIMIC_HEADER_SIZE),
>>>                            swap_buf_size>>2);
>>>         init_get_bits(&ctx->gb, ctx->swap_buf, swap_buf_size << 3);
>>>
>>>         if(!decode(ctx, quality, num_coeffs, is_pframe))
>>>             goto fail;
>>>     } else {
>>>         av_log(avctx, AV_LOG_ERROR, "decoding must start with 
>>> keyframe\n");
>>>         goto fail;
>>>     }
>>
>> Maybe get_buffer() could be called for the previous frame if its missing?
>> Not sure if this makes much sense, that is if frames can be lost ...
> 
> This only fails if the stream doesn't start decoding at the beginning, 
> so there are still no backreferences.

New file has a few changes to reflect the removal of extradata from the 
demuxer.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080315/6a4cc748/attachment.txt>



More information about the ffmpeg-devel mailing list