[FFmpeg-devel] [PATCH] Mimic decoder
Ramiro Polla
ramiro
Sun Mar 16 20:15:41 CET 2008
Hello,
Michael Niedermayer wrote:
> On Sat, Mar 15, 2008 at 06:47:03PM -0300, Ramiro Polla wrote:
>> Hello,
>>
>> [...]
>>>>>> /*
>>>>>> * vlc_decode_block
>>>>>> *
>>>>>> * De-serialize (reconstruct) a variable length coded 8x8 block.
>>>>>> */
>>>>>> static int vlc_decode_block(MimicContext *ctx, DCTELEM *block, int
>>>>>> num_coeffs)
>>>>>> {
>>>>>> unsigned int pos;
>>>>>>
>>>>>> memset(block, 0, 64 * sizeof(DCTELEM));
>>>>> clear_blocks() in dsp
>>>> clear_blocks() clears 6*64*sizeof(DCTELEM). On mimic it's only
>>>> 64*sizeof(DCTELEM). Is there another function that will clear only what I
>>>> need?
>>> Hmm, not that i remember, anyway you can move the memset to after the idct
>>> that way it will be in the cache when its cleared.
>> Unless I misunderstood you, the blocks are on the stack, so there's no
>> reason to clear them after the IDCT. If it's ok I'll leave that memset
>> there.
>
> Stacks and alignment and gcc ...
Moved to MimicContext to avoid gcc randomness.
> [...]
>>>>> and
>>>>> value= 1 - (1<<num_bits);
>>>>> if(num_bits>1)
>>>>> value+= get_bits(num_bits-1);
>>>>> if(get_bits1())
>>>>> value -= value;
>>>>> block[ctx->scantable.permutated[pos]] = value;
>>>>> Or if you think the table matters speedwise then make it static dont
>>>>> duplicate it in every decoder instance.
>>>> ref calc static
>>>> mimic.o : 51868 51304 52192
>>>> avg time: 1.692 1.712 1.728
>>>>
>>>> That's very odd... Shouldn't the static table be at least as fast as the
>>>> current code?
>>> Yes
>>> If you want to debug it the asm output from gcc would be a start but i
>>> doubt its worth the time spend.
>> With a bigger sample, I got
>> ref calc static
>> avg time: 6.688 6.932 6.744
>
> And the standard deviations? Mean is only one side of the story ...
The more tests I run, the more innacurate they become.
ref calc static
avg time: 6.819993 6.873992 6.823993
std dev : 0.017436 0.022716 0.022627
What's the best way to run benchmarks?
With no X, I run 20 runs of
./ffmpeg_g -i ../data/y.ml20 -vcodec rawvideo -f rawvideo -r 5 -y /dev/null
Is there a way to just benchmark demuxing+decoding without anything
else? (besides writing my own simple program).
The input video has less than 5 frames per second, so I use -r 5 to
avoid outputting 1000 fps.
>>>> Attached are the 2 patches I used to test. (lut_calc.diff and
>>>> lut_static.diff). Anyways, most FFmpeg stuff introduced a considerable
>>>> speedup over the original code, so I'm tempted to apply one of these two.
>>>> Could I use the hardcoded tables ifdef there?
>>> I dont see much sense in it. I mean the calc code doesnt need the table at
>>> all and needs around 100 byte while the table needs 1024.
>> If it's ok I'd prefer to keep the current code with vlcdec_lookup in
>> MimicContext.
>
> You can keep it if you merge the *qscale into the table as its not
> const then anymore ...
I don't understand how. qscale depends on quality, which can change per
frame. qscale also depends on pos...
>>> [...]
>>>> if(vlc == -1)
>>>> return 0;
>>>> if(!vlc) /* end-of-block code */
>>>> return 1;
>>> maybe
>>> if(vlc<=0)
>>> return !vlc;
>>> would be faster (no iam not suggesting this if its not faster)
>> It's also slower (1.7%), so I'll just leave it as is.
>
> hmm the end-of-block check should be before the invalid check at least
> because it will be more often true.
> also as you already tried various, what about
>
> if(vlc<=0)
> return vlc;
>
> with the others adjusted accordingly
>
> or
>
> pos += vlc & (256+15);
> if(pos >= 64)
> return 0;
>
> (that is leaving the end of block check but merging the vlc==-1 check with
> the pos >=64 check)
> This one should be faster in theory of course gcc is unpredictable
Just switched eob check to before -1 as most benchmarks oscillated
around the same avg+-stddev.
> [...]
>> /* FFmpeg's IDCT behaves somewhat different from the original code, so
>> * a factor of 4 was added to the input */
>>
>> coeff = ctx->vlcdec_lookup[num_bits][value];
>> if(pos<3)
>> coeff <<= 4;
>> else /* TODO Use >> 10 instead of / 1001 */
>> coeff = (coeff * qscale) / 1001;
>
> Can you elaborate a little about these 1024 vs. 1001 and
> "different from the original" ?
>>10 was suggested by Mans. He said it'd be better to avoid a division
for each coeff and use shift instead. But then qscale would have to be
somewhat different so that >>10 would yield the same result as / 1001. I
didn't get the output to be bit-exact, so I decided to just add that TODO.
"different from the original" is relative to the libmimic IDCT. It
returns values 4 times greater than FFmpeg's IDCT. So because of that
the input coeffs were >>2.
> [...]
>> /**
>> * Internal helper-function used to initialize
>> * the lookup-table used by the VLC-decoder.
>> */
>> static void initialize_vlcdec_lookup(int8_t lookup_tbl[8][128])
>> {
>> int i, j;
>>
>> for(i = 1 ; i < 8 ; i++) {
>> int first = (1<<i)-1;
>> int last = 1<<(i-1);
>> int cur = first;
>
> cur - first is redundant remove one please
Removed first.
> [...]
>> ctx->picture.data[0] = NULL;
>> if(avctx->get_buffer(avctx, &ctx->picture)) {
>> av_log(avctx, AV_LOG_ERROR, "get_buffer() failed\n");
>> return -1;
>> }
>
> reference=1 is missing before get_buffer()
Added.
> [...]
>> AVCodec mimic_decoder = {
>> "mimic",
>> CODEC_TYPE_VIDEO,
>> CODEC_ID_MIMIC,
>> sizeof(MimicContext),
>> mimic_decode_init,
>> NULL,
>> mimic_decode_end,
>> mimic_decode_frame
>> };
>
> CODEC_CAP_DR1 is missing
Added.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mimic.c
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080316/18fd79d6/attachment.asc>
More information about the ffmpeg-devel
mailing list