[FFmpeg-devel] [PATCH] Chinese AVS encoder
Stefan Gehrer
stefan.gehrer
Thu Jul 26 07:34:26 CEST 2007
Michael Niedermayer wrote:
> Hi
>
> On Wed, Jul 25, 2007 at 07:50:17AM +0200, Stefan Gehrer wrote:
> [...]
>>> [...]
>>>> +/**
>>>> + * eliminate residual blocks that only have insignificant coefficients,
>>>> + * inspired by x264 and JVT-B118
>>>> + */
>>>> +static inline int decimate_block(uint8_t *run, DCTELEM *level, int count) {
>>>> +    static const uint8_t run_score[30] = {
>>>> +        0,3,3,3,3,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0 };
>>>> +    int i;
>>>> +    int score = 0;
>>>> +
>>>> +    if(count>4)
>>>> +        return 9;
>>>> +    for(i=0;i<count;i++) {
>>>> +        int abslevel = FFABS(level[i]);
>>>> +        if(abslevel > 1)
>>>> +            return 9;
>>>> +        score += run_score[FFMIN(run[i],29)];
>>>> +    }
>>>> +    return score;
>>>> +}
>>> how much psnr/bitrate is gained by this? if none then please drop it
>> I tested on the popular foreman sequence (300 CIF frames). When
>> encoding at around 800kbps, roughly 0.1dB is gained by this. When
>> encoding at around 200kbps, the gain increases to around 0.15dB.
>> So I would like to keep it.
>
> ok, but i think there is something wrong elsewhere if there is such a
> large gain from this ...
> maybe a simple change to the bias would help? or maybe there is something
> else wrong with the quantization code ...
>
Why? It seems clear to me that coding a macroblock as skipped is so much
cheaper than coding type/cbp/mv/coefficients that it can be worth
throwing away some real information. By real I mean coefficients that
come out of a fully correct quantization code, though I am not claiming
that mine is fully correct yet.
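Just to illustrate what I mean, this is roughly how the score from
decimate_block() is used; not the literal patch code, and the threshold
of 6 is only an example value taken from the x264-style heuristic:

/* drop a residual block whose few small coefficients are not worth the
 * bits needed to signal it as coded */
static int maybe_decimate(uint8_t *run, DCTELEM *level, int count)
{
    if (decimate_block(run, level, count) < 6) {
        memset(level, 0, count * sizeof(*level));
        return 0;      /* caller clears the corresponding cbp bit */
    }
    return count;      /* keep the block as it is */
}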
>
> [...]
>>>> +    for(i=0;i<15;i++)
>>>> +        if((ff_frame_rate_tab[i].den == frame_rate.den) &&
>>>> +           (ff_frame_rate_tab[i].num == frame_rate.num))
>>>> +            frame_rate_code = i;
>>>> +    if(frame_rate_code < 0) {
>>>> +        av_log(h->s.avctx, AV_LOG_ERROR, "unsupported framerate %d/%d\n",
>>>> +               frame_rate.num, frame_rate.den);
>>>> +        return -1;
>>>> +    }
>>>
>>> [...]
>>>> +    put_bits(&s->pb,16,0);
>>>> +    put_bits(&s->pb,16,CAVS_START_CODE);
>>> add a put_bits_long() to bitstream.c, document the issue with 32 bits and
>>> fix vorbis_enc.c :)
>> I looked a bit into this and I think a better way would be to fix
>> put_bits() to support up to 32 bits instead of up to 31 bits.
>> I am not completely sure, but after checking vorbis_enc.c it seems
>> nothing there ever writes more than 32 bits at once, even though
>> its local put_bits() function takes a 64-bit argument.
>
> argh, i hadn't noticed that vorbis is using a private put_bits
> i'm starting to hate vorbis_enc.c ...
>
>
>> Attached a new encoder patch
>
>> and a proposed fix for put_bits().
>
> the fix to put_bits is rejected due to a likely huge slowdown
>
It is in the else branch and only executed once every 32 bits of
bitstream writing, so I don't think the slowdown will be "huge". But if
it is not acceptable I will look into adding a put_bits_long() instead.
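Something along these lines is what I would have in mind; an untested
sketch, the name and the 16/16 split are just my assumption of how it
could look:

/* untested sketch: write up to 32 bits by splitting values wider than
 * the 31 bits that put_bits() currently handles */
static inline void put_bits_long(PutBitContext *pb, int n, unsigned int value)
{
    if (n <= 31) {
        put_bits(pb, n, value);
    } else {
        put_bits(pb, n - 16, value >> 16);      /* high part first */
        put_bits(pb, 16,     value & 0xffff);   /* then the low 16 bits */
    }
}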
>
> [...]
>> /*****************************************************************************
>> *
>> + * quantization
>> + *
>> + ****************************************************************************/
>> +
>> +static void cavs_quant_c(DCTELEM *block, const uint16_t *norm, int mul,
>> +                         int bias) {
>> +    int i;
>> +
>> +    for(i=0;i<64;i++) {
>> +        if(block[i] > 0)
>> +            block[i] = ((((1<<18) + block[i]*norm[i])>>19)*mul+bias)>>15;
>> +        else
>> +            block[i] = -(((((1<<18) - block[i]*norm[i])>>19)*mul+bias)>>15);
>> +    }
>
> this is a duplicate of dct_quantize*
> it's just much slower and smaller
>
>
Looks quite different to me, but I will see if I can reuse it somehow.
> [...]
>> +    int level = abs(level_buf[coeff_num])-1;
>> +    int sign = (level_buf[coeff_num]>>31)&1;
>
> MASK_ABS()
>
>
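Right, will use MASK_ABS(); roughly like this, if I read the macro
correctly (untested):

int level, mask, sign;

level = level_buf[coeff_num];
MASK_ABS(mask, level);    /* level = |level|, mask = -1 if it was negative */
sign  = mask & 1;
level--;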
> [...]
>
>> +    h->lambda = h->qp/3;
>
> if i understand it correctly then qp is on a logarithmic scale; if so,
> this is likely completely wrong
>
> h->lambda = C*quant_mul[h->qp];
> might be more correct
>
[...]
>
> with SSE it's likely lambda*lambda with the above definition of lambda
>
Yes, my lambda has no mathematical background. Like many of the
parameters I just hand-tuned it and found that a lambda close to zero
seemed to give the best results. Similarly, the quantization biases and
the threshold for the intra/inter decision were derived from trial and
error.
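For reference, your suggestion would look roughly like this on my side;
C is a placeholder constant and the table name is just taken from your
mail, not from the patch:

/* lambda proportional to the quantiser step size instead of to the
 * (logarithmic) qp itself; C would still need tuning */
h->lambda = C * quant_mul[h->qp];
/* with SSE as the distortion measure the matching factor would then be
 * lambda*lambda */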
>
> [...]
>> +    h->dist[0] = (h->picture.poc - h->DPB[0].poc + 512) % 512;
>> +    h->dist[1] = (h->picture.poc - h->DPB[1].poc + 512) % 512;
>
> &511 ?
>
It is not speed critical, and I find the modulo more readable.
>
> [...]
>> +    memcpy(&h->DPB[1], &h->DPB[0], sizeof(Picture));
>
> h->DPB[1]= h->DPB[0];
>
Right, will change.
>
> [...]
>> +    h->levels[0] = av_mallocz(6*64*sizeof(DCTELEM));
>> +    h->runs[0] = av_mallocz(6*64);
>> +    for(i=1;i<6;i++) {
>> +        h->levels[i] = h->levels[i-1] + 64;
>> +        h->runs[i] = h->runs[i-1] + 64;
>> +    }
>
> why is this allocated with malloc instead of being part of the context?
It keeps the context smaller when only the decoder is running.
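For comparison, as plain context members (hypothetical, not in the
patch) this would be roughly:

DCTELEM levels[6][64];   /* 768 bytes with a 16-bit DCTELEM */
uint8_t runs[6][64];     /* 384 bytes */

which every context instance would carry around even when only the
decoder is used.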
Stefan