[FFmpeg-devel] [PATCH] Chinese AVS encoder

Stefan Gehrer stefan.gehrer
Thu Jul 26 22:34:15 CEST 2007


Hi

Michael Niedermayer wrote:
>>>>> [...]
>>>>>> +/**
>>>>>> + * eliminate residual blocks that only have insignificant coefficients,
>>>>>> + * inspired from x264 and JVT-B118
>>>>>> + */
>>>>>> +static inline int decimate_block(uint8_t *run, DCTELEM *level, int count) {
>>>>>> +    static const uint8_t run_score[30] = {
>>>>>> +        0,3,3,3,3,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0 };
>>>>>> +    int i;
>>>>>> +    int score = 0;
>>>>>> +
>>>>>> +    if(count>4)
>>>>>> +        return 9;
>>>>>> +    for(i=0;i<count;i++) {
>>>>>> +        int abslevel = FFABS(level[i]);
>>>>>> +        if(abslevel > 1)
>>>>>> +            return 9;
>>>>>> +        score += run_score[FFMIN(run[i],29)];
>>>>>> +    }
>>>>>> +    return score;
>>>>>> +}
>>>>> how much psnr/bitrate is gained by this? if none then please drop it
>>>> I tested on the popular foreman (300 CIF frames). When encoding at
>>>> around 800kbps, roughly 0.1dB is gained by this. When encoding at
>>>> around 200kbps, this increases to a gain of around 0.15dB.
>>>> So I would like to keep it.
>>> ok, but I think there is something wrong elsewhere if there is such a
>>> large gain from this ... 
>>> maybe a simple change to the bias would help? or maybe there is something
>>> else wrong with the quantization code ...
>>>
>> why? It seems clear to me that coding a macroblock as skipped is so much
>> cheaper than coding type/cbp/mv/coefficients that it might be worth
>> throwing some real information away. By real I mean coefficients that
>> come out of a fully correct quantization code, though I am not claiming
>> my code to be that.
> 
> you should just make RD optimal decisions (IIRC you do calculate the RD all
> over the place anyway); if you do, no heuristic like the above will help

Okay, I think I see what you mean. Currently I use RD decisions to
select intra prediction modes and for motion estimation, but for
the macroblock mode decision there are some heuristics, roughly
like this:

if(only insignificant coefficients)
  encode mb as skipped
else
  RD = P motion estimation
  if(RD < threshold)
    encode mb as p
  else
    encode mb as intra

By RD optimal decisions, do you mean that I should encode every
macroblock in all three modes (skipped/p/intra) and select the
best of the three, or rather that I should try one mode, take it
if its RD cost is below a certain threshold, and otherwise move
on to the next mode?
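
For the first option, I imagine roughly the following (the rd_cost_*
and encode_mb_* helpers are placeholders, not existing functions):

  int rd_skip  = rd_cost_skip(h);   /* D + lambda*R of coding the mb as skipped */
  int rd_p     = rd_cost_inter(h);  /* includes the motion estimation */
  int rd_intra = rd_cost_intra(h);

  if(rd_skip <= rd_p && rd_skip <= rd_intra)
      encode_mb_skip(h);
  else if(rd_p <= rd_intra)
      encode_mb_p(h);
  else
      encode_mb_intra(h);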

> 
>>> [...]
>>>>>> +    for(i=0;i<15;i++)
>>>>>> +        if((ff_frame_rate_tab[i].den == frame_rate.den) &&
>>>>>> +           (ff_frame_rate_tab[i].num == frame_rate.num))
>>>>>> +            frame_rate_code = i;
>>>>>> +    if(frame_rate_code < 0) {
>>>>>> +        av_log(h->s.avctx, AV_LOG_ERROR, "unsupported framerate %d/%d\n",
>>>>>> +               frame_rate.num, frame_rate.den);
>>>>>> +        return -1;
>>>>>> +    }
>>>>> [...]
>>>>>> +        put_bits(&s->pb,16,0);
>>>>>> +        put_bits(&s->pb,16,CAVS_START_CODE);
>>>>> add a put_bits_long() to bitstream.c, document the issue with 32 bits and
>>>>> fix vorbis_enc.c :)
>>>> I looked a bit into this and I think a better way would be to fix
>>>> put_bits() to support up to 32 bits instead of up to 31 bits.
>>>> I am not so sure, but after a bit of checking in vorbis_enc.c it
>>>> seems there is never any writing with more than 32 bits even
>>>> though the local put_bits() function takes a 64-bit argument.
>>> argh, I hadn't noticed that vorbis is using a private put_bits
>>> I am starting to hate vorbis_enc.c ...
>>>
>>>
>>>> Attached a new encoder patch 
>>>> and a proposed fix for put_bits().
>>> the fix to put_bits is rejected due to a likely huge slowdown
>>>
>> It is in the else branch and only executed every 32 bits of bitstream
>> writing, so I don't think the slowdown will be "huge". But if it's
>> not acceptable I will think about adding a put_bits_long again.
> 
> the slowdown would affect all codecs, and I think it would be a significant
> slowdown
> it's totally unacceptable just so you can replace
> put_bits(16); put_bits(16);
> by
> put_bits(32)
> in 1 or 2 codecs
> and without any benchmarks ...
> 

So the problem is:
h263.c has put_bits(16);put_bits(16);
cavsenc.c has put_bits(16);put_bits(16);
vorbis_enc.c has its own put_bits implementation

Solutions:
keep as is (you complained)
fix as I proposed (too slow)
create a proper put_bits_long function (seems overkill)

What about a silly and untested
void put_bits_long(PutBitContext *s, int n, uint64_t value) {
  int rem = n>>1;
  put_bits(s, n-rem, value>>rem);           /* high part, at most 31 bits */
  put_bits(s, rem, value&((1ULL<<rem)-1));  /* low part; 1ULL avoids an undefined 1<<31 */
}
to encode 0..62 bits?
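
With something like that, the two 16-bit writes quoted above could then
collapse into one call, e.g. in cavsenc.c:

  /* high 16 bits are the zero prefix, low 16 the start code: */
  put_bits_long(&s->pb, 32, CAVS_START_CODE);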

> 
> [...]
>>> [...]
>>>> +    h->dist[0] = (h->picture.poc - h->DPB[0].poc  + 512) % 512;
>>>> +    h->dist[1] = (h->picture.poc - h->DPB[1].poc  + 512) % 512;
>>> &511 ?
>>>
>> It is not speed critical, and I find the modulo more readable.
> 
> it's +% vs. & (2 vs. 1 operations), so the modulo likely does make the
> binary a byte or 2 larger ...
> 
> 
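(For completeness: the & version drops the +512 and relies on
two's-complement masking of the possibly negative difference, which is
what makes it one operation instead of two:

  /* & 511 wraps the negative difference the same way (+ 512) % 512 does: */
  h->dist[0] = (h->picture.poc - h->DPB[0].poc) & 511;

assuming poc stays in the 0..511 range that the modulo suggests.)
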
> [...]
>>> [...]
>>>> +        h->levels[0] = av_mallocz(6*64*sizeof(DCTELEM));
>>>> +        h->runs[0]   = av_mallocz(6*64);
>>>> +        for(i=1;i<6;i++) {
>>>> +            h->levels[i] = h->levels[i-1] + 64;
>>>> +            h->runs[i]   = h->runs[i-1]   + 64;
>>>> +        }
>>> why is this allocated with malloc instead of being part of the context?
>> It keeps the context smaller when only the decoder is running.
> 
> hmm, ok, still it's not ideal, as this requires an extra pointer
> dereference to access

Ok, I will change this in the next round.
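
Concretely, I am thinking of something like the following sketch
(hypothetical struct and member names, untested):

  /* encoder-only context embedding the buffers directly, so no
   * extra pointer dereference is needed on access: */
  typedef struct {
      AVSContext common;      /* the shared decoder-side state */
      DCTELEM levels[6][64];  /* quantized coefficients per block */
      uint8_t runs[6][64];    /* zero-run lengths per block */
  } CAVSEncContext;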

Stefan