[FFmpeg-devel] [PATCH] MLP Encoder

Michael Niedermayer michaelni
Thu Aug 21 15:47:13 CEST 2008


On Thu, Aug 21, 2008 at 01:13:32AM -0300, Ramiro Polla wrote:
> Hello,
> 
> A lot has changed since the last review. Many revisions touched many
> functions (like buffering data). So I did not commit any parts (there
> might be a couple of OKed hunks that haven't changed though). There
> are a couple of TODOs in the code that might require big changes again
> (like not memcpy'ing params in mlp_encode_frame() and not copying
> samples to a temporary buffer for set_filter_params()).
> 
> I have not tried IIR filters again.
> 


> And I got stuck on the lossless matrix at a point where, even if it
> might be really easy, my head just blocks when it reads or
> hears the word "matrix". There is some working code, but changing one
> line or another, or even when I try to add something simple,
> everything goes horribly wrong and I'm hopeless at debugging at the
> moment (even if it might be easy). I think it might even be better to
> just disable the code that's currently there since it is very dumb. It
> saves ~40% in 440hz.wav, since both channels are identical. It saves
> ~2.5% in luckynight.mlp. But I imagine it will just bloat any sample
> that doesn't share anything between both channels.
> 

> Now for Good News:
> With luckynight.mlp
> decoded  wav: 10668844
> original mlp:  7899056  25.96%
> encoded  mlp:  7931348  25.66%

That looks better,
but you are still ~0.4% worse, and I suspect we can do quite a bit better than
the original.


[...]
> >>
> >> typedef struct {
> >>     //! number of PCM samples in current audio block
> >>     uint16_t        blocksize;
> >
> > I do not think this needs to be uint16_t
> 
> For some reason changing to unsigned int was bad for compression. I
> haven't debugged why yet.

Changing blocksize to unsigned int,
or changing all of them to unsigned int?


[...]
> >>
> >> /** Calculates the smallest number of bits it takes to encode a given signed
> >>  *  value in two's complement.
> >>  */
> >> static int inline number_sbits(int number)
> >> {
> >>     int bits = 0;
> >>
> >>     if      (number > 0)
> >>         for (bits = 31; bits && !(number & (1<<(bits-1))); bits--);
> >>     else if (number < 0)
> >>         for (bits = 31; bits &&  (number & (1<<(bits-1))); bits--);
> >
> > Isn't something with av_log2(FFABS()) faster?
> 
> Changed. But mru suggested having a clz() function because his ARM can
> do it in one instruction (he repeats this a couple of times every day
> on IRC).

infinite loop, reboot him and contact technical support. :)

x86 has its bsr/bsf opcodes as well, and I've seen gcc replace loops with them.
Now, we all know gcc is shit on x86 and bullshit on non-x86 ...

So I suspect there could be an #ifdef ARM asm("clz" ...) (or was it really a
function?) in av_log2(). If it's really faster, that could help more than just MLP.
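
For reference, one way the av_log2(FFABS()) variant could look (an untested
sketch; it should give the same widths as the loop version, including 0 and
the negative values):

static inline int number_sbits(int number)
{
    if (number < 0)
        number++;   /* map x < 0 to x + 1, so FFABS() below yields -x - 1 == ~x,
                       which needs the same two's complement width as x */

    return av_log2(FFABS(number)) + 1 + !!number;
}

E.g. 0 -> 1, 1 -> 2, -1 -> 1, 3 -> 3, -4 -> 3, -5 -> 4.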


[...]
> >>     int min = INT_MAX, max = INT_MIN;
> >>     int bits, shift;
> >>     int or = 0;
> >>     int order;
> >>
> >>     for (order = 0; order < fp->order; order++) {
> >>         int coeff = fp->coeff[order];
> >>
> >
> >
> >>         if (coeff < min)
> >>             min = coeff;
> >>         if (coeff > max)
> >>             max = coeff;
> >>
> >>         or |= coeff;
> >>     }
> >>
> >>     bits = FFMAX(number_sbits(min), number_sbits(max));
> >
> > max= FFMAX(max, FFABS(coeff))
> >
> > that way you also do not need a signed "bits counter" ...
> 
> Hmmm... Either I'm missing something or it's not that simple.
> number_sbits() is not as easy to deal with:
>  -5: 4
>  -4: 3
>  -3: 3
>  -2: 2
>  -1: 1
>   0: 1
>   1: 2
>   2: 3
>   3: 3
>   4: 4

hmm

max= FFMAX(max, coeff ^ (coeff>>31))
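
coeff ^ (coeff>>31) is coeff for coeff >= 0 and ~coeff == -coeff - 1 for
coeff < 0, and number_sbits() gives the same width for coeff and ~coeff, so a
single non-negative max is enough. Roughly (untested sketch of the loop):

    int max = 0, or = 0;
    int bits, order;

    for (order = 0; order < fp->order; order++) {
        int coeff = fp->coeff[order];

        /* identity for coeff >= 0, ~coeff for coeff < 0; both have the same
         * two's complement width, so tracking one non-negative max suffices */
        max = FFMAX(max, coeff ^ (coeff >> 31));
        or |= coeff;
    }

    bits = number_sbits(max);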


[...]
> >> /** Applies the filter to the current samples, and saves the residual back
> >>  *  into the samples buffer. If the filter is too bad and overflows the
> >>  *  maximum amount of bits allowed (24), the samples buffer is left as is and
> >>  *  the function returns -1.
> >>  */
> >
> > I do not believe overflow is possible.
> > Without checking the code too carefully, I suspect you got the wraparound wrong
> > somewhere.
> > Remember: (a+b) & MASK == c & MASK is the same as a & MASK == (c-b) & MASK.
> 
> I got a sample from Justin that overflows at this point. I don't see
> how masking could help, since the mask only clears LSBs. And I don't
> see how it could wrap around (it shouldn't).

Hmm, if so, MLP is poorly designed. Anyway, where does the limit of 24 bits come
from? In the decoder I see a huff_lsbs that is read with get_bits(5), so it
should be able to reach 31 ...
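
For what it's worth, the identity above in isolation looks like this (a generic
sketch with made-up helper names, not the actual MLP filter code): as long as
both sides reduce modulo 2^24, the residual always fits in 24 bits and the
decoder still recovers the sample exactly, because addition and subtraction are
inverses modulo 2^24.

#include <stdint.h>

#define MASK24 0xFFFFFF

/* Reduce x modulo 2^24 and sign-extend the result back into an int32_t. */
static int32_t wrap24(int64_t x)
{
    x &= MASK24;
    if (x & 0x800000)
        x -= 0x1000000;
    return (int32_t)x;
}

/* Encoder: the residual (sample - prediction) mod 2^24 always fits in 24 bits. */
static int32_t residual24(int32_t sample, int32_t prediction)
{
    return wrap24((int64_t)sample - prediction);
}

/* Decoder: (prediction + residual) mod 2^24 gives back the original 24-bit
 * sample, no matter how far off the prediction was before wrapping. */
static int32_t reconstruct24(int32_t prediction, int32_t residual)
{
    return wrap24((int64_t)prediction + residual);
}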



[...]
> > [...]
> >> /** Writes the residuals to the bitstream. That is, the vlc codes from the
> >>  *  codebooks (if any is used), and then the residual.
> >>  */
> >> static void write_block_data(MLPEncodeContext *ctx, PutBitContext *pb,
> >>                              unsigned int substr)
> >> {
> >>     DecodingParams *dp = &ctx->decoding_params[substr];
> >>     RestartHeader  *rh = &ctx->restart_header [substr];
> >>     int32_t sign_huff_offset[MAX_CHANNELS];
> >>     int codebook            [MAX_CHANNELS];
> >>     int lsb_bits            [MAX_CHANNELS];
> >>     unsigned int i, ch;
> >>
> >>     for (ch = rh->min_channel; ch <= rh->max_channel; ch++) {
> >>         ChannelParams *cp = &ctx->channel_params[ch];
> >>         int sign_shift;
> >>
> >>         lsb_bits        [ch] = cp->huff_lsbs - dp->quant_step_size[ch];
> >
> >>         codebook        [ch] = cp->codebook  - 1;
> >
> > why -1 ?
> 
> Just because it's easier to access ff_mlp_huffman_tables[codebook[ch]] this way.

This makes one of them codebook and the other codebook_plus1 or _minus1.
As is, it's a recipe for confusion ("which one was codebook=1?").
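
For example (hypothetical variable names), either keep the field in bitstream
terms and do the -1 only where the table is indexed, or make the shift visible
in the name:

    /* option 1: keep bitstream semantics, shift only at the lookup */
    codebook[ch] = cp->codebook;
    /* ... later: ff_mlp_huffman_tables[codebook[ch] - 1] ... */

    /* option 2: make the off-by-one explicit in the variable name */
    codebook_index[ch] = cp->codebook - 1;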


[...]
> > [...]
> >> static int mlp_encode_frame(AVCodecContext *avctx, uint8_t *buf, int buf_size,
> >>                             void *data)
> >> {
> >>     DecodingParams decoding_params[MAX_SUBSTREAMS];
> >>     uint16_t substream_data_len[MAX_SUBSTREAMS];
> >>     int32_t lossless_check_data[MAX_SUBSTREAMS];
> >>     ChannelParams channel_params[MAX_CHANNELS];
> >>     MLPEncodeContext *ctx = avctx->priv_data;
> >>     uint8_t *buf2, *buf1, *buf0 = buf;
> >>     int total_length;
> >>     unsigned int substr;
> >>     int restart_frame;
> >>
> >>     if (avctx->frame_size > MAX_BLOCKSIZE) {
> >>         av_log(avctx, AV_LOG_ERROR, "Invalid frame size (%d > %d)\n",
> >>                avctx->frame_size, MAX_BLOCKSIZE);
> >>         return -1;
> >>     }
> >>
> >>     memcpy(decoding_params, ctx->decoding_params, sizeof(decoding_params));
> >>     memcpy(channel_params, ctx->channel_params, sizeof(channel_params));
> >>
> >>     if (buf_size < 4)
> >>         return -1;
> >>
> >>     /* Frame header will be written at the end. */
> >>     buf      += 4;
> >>     buf_size -= 4;
> >>
> >
> >>     restart_frame = !(avctx->frame_number & (MAJOR_HEADER_INTERVAL - 1));
> >
> > gop_size could be used here
> 
> The rest of the code is set up to have ctx->major_header_interval set by
> the user, but it's still not as easy as just using avctx->gop_size,
> since that is for video only. (There's another thread about this
> somewhere...)

?
Add an A to the AVOption table in utils.c and gop_size works for audio as well.

Before further reviews I think we need to discuss the high-level design.
If I understand your encoder correctly (please correct me if I am wrong),
it uses a fixed major header interval, a fixed blocksize, a single
fixed channel decorrelation matrix, and makes no attempt at all to
optimize the places where any parameters change. Considering the small
blocksize of 40 samples, changing parameters randomly whenever they happen
to appear better can cause very significant overhead and an overall
compression loss.
I hope I've not misunderstood the code, but if not, this design is rather
primitive.

* Determining the channel decorrelation matrix
  Let us forget about PCA for the time being and just implement what jai
  did in his ALAC encoder. That is,
  attempt to decorrelate the channels with the 4 simple standard
  decorrelations L,R; L-R,L; L-R,R; L-R,(L+R)>>1.
  To determine which of the 4 to use you can either
  A. choose it per major sync interval using the heuristic jai used, or
  B. calculate the exact number of bits for each of the 4 and use the
     Viterbi algorithm to determine which of the 4 to use per block
     (see the rough sketch after this list).
  (For patch approval I'd like to see both implemented, as A is faster and B
   is strictly better; of course you can implement anything else that is
   superior to my suggestion if you prefer. PCA could be added as a 5th
   decorrelation or replace the 4 when it is equally good ...)

* Determining what codebook to use
  This has to be done using Viterbi; I see no sensible alternative
  (the same DP structure as in the sketch after this list).
  The values are already brute-force calculated, so there should be no
  speed loss.

* Determine what huff_offset to use
  This may be trickier. Taking the current set of per-block "optimal"
  choices you already calculate and choosing among them for each block using
  Viterbi would be an option; this should be of acceptable speed as there is a
  maximum of 16 different offsets. This can later be extended toward considering
  the optimal offsets for all consecutive blocks when this helps compression,
  or we can also try to determine the optimal offset per major header interval.
  It would probably be ideal to implement all of these and then throw away what
  turns out to be a bad choice.

* Determine the blocksize
  It should be easy to try various (power-of-2) block sizes for each
  major header interval by simple brute force at higher compression_level.
  BTW, why is it 40 ATM?

* Determine the major_header_interval
  Well, I suggest we leave this fixed and set AVCodecContext.frame_size to it
  for the sake of simplicity.

* Determine the LPC coeffs
  At higher compression_levels they should be selected by the already
  described Viterbi algorithm over a set of optimal LPC coeffs for each
  consecutive run of blocks.
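
To make the decorrelation + Viterbi idea a bit more concrete, here is a rough,
self-contained sketch. Everything in it is made up for illustration (the cost
model, SWITCH_BITS, the function names); a real implementation would plug in
the encoder's exact per-block bit counts and the true side-info cost of a
parameter change:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_MODES   4          /* L,R; L-R,L; L-R,R; L-R,(L+R)>>1 */
#define SWITCH_BITS 64         /* made-up cost (in bits) of signalling a mode change */

/* Width of x in two's complement (same meaning as number_sbits()). */
static int sbits(int32_t x)
{
    int bits = 1;
    if (x < 0)
        x = -x - 1;
    while (x) {
        bits++;
        x >>= 1;
    }
    return bits;
}

/* Crude cost of coding one block with one decorrelation mode: just the summed
 * sample widths of both output channels. */
static int block_cost(const int32_t *l, const int32_t *r, int n, int mode)
{
    int cost = 0;
    for (int i = 0; i < n; i++) {
        int32_t a, b;
        switch (mode) {
        case 0:  a = l[i];        b = r[i];               break; /* L,R          */
        case 1:  a = l[i] - r[i]; b = l[i];               break; /* L-R,L        */
        case 2:  a = l[i] - r[i]; b = r[i];               break; /* L-R,R        */
        default: a = l[i] - r[i]; b = (l[i] + r[i]) >> 1; break; /* L-R,(L+R)>>1 */
        }
        cost += sbits(a) + sbits(b);
    }
    return cost;
}

/* Viterbi over the blocks of one major sync interval: pick one mode per block
 * so that the sum of block costs plus SWITCH_BITS per mode change is minimal. */
static void choose_modes(const int32_t *l, const int32_t *r,
                         int nb_blocks, int block_len, uint8_t *mode_out)
{
    int cost[NUM_MODES], prev_cost[NUM_MODES];
    uint8_t (*path)[NUM_MODES] = malloc(nb_blocks * sizeof(*path));

    for (int m = 0; m < NUM_MODES; m++)
        prev_cost[m] = block_cost(l, r, block_len, m);

    for (int b = 1; b < nb_blocks; b++) {
        const int32_t *lb = l + b * block_len;
        const int32_t *rb = r + b * block_len;

        for (int m = 0; m < NUM_MODES; m++) {
            int best = 0;
            for (int p = 1; p < NUM_MODES; p++)
                if (prev_cost[p]    + (p    != m) * SWITCH_BITS <
                    prev_cost[best] + (best != m) * SWITCH_BITS)
                    best = p;
            cost[m]    = prev_cost[best] + (best != m) * SWITCH_BITS +
                         block_cost(lb, rb, block_len, m);
            path[b][m] = best;
        }
        memcpy(prev_cost, cost, sizeof(cost));
    }

    /* backtrack from the cheapest final state */
    int m = 0;
    for (int i = 1; i < NUM_MODES; i++)
        if (prev_cost[i] < prev_cost[m])
            m = i;
    for (int b = nb_blocks - 1; b > 0; b--) {
        mode_out[b] = m;
        m = path[b][m];
    }
    mode_out[0] = m;

    free(path);
}

Option A from above would then just be the same block_cost() sum evaluated once
per major sync interval (no DP); the same DP skeleton also carries over to the
codebook and huff_offset choices, with SWITCH_BITS replaced by the real cost of
restating the parameter.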

Questions, suggestions and critique welcome


[...]
> typedef struct {
>     AVCodecContext *avctx;
> 
>     int             num_substreams;         ///< Number of substreams contained within this stream.
> 
>     int             num_channels;   /**< Number of channels in major_frame_buffer.
>                                      *   Normal channels + noise channels. */
> 
>     int             sample_fmt;             ///< sample format encoded for MLP

>     int             mlp_sample_rate;        ///< sample rate encoded for MLP

As it's not a sample rate, it shouldn't be called such;
sample_rate_code would be a more appropriate name.
The same is true for other similar mlp_ fields.


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates