[Ffmpeg-devel] FLAC encoder
Justin Ruggles
jruggle
Fri Jun 2 05:25:49 CEST 2006
Michael Niedermayer wrote:
> Hi
>
> On Thu, Jun 01, 2006 at 02:48:34AM -0400, Justin Ruggles wrote:
>
>>Justin Ruggles wrote:
>>
>>>Weak point: I have found a few very noisy samples that do not compress
>>>quite as well as with libFLAC. This may have to do with the fact that
>>>my encoder does not use verbatim encoding mode at all...I'm not 100%
>>>sure though.
>>
>>Hi,
>>
>>Well, the FLAC encoder is getting better by the day. I'll post some
>>results some time soon of my most recent version, which has marked
>>improvement in compression. The selection of optimal Rice parameters
>>and partition order has nearly ceased being a bottleneck. The fastest
>>version of that search is only marginally quicker than the slowest and
>>hurts compression quality only slightly (probably just precision
>>error), so I've redirected my attention to improving the LPC
>>optimization algorithms and other things.
>>
>>One area I've been trying to improve upon is the weak point I quoted
>>above. I was really annoyed and challenged :) when I found a website
>>devoted to collecting audio samples that are notoriously difficult to
>>encode. libFLAC beat my encoder hands down in almost every single test!
>> So my challenge has been to match or at least get close to libFLAC on
>>these "problem" type samples. Most of them have complex layered sounds
>>and high dynamic range or they have noisy types of sounds like cymbals
>>or distortion guitar.
>>
>>I've been doing lots of research and even more testing. I am writing
>>though because I need some advice from some people who know much more
>>about audio encoding than I do (and what better place to ask?). My math
>>background isn't what I wish it was. I'm just learning as I go and
>>reading as much as I can.
>>
>>So, one thing I've found that could really help the compression quality
>>is variable block sizes. Using different block sizes in a single stream
>>is not valid "Subset" FLAC, but decoding it is supported by both libFLAC
>>and FFmpeg, so I see no reason to ignore this potentially very
>>beneficial feature. When I manually adjust the static block size I can
>>get drastically better results. I can only imagine what being able to
>>adapt it during encoding could do.
>
>
> ok, but generating such variable block size streams should be optional
> as some decoders might not support it
>
True. My first thought is to use levels 0-8 to generate Subset streams,
then use higher levels for non-Subset features which may improve
compression. This would also be accompanied by a warning.
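Roughly what I have in mind, as a sketch (the struct and option names
below are just made up for illustration, not actual encoder options):

#include <stdio.h>

typedef struct FlacEncodeOptions {
    int level;                /* user-selected compression level           */
    int variable_blocksize;   /* 1 = allow block size changes (non-Subset) */
} FlacEncodeOptions;

static void apply_level(FlacEncodeOptions *opt)
{
    opt->variable_blocksize = 0;
    if (opt->level > 8) {
        /* levels above 8 would enable non-Subset features */
        opt->variable_blocksize = 1;
        fprintf(stderr, "warning: compression level %d produces a "
                        "non-Subset FLAC stream\n", opt->level);
    }
}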
>
>
>>A large block size seems to be ideal for most general audio sources, but
>>the more difficult samples encode better with smaller block sizes. From
>>what I gather, doing some sort of transient detection on the block and
>>then splitting it into 2 smaller blocks if warranted (and doing this
>>recursively) could greatly increase the compression. Does anyone know
>>what might be some good algorithms to employ? Or at least a good place
>>where I could start my research?
>
>
> one obvious choice if you want optimal blocksizes is to use the
> viterbi algorithm; this can be very slow depending on the block sizes
> and other parameters you consider, but with a limited set of block
> sizes it should be fine
>
>
> another, non-optimal but simpler choice is a recursive split algorithm; that
> will obviously be limited to blocksizes which are a power of 2 (2,4,8,16,32,...)
> and also to some specific recursive split pattern
> the split decision can either be based on a heuristic (like the
> transient detection you suggest) or on brute force; obviously brute
> force will be slower ...
>
> the brute force could be done by some code like:
>
> int build_split_tree(int pos, int blockorder){
>     int nosplit_bits, split_bits;
>     int blocksize= 1<<blockorder;
>
>     // never consider blocks smaller than the minimum block size
>     if(blocksize < min_blocksize)
>         return 99999999;
>
>     // bits needed when coding this block as a single frame ...
>     nosplit_bits= estimate_the_number_of_bits(pos * blocksize, blocksize);
>     // ... versus coding it as its two halves (recursively)
>     split_bits = build_split_tree(2*pos  , blockorder-1);
>     split_bits+= build_split_tree(2*pos+1, blockorder-1);
>
>     if(split_bits < nosplit_bits){
>         split_tree[blockorder][pos]=1;
>         return split_bits;
>     }else{
>         split_tree[blockorder][pos]=0;
>         return nosplit_bits;
>     }
> }
>
> [...]
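For reference, my reading of the viterbi suggestion is a dynamic
programming pass over a limited set of candidate block sizes, roughly
like this (just a sketch; estimate_bits() and the candidate sizes are
placeholders, not anything from my encoder):

#include <stdint.h>

#define NUM_SIZES 4
static const int candidate_sizes[NUM_SIZES] = { 576, 1152, 2304, 4608 };

/* placeholder for the encoder's bit-count estimate of one block */
uint64_t estimate_bits(int start, int len);

/* best[i]: minimum estimated bits to code samples [0, i) when i is a
   reachable block boundary; prev[i]: size of the block ending at i */
void find_block_boundaries(int nb_samples, uint64_t *best, int *prev)
{
    best[0] = 0;
    for (int i = 1; i <= nb_samples; i++)
        best[i] = UINT64_MAX;

    for (int i = 0; i < nb_samples; i++) {
        if (best[i] == UINT64_MAX)
            continue;                        /* boundary not reachable */
        for (int k = 0; k < NUM_SIZES; k++) {
            int j = i + candidate_sizes[k];
            if (j > nb_samples)
                break;                       /* sizes are sorted ascending */
            uint64_t bits = best[i] + estimate_bits(i, candidate_sizes[k]);
            if (bits < best[j]) {
                best[j] = bits;
                prev[j] = candidate_sizes[k];
            }
        }
    }
    /* the chosen block sizes fall out of walking prev[] back from
       nb_samples; a short final block would still need special handling
       if nb_samples is not reachable with these sizes */
}

With only a handful of candidate sizes this should stay reasonably
cheap, which I take to be what you meant about a limited set being fine.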
Thanks for the advice. I will probably implement the split tree routine,
but with transient detection at each level to decide whether or not to
split. In my research, I've found what seems to be a pretty good
transient detection algorithm. The first few steps are similar to those
of AC-3: it starts by high-pass filtering the input samples.
Next, the area that is being analysed is divided into a number of
smaller sub-blocks. The next steps are what differ from AC-3. The mean
absolute sample value is calculated for each sub-block instead of the
maximum value. A ratio value is calculated by dividing the largest
value sub-block by the average of the 3 smallest value sub-blocks. An
adaptive threshold is applied to this ratio to determine if a transient
is present. This sounds pretty good in theory. I just hope it isn't
too slow. Anyone know of another algorithm that might be better before
I start working on this one? The author of the paper claims it works
better than the AC-3 transient detection, but if this algorithm is too
slow, I could always try using the one from AC-3.
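To make the above a little more concrete, here is roughly what I have in
mind (just a sketch; the first-order high-pass, the sub-block count and
the fixed threshold are placeholders of mine, not values from the paper):

#include <math.h>
#include <stdint.h>

#define NB_SUBBLOCKS 8

/* returns 1 if a transient is detected in samples[0..n-1] */
static int detect_transient(const int32_t *samples, int n, double threshold)
{
    double mean[NB_SUBBLOCKS];
    int sub_len = n / NB_SUBBLOCKS;
    double prev = 0.0;

    /* steps 1+2: crude high-pass (first difference), then the mean
       absolute value of each sub-block */
    for (int i = 0; i < NB_SUBBLOCKS; i++) {
        double sum = 0.0;
        for (int j = 0; j < sub_len; j++) {
            double x = samples[i * sub_len + j];
            sum += fabs(x - prev);
            prev = x;
        }
        mean[i] = sum / sub_len;
    }

    /* step 3: ratio of the largest sub-block mean to the average of the
       three smallest means */
    double max = 0.0, min1 = 1e300, min2 = 1e300, min3 = 1e300;
    for (int i = 0; i < NB_SUBBLOCKS; i++) {
        if (mean[i] > max) max = mean[i];
        if      (mean[i] < min1) { min3 = min2; min2 = min1; min1 = mean[i]; }
        else if (mean[i] < min2) { min3 = min2; min2 = mean[i]; }
        else if (mean[i] < min3) { min3 = mean[i]; }
    }
    double ratio = max / ((min1 + min2 + min3) / 3.0 + 1e-9);

    /* step 4: the paper uses an adaptive threshold; a fixed one is used
       here only to keep the sketch short */
    return ratio > threshold;
}

Per analysis window this is just one pass over the samples, so I'm
hopeful the cost will be small compared to the LPC analysis itself.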
-Justin