[Ffmpeg-devel] FLAC encoder
Justin Ruggles
jruggle
Fri Jun 2 05:25:49 CEST 2006
Michael Niedermayer wrote:
> Hi
>
> On Thu, Jun 01, 2006 at 02:48:34AM -0400, Justin Ruggles wrote:
>
>>Justin Ruggles wrote:
>>
>>>Weak point: I have found a few very noisy samples that do not compress
>>>quite as well as with libFLAC. This may have to do with the fact that
>>>my encoder does not use verbatim encoding mode at all...I'm not 100%
>>>sure though.
>>
>>Hi,
>>
>>Well, the FLAC encoder is getting better by the day. I'll post some
>>results some time soon of my most recent version, which has marked
>>improvement in compression. The selection of optimal Rice parameters
>>and partition order has nearly ceased being a bottleneck. The fastest
>>version of that search is only marginally quicker than the slowest and
>>hurts compression quality only slightly (probably just precision
>>error), so I've redirected my attention to improving the LPC
>>optimization algorithms and other things.
>>
>>One area I've been trying to improve upon is the weak point I quoted
>>above. I was really annoyed and challenged :) when I found a website
>>devoted to collecting audio samples that are notoriously difficult to
>>encode. libFLAC beat my encoder hands down in almost every single test!
>> So my challenge has been to match or at least get close to libFLAC on
>>these "problem" type samples. Most of them have complex layered sounds
>>and high dynamic range or they have noisy types of sounds like cymbals
>>or distortion guitar.
>>
>>I've been doing lots of research and even more testing. I am writing
>>though because I need some advice from some people who know much more
>>about audio encoding than I do (and what better place to ask?). My math
>>background isn't what I wish it was. I'm just learning as I go and
>>reading as much as I can.
>>
>>So, one thing I've found that could really help the compression quality
>>is variable block sizes. Using different block sizes in a single stream
>>is not valid "Subset" FLAC, but decoding it is supported by both libFLAC
>>and FFmpeg, so I see no reason to ignore this potentially very
>>beneficial feature. When I manually adjust the static block size I can
>>get drastically better results. I can only imagine what being able to
>>adapt it during encoding could do.
>
>
> ok, but generating such variable block size streams should be optional
> as some decoders might not support it
>
True. My first thought is to use levels 0-8 to generate Subset streams,
then use higher levels for non-Subset features which may improve
compression. This would also be accompanied by a warning.
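Roughly what I have in mind, as a sketch (the struct and option names
below are just made up for illustration, not actual encoder options):

#include <stdio.h>

typedef struct FlacEncodeOptions {
    int level;                /* user-selected compression level           */
    int variable_blocksize;   /* 1 = allow block size changes (non-Subset) */
} FlacEncodeOptions;

static void apply_level(FlacEncodeOptions *opt)
{
    opt->variable_blocksize = 0;
    if (opt->level > 8) {
        /* levels above 8 would enable non-Subset features */
        opt->variable_blocksize = 1;
        fprintf(stderr, "warning: compression level %d produces a "
                        "non-Subset FLAC stream\n", opt->level);
    }
}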
>
>
>>A large block size seems to be ideal for most general audio sources, but
>>the more difficult samples encode better with smaller block sizes. From
>>what I gather, doing some sort of transient detection on the block and
>>then splitting it into 2 smaller blocks if warranted (and doing this
>>recursively) could greatly increase the compression. Does anyone know
>>what might be some good algorithms to employ? Or at least a good place
>>where I could start my research?
>
>
> one obvious choice if you want optimal blocksizes is to use the
> viterbi algorithm; this can be very slow depending on the block sizes
> and other parameters you consider, but with a limited set of block
> sizes it should be fine
>
>
> another, non-optimal but simpler choice is a recursive split algorithm; that
> will obviously be limited to blocksizes which are a power of 2 (2,4,8,16,32,...)
> and also to some specific recursive split pattern
> the split decision can either be based on a heuristic (like the
> transient detection you suggest) or on brute force; obviously brute
> force will be slower ...
>
> the brute force could be done by some code like:
>
> int build_split_tree(int pos, int blockorder){
>     int nosplit_bits, split_bits;
>     int blocksize= 1<<blockorder;
>
>     // never consider blocks smaller than the minimum block size
>     if(blocksize < min_blocksize)
>         return 99999999;
>
>     // bits needed when coding this block as a single frame ...
>     nosplit_bits= estimate_the_number_of_bits(pos * blocksize, blocksize);
>     // ... versus coding it as its two halves (recursively)
>     split_bits = build_split_tree(2*pos  , blockorder-1);
>     split_bits+= build_split_tree(2*pos+1, blockorder-1);
>
>     if(split_bits < nosplit_bits){
>         split_tree[blockorder][pos]=1;
>         return split_bits;
>     }else{
>         split_tree[blockorder][pos]=0;
>         return nosplit_bits;
>     }
> }
>
> [...]
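For reference, my reading of the viterbi suggestion is a dynamic
programming pass over a limited set of candidate block sizes, roughly
like this (just a sketch; estimate_bits() and the candidate sizes are
placeholders, not anything from my encoder):

#include <stdint.h>

#define NUM_SIZES 4
static const int candidate_sizes[NUM_SIZES] = { 576, 1152, 2304, 4608 };

/* placeholder for the encoder's bit-count estimate of one block */
uint64_t estimate_bits(int start, int len);

/* best[i]: minimum estimated bits to code samples [0, i) when i is a
   reachable block boundary; prev[i]: size of the block ending at i */
void find_block_boundaries(int nb_samples, uint64_t *best, int *prev)
{
    best[0] = 0;
    for (int i = 1; i <= nb_samples; i++)
        best[i] = UINT64_MAX;

    for (int i = 0; i < nb_samples; i++) {
        if (best[i] == UINT64_MAX)
            continue;                        /* boundary not reachable */
        for (int k = 0; k < NUM_SIZES; k++) {
            int j = i + candidate_sizes[k];
            if (j > nb_samples)
                break;                       /* sizes are sorted ascending */
            uint64_t bits = best[i] + estimate_bits(i, candidate_sizes[k]);
            if (bits < best[j]) {
                best[j] = bits;
                prev[j] = candidate_sizes[k];
            }
        }
    }
    /* the chosen block sizes fall out of walking prev[] back from
       nb_samples; a short final block would still need special handling
       if nb_samples is not reachable with these sizes */
}

With only a handful of candidate sizes this should stay reasonably
cheap, which I take to be what you meant about a limited set being fine.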
Thanks for the advice. I will probably implement the split tree routine,
but with transient detection at each level to decide whether or not to
split. In my research, I've found what seems to be a pretty good
transient detection algorithm. The first few steps are similar to those
of AC-3: it starts by high-pass filtering the input samples.
Next, the area that is being analysed is divided into a number of
smaller sub-blocks. The next steps are what differ from AC-3. The mean
absolute sample value is calculated for each sub-block instead of the
maximum value. A ratio value is calculated by dividing the largest
value sub-block by the average of the 3 smallest value sub-blocks. An
adaptive threshold is applied to this ratio to determine if a transient
is present. This sounds pretty good in theory. I just hope it isn't
too slow. Anyone know of another algorithm that might be better before
I start working on this one? The author of the paper claims it works
better than the AC-3 transient detection, but if this algorithm is too
slow, I could always try using the one from AC-3.
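To make the above a little more concrete, here is roughly what I have in
mind (just a sketch; the first-order high-pass, the sub-block count and
the fixed threshold are placeholders of mine, not values from the paper):

#include <math.h>
#include <stdint.h>

#define NB_SUBBLOCKS 8

/* returns 1 if a transient is detected in samples[0..n-1] */
static int detect_transient(const int32_t *samples, int n, double threshold)
{
    double mean[NB_SUBBLOCKS];
    int sub_len = n / NB_SUBBLOCKS;
    double prev = 0.0;

    /* steps 1+2: crude high-pass (first difference), then the mean
       absolute value of each sub-block */
    for (int i = 0; i < NB_SUBBLOCKS; i++) {
        double sum = 0.0;
        for (int j = 0; j < sub_len; j++) {
            double x = samples[i * sub_len + j];
            sum += fabs(x - prev);
            prev = x;
        }
        mean[i] = sum / sub_len;
    }

    /* step 3: ratio of the largest sub-block mean to the average of the
       three smallest means */
    double max = 0.0, min1 = 1e300, min2 = 1e300, min3 = 1e300;
    for (int i = 0; i < NB_SUBBLOCKS; i++) {
        if (mean[i] > max) max = mean[i];
        if      (mean[i] < min1) { min3 = min2; min2 = min1; min1 = mean[i]; }
        else if (mean[i] < min2) { min3 = min2; min2 = mean[i]; }
        else if (mean[i] < min3) { min3 = mean[i]; }
    }
    double ratio = max / ((min1 + min2 + min3) / 3.0 + 1e-9);

    /* step 4: the paper uses an adaptive threshold; a fixed one is used
       here only to keep the sketch short */
    return ratio > threshold;
}

Per analysis window this is just one pass over the samples, so I'm
hopeful the cost will be small compared to the LPC analysis itself.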
-Justin