[FFmpeg-devel] Important: upcoming CELT bitstream freeze!

Thu Nov 18 17:28:31 CET 2010

On Thu, Nov 18, 2010 at 8:40 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> People generally complain about FFmpeg's bad encoders.
>
> I foresee I can essentially totally kill CELT's prospect of taking
> over the world by writing a shit encoder in FFmpeg (compare
> Theora/Vorbis). Not that I would, but do you see my point? It'd be
> nice if you provide a patch that adds wrapper libs to FFmpeg at the
> very least, so we have something to start off from.
>
> Unfortunately, no time to review CELT right now, thesis thesis thesis,
> but I'll try to look at it at some point...

I don't think ffmpeg can screw up the encoder if you get a working
decoder. It would be hard to be much dumber than our current encoder:
it has no explicit psy-model, for example. It has fairly few tunings
which are not normative in the format.

A result of "good for low latency + tolerant of packet loss" design
requirements is that side information must be minimized. We
accomplished this by designing the format so that the dumbest thing
was usually perceptually correct, then made the decoder do this
without being told. This also means that a really dumb and fast
encoder in an embedded device can give acceptable quality.

So, the encoder has fairly few things that it can choose to change.

(1) The encoder can pick the bitrate for VBR. The VBR in our encoder
is currently mostly only useful to make sure decoders handle VBR. It
ensures that a decoder would never need to buffer more than a single
frame, so it doesn't do much.

(2) The encoder can tilt the overall bit allocation towards the HF or
LF. We don't make use of this in our encoder. (It's useful, as can
easily be demonstrated on hand-picked clips; we just don't have the
psy analysis for it.)

(3) The encoder can boost the allocation in some bands. Our encoder
has a maximally dumb analysis for this: if a band is much louder than
its neighbors it gets a boost. (This helps quality, but it's mostly
there for testing the bitstream.)

(4) The encoder can choose how bands without band data are filled. The
current encoder has about 20 lines of code to make this decision,
though it's pretty reasonable.

(5) Short vs long blocks. We have pretty dumb code for this, but it's
reasonably well tuned dumb code. It doesn't have to be great because
the next item hides some T/F sins.

(6) Per band time/frequency resolution trade-off. This has had a fair
bit of tuning behind it. The actual decision uses Viterbi.
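
To give a feel for what a Viterbi decision over bands looks like in
item (6), here is a minimal generic sketch. It is not the libcelt
code: the function name, the per-band cost table and the switch
penalty are all assumptions made up for illustration. Each band picks
one of two resolution states, and changing state between adjacent
bands is charged a penalty (standing in for the cost of signalling the
change).

```python
def choose_tf(costs, switch_penalty):
    """Pick one state per band with a Viterbi search.

    costs[b][s] is a hypothetical cost of coding band b in state s
    (e.g. 0 = frame's default resolution, 1 = the alternative).
    switch_penalty is a hypothetical charge for each change of state
    between adjacent bands. Returns the minimum-cost list of states.
    """
    if not costs:
        return []
    n_states = len(costs[0])
    best = list(costs[0])   # best[s]: cheapest path so far ending in state s
    back = []               # back[i][s]: predecessor of state s at band i + 1
    for band_costs in costs[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            # Cheapest predecessor, charging the penalty when the state changes.
            step = [best[p] + (switch_penalty if p != s else 0)
                    for p in range(n_states)]
            prev = min(range(n_states), key=lambda p: step[p])
            new_best.append(step[prev] + band_costs[s])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final state.
    state = min(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a large penalty the search prefers to keep one state across all
bands; with no penalty it simply picks the cheaper state per band.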

...and that's it. Other than that there are some places where the
encoder could intentionally distort the signal for better results,
e.g. at very low rates, but we don't really do much of this today.
Not too much that you could do worse, and a fair number of things that
someone could do better if they try.
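
As an example of how little is involved, the "much louder than its
neighbors" rule from item (3) could be sketched roughly as below. This
is an illustration only, not our encoder's code: the function name,
the use of dB energies and the 6 dB threshold are assumptions.

```python
def bands_to_boost(band_energy_db, threshold_db=6.0):
    """Return indices of bands that exceed every neighbour by threshold_db.

    band_energy_db is a list of per-band energies in dB (hypothetical
    input); threshold_db is an arbitrary illustrative margin.
    """
    boosted = []
    for i, energy in enumerate(band_energy_db):
        neighbours = []
        if i > 0:
            neighbours.append(band_energy_db[i - 1])
        if i + 1 < len(band_energy_db):
            neighbours.append(band_energy_db[i + 1])
        # A lone band has no neighbours to stand out from, so it is skipped.
        if neighbours and all(energy - n > threshold_db for n in neighbours):
            boosted.append(i)
    return boosted
```

A smarter encoder would replace this with real psychoacoustic
analysis; the bitstream only carries the resulting boost, not how it
was chosen.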

There isn't even much that can be R/D biased because much of the data
is encoded with a uniform PDF since the various transformations have
put the data into a state where a uniform PDF is ~optimal, and there
is fairly little inter-symbol crosstalk in most of the format.
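
One way to see this, assuming the usual Lagrangian rate/distortion
cost (the symbols J, D, R, lambda and N are illustrative, not taken
from the CELT code): if a symbol is coded with a uniform PDF over N
values, every choice costs the same number of bits, so the rate term
drops out of the decision.

```latex
J(k) = D(k) + \lambda R(k), \qquad
R(k) = -\log_2 p(k) = \log_2 N \quad \text{for } p(k) = \tfrac{1}{N}
\;\Longrightarrow\;
\arg\min_k J(k) = \arg\min_k D(k)
```

In other words, the encoder cannot save bits by preferring "cheap"
symbols, because there are none.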

The reason that our encoder isn't doing better is because our
attention has been on the format; fancier psy-directed decisions would
add a lot of code which isn't useful on very slow encoders and which
doesn't help us finish the format.

There are lots of ways to make the decoder fail completely though. The
decoder must make hundreds of decisions exactly identically to the
encoder or suffer complete frame loss.

On Thu, Nov 18, 2010 at 8:49 AM, Stefan de Konink <stefan at konink.de> wrote:
> The general clue that is given on the CELT development mailinglist is that
> SILK and CELT both get integrated in the IETF Opus/Harmony project as 'best
> of both worlds'. I doubt there is any taking over the world involved for
> CELT. Personally I was thinking exactly the same (regarding the taking over
> the world part).

Yes, though Opus is a strict superset of CELT with the exception of a
reduction in the flexibility of supported sample rates and frame
sizes.