[MPlayer-dev-eng] [RFC] preliminary x264 encoding help text

Fri Apr 29 17:26:24 CEST 2005

On Fri, Apr 29, 2005 at 04:32:22PM +0200, Guillaume POIRIER wrote:

> Send whatever is ready, and I'll try to find the time to include it.

OK, attached. While we're waiting for it to be finished, here are a
couple of interesting things to discuss:

I've found clips where the global-PSNR-optimal ip_factor is around 2.0,
with an optimal pb_factor of 1.6 or so. I found the optimal settings
by searching ip_factor in steps of 0.1, then doing the same for
pb_factor (this didn't take long since the clips were CIF and only a
few minutes long). Does anyone else think these numbers seem
ridiculously high?
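
For anyone who wants to repeat this kind of search, here is a minimal
sketch of the procedure. It is not the script I used: measure_psnr is a
hypothetical callback that would run a full encode at a fixed bitrate and
return the global PSNR. A made-up function (fake_psnr) stands in for it
here so the sketch runs on its own, and the search range and starting
pb_factor are arbitrary assumptions.

```python
def frange(start, stop, step):
    """Inclusive list of floats, rounded so 0.1 steps do not drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def coordinate_search(measure_psnr, ip_values, pb_values, pb_start):
    """Sweep ip_factor with pb_factor held fixed, then sweep pb_factor."""
    best_ip = max(ip_values, key=lambda ip: measure_psnr(ip, pb_start))
    best_pb = max(pb_values, key=lambda pb: measure_psnr(best_ip, pb))
    return best_ip, best_pb


def fake_psnr(ip_factor, pb_factor):
    """Stand-in for a real encode: an invented surface peaking at (2.0, 1.6)."""
    return 40.0 - (ip_factor - 2.0) ** 2 - (pb_factor - 1.6) ** 2


grid = frange(1.0, 3.0, 0.1)
best_ip, best_pb = coordinate_search(fake_psnr, grid, grid, pb_start=1.3)
print(best_ip, best_pb)
```

Note that sweeping one option at a time only finds the joint optimum if
the two options don't interact much; a full 2-D grid would be safer but
costs many more encodes.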

Also, about chroma_qp_offset: in every case I've tried, -1 is the
best value for global PSNR, but the difference isn't too big. Did
the JVT make a mistake in choosing the characteristics of the chroma
quantization table? More likely, there was some psychovisual
reason for choosing a table that slightly disfavors chroma relative to
the PSNR-optimal table. So I wonder what should be recommended to users,
or whether it's even worth bringing up.
-------------- next part --------------
     What is x264?

x264 is a library for creating H.264 video streams. It is not 100%
complete, but currently it has at least some kind of support for most of
the H.264 features which impact quality. There are also many advanced
features in the H.264 specification which have nothing to do with video
quality per se; many of these are not yet implemented in x264.

     What is H.264?

H.264 is one name for a new digital video codec jointly developed by
the ITU and MPEG. It can also be correctly referred to by the cumbersome
names of "ISO/IEC 14496-10" or "MPEG-4 Part 10." More frequently, it is
referred to as "MPEG-4 AVC" or just "AVC," but I happen to dislike this
nomenclature because I consider it to be a stupid name in the same way
that "ASF" and "AAC" are stupid names (AVC stands for "Advanced Video
Coding").

Whatever you call it, H.264 may be worth trying because it can typically
match the quality of MPEG-4 ASP with 5%-30% less bitrate. Actual results
will depend on both the source material and the encoder. The gains
from using H.264 don't come for free: decoding H.264 streams seems to
have steep CPU and memory requirements. On my CPU (1733 MHz Athlon),
a 1500 kbps H.264 video uses around 50% CPU to decode. By comparison,
decoding a 1500 kbps MPEG-4 ASP stream requires around 10% CPU. This
means that decoding high-definition streams is almost out of the
question for most users. It also means that even a decent DVD rip may
sometimes stutter on processors slower than 2.0 GHz or so.

At least with x264, encoding requirements aren't much worse than what
you're used to with MPEG-4 ASP. For me, typical DVD encodes run at
5-15 fps.

This document is not intended to explain the details of H.264,
but if you are interested in a brief overview, you may want to read
http://www.fastvdo.com/spie04/spie04-h264OverviewPaper.pdf

     How can I play back H.264 videos with MPlayer?

MPlayer uses libavcodec's H.264 decoder. libavcodec has had at least
minimally usable H.264 decoding since around July 2004, but major
changes and improvements have been made since then. To be safe, it's
always a good idea to use a recent CVS checkout.

If you want a quick and easy way to know whether there have been recent
changes to libavcodec's H.264 decoding, you might keep an eye on this URL:
http://mplayerhq.hu/cgi-bin/cvsweb.cgi/ffmpeg/libavcodec/h264.c?cvsroot=FFMpeg

     How can I encode videos using mencoder and x264?

If you have the Subversion client installed, you can fetch the latest
x264 sources with this command:

svn co svn://svn.videolan.org/x264/trunk x264

MPlayer sources are updated whenever an x264 API change occurs, so it
is always suggested to use CVS MPlayer as well. Perhaps this situation
will change when and if an x264 "release" occurs. Meanwhile, x264 should
be considered very unstable, in the sense that its programming interface
is subject to change.

x264 is built and installed in the standard way:
./configure && make && sudo make install
This installs libx264.a in /usr/local/lib and x264.h in
/usr/local/include.

With the x264 library and header placed in the standard locations,
building MPlayer with x264 support is easy. Just run the standard
"./configure && make && sudo make install". The configure script will
autodetect that you have satisfied the requirements for x264.

     What options should I use to get the best results?

The x264 section of MPlayer's man page is excellent; please begin
by reviewing it. This document is intended to be a supplement to the
man page.

There are three main types of considerations when choosing encoding
options:

1) Trading off encoding time vs. quality
2) Frame type decision options
3) Ratecontrol and quantization decision options

This guide is mostly concerned with the first class of options. The
other two types often have more to do with personal preferences and
individual requirements.

Before continuing, I should explain that I'm using only one quality
metric: global PSNR. For a brief explanation of what PSNR is, see
http://en.wikipedia.org/wiki/PSNR
Global PSNR is the last PSNR number reported when you include the "psnr"
option in x264encopts. Any time I make a claim about PSNR, one of the
assumptions behind the claim is that equal bitrates are used.
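
To make the metric concrete, here is a small sketch of how PSNR follows
from mean squared error. The function names and the per-frame MSE values
are invented for illustration. It also assumes that the "global" figure
pools the squared error over all frames before converting to dB, rather
than averaging per-frame PSNR values; that is my understanding of what
the encoder reports, but treat it as an assumption.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (8-bit samples by default)."""
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def mean_psnr(frame_mses):
    """Average of the per-frame PSNR values."""
    return sum(psnr_from_mse(m) for m in frame_mses) / len(frame_mses)


def global_psnr(frame_mses):
    """PSNR of the error pooled over all frames (equal-sized frames assumed)."""
    return psnr_from_mse(sum(frame_mses) / len(frame_mses))


# Invented per-frame MSE values: three clean frames and one bad one.
mses = [4.0, 4.0, 4.0, 64.0]
print(round(mean_psnr(mses), 2), round(global_psnr(mses), 2))
```

Under this definition a single badly coded frame pulls the global figure
down more than it pulls down the per-frame average, so global PSNR is
the stricter of the two numbers.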

Nearly all of my comments assume you're using 2-pass. When comparing
options, there are two major reasons for using 2-pass encoding. First,
using 2-pass often gains around 1dB PSNR, which is a very big
difference. Second, testing options by doing direct quality comparisons
with 1-pass encodes is a dubious proposition because bitrate often
varies significantly with each encode. It's not always easy to tell
whether quality changes are due mainly to changed options, or if they
mostly reflect differences in the achieved bitrate.

Of the options which allow you to trade off speed for quality, subq and
frameref are usually by far the most important. If you are interested
in tweaking either speed or quality, these are the first options you
should consider.

On the speed dimension, the frameref and subq options interact with
each other fairly strongly. In my experience, with one reference frame,
subq=5 takes about 35% more time than subq=1. With 6 reference frames,
the penalty grows to over 60%. subq's effect on PSNR seems fairly constant
regardless of the number of reference frames. I find that typically,
subq=5 gains 0.2-0.5 dB global PSNR over subq=1. This is usually enough
to be visible.

frameref is set to 1 by default, but this shouldn't be taken to imply
that it's reasonable to set it to 1. Merely raising frameref to 2 gains
around 0.15dB PSNR with a 5-10% speed penalty; this seems like a good
tradeoff. frameref=3 gains around 0.25dB PSNR over frameref=1, which
should be a visible difference. frameref=3 is around 15% slower than
frameref=1. Unfortunately, diminishing returns set in rapidly. frameref=6
can be expected to gain only 0.05-0.1 dB over frameref=3, at an additional
15% speed penalty. Above frameref=6, the quality gains are usually very
small (although you should keep in mind throughout this whole discussion
that it can vary quite a lot depending on your source). In a fairly
typical case, I find frameref=12 gains a tiny 0.02dB global PSNR over
frameref=6, at a speed cost of 15%-20%. At such high frameref values,
the only really good thing that can be said is that increasing even
further will almost certainly never *harm* PSNR, but the additional
quality benefits are barely even measurable, let alone perceptible.
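
The diminishing returns are easier to see as a ratio. The sketch below
simply divides the PSNR gains quoted above by the matching speed
penalties; where a range was given, it uses the midpoint, which is my
simplification. The numbers come from my own tests and will differ for
your source.

```python
# (step, PSNR gain in dB, extra encoding time in percent), taken from the
# figures quoted in the text; ranges are replaced by their midpoints.
steps = [
    ("frameref 1 -> 3", 0.25, 15.0),
    ("frameref 3 -> 6", 0.075, 15.0),
    ("frameref 6 -> 12", 0.02, 17.5),
]


def gain_per_percent(gain_db, extra_time_pct):
    """dB of global PSNR gained per percentage point of extra encode time."""
    return gain_db / extra_time_pct


ratios = {name: gain_per_percent(gain, pct) for name, gain, pct in steps}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f} dB per 1% extra time")
```

Each step up buys noticeably less quality per unit of encoding time than
the one before it.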

     (Note: Raising frameref to unnecessarily high values *can* and
     *usually does* hurt coding efficiency if you turn CABAC off. With
     CABAC on (the default behavior), the possibility of setting frameref
     "too high" currently seems too remote to even worry about, and in
     the future, optimizations may remove the possibility altogether).

If you care about speed, a reasonable compromise is to use low subq and
frameref values on the first pass, and then raise them on the second
pass. Typically, this has a negligible negative effect on the final
quality: you will probably lose well under 0.1dB PSNR, which should
be much too small a difference to see. However, different values of
frameref can occasionally affect frametype decision. Most likely, these
are rare outlying cases, but if you want to be pretty sure, consider
whether your video has either full-screen repetitive flashing patterns
or very large temporary occlusions which might force an I-frame. Adjust
the first-pass frameref so it is large enough to contain the duration of
the flashing cycle (or occlusion). For example, if the scene flashes
back and forth between two images over a duration of three frames, set
the first pass frameref to 3 or higher. This issue is probably extremely
rare in live action video material, but it does sometimes come up in
video game captures.
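
As a rough sketch of the fast-first-pass idea, the helper below builds
the two mencoder command lines. The file names, the bitrate, and the
second-pass values (subq=5, frameref=3) are placeholders picked for
illustration, and -nosound is used only to keep the example short; the
helper functions themselves are hypothetical. Set first_frameref to the
length of any flashing cycle, as described above.

```python
def x264encopts(pass_number, bitrate, subq, frameref):
    """Build a colon-separated x264encopts string for one pass."""
    return ":".join([
        f"pass={pass_number}",
        f"bitrate={bitrate}",
        f"subq={subq}",
        f"frameref={frameref}",
    ])


def two_pass_commands(source, output, bitrate,
                      final_subq=5, final_frameref=3, first_frameref=1):
    """Return (first_pass, second_pass) argument lists for mencoder."""
    first = ["mencoder", source, "-ovc", "x264", "-nosound",
             "-x264encopts", x264encopts(1, bitrate, 1, first_frameref),
             "-o", "/dev/null"]  # first-pass video output is discarded
    second = ["mencoder", source, "-ovc", "x264", "-nosound",
              "-x264encopts",
              x264encopts(2, bitrate, final_subq, final_frameref),
              "-o", output]
    return first, second


# Example: the scene flashes over a three-frame cycle, so the first pass
# keeps frameref=3 while still using the cheapest subq.
first, second = two_pass_commands("input.avi", "output.avi",
                                  bitrate=1000, first_frameref=3)
print(" ".join(first))
print(" ".join(second))
```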

bframes:

The usefulness of B-frames is questionable in most other codecs you
may be used to. In H.264, this has changed: there are new techniques
and block types that are possible in B-frames. Usually, even a naive
B-frame choice algorithm can have a significant PSNR benefit. It is also
interesting to note that if you turn off the adaptive B-frame decision
(nob_adapt), using bframes usually speeds up encoding somewhat.

With adaptive B-frame decision turned off (x264encopts nob_adapt),
the optimal value for this setting will usually range from bframes=1
to bframes=3. With adaptive B-frame decision on (the default behavior),
it's probably safe to use higher values; the encoder will try to reduce
the use of B-frames in scenes where they would hurt compression.

If you're going to use bframes at all, consider setting the maximum
number of bframes to 2 or higher in order to take advantage of weighted
prediction.

b_adapt:

Note: this is on by default.

With this option enabled, the encoder will use some simple heuristics
to reduce the number of bframes used in scenes that might not benefit
from them as much. You can use b_bias to tweak how b-frame-happy the
encoder is. The speed penalty of adaptive bframes is currently rather
modest, but so is the potential quality gain. It usually doesn't hurt,
however. Note that this only affects speed and frametype decision on
the first pass. b_adapt and b_bias have no effect on subsequent passes.

b_pyramid:

You might as well enable this option if you're using >=2 bframes; as the
man page says, you get a little quality improvement with no speed cost.
Note that these videos can't be read by libavcodec-based decoders older
than about March 12, 2005.

weight_b:

In typical cases, there isn't much gain with this option. However, in
crossfades or fade-to-black scenes, weighted prediction gives rather large
bitrate savings. In MPEG-4 ASP, a fade-to-black is usually best coded as a
series of expensive I-frames; using weighted prediction in bframes makes
it possible to turn at least some of these into much more reasonably-sized
B-frames. Encoding time cost seems to be minimal, if there is any. Also,
contrary to what some people seem to guess, the decoder CPU requirements
aren't much affected by weighted prediction, all else being equal.
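
To illustrate why weighting helps in a fade, here is a toy model. It is
not how x264 implements anything: it ignores motion compensation, works
on four invented sample values, and assumes the weights are chosen by
temporal distance to each reference. It compares a plain average of the
two references against a distance-weighted blend for the first of two
B-frames in a linear fade to black.

```python
def sad(block, prediction):
    """Sum of absolute differences, a rough proxy for residual cost."""
    return sum(abs(a - b) for a, b in zip(block, prediction))


def bipredict(past, future, w_past, w_future):
    """Blend two references with integer weights, rounding to nearest."""
    total = w_past + w_future
    return [(w_past * p + w_future * f + total // 2) // total
            for p, f in zip(past, future)]


# Linear fade to black over frames 0..3, coded as P0 B1 B2 P3.
base = [90, 120, 150, 180]


def faded(t):
    """Sample values at frame t of the fade (t=0 full brightness, t=3 black)."""
    return [s * (3 - t) // 3 for s in base]


p0, b1, b2, p3 = (faded(t) for t in range(4))

plain = bipredict(p0, p3, 1, 1)        # equal weights
by_distance = bipredict(p0, p3, 2, 1)  # B1 is closer to P0, so favor P0

print(sad(b1, plain), sad(b1, by_distance))
```

In this toy model, a lone B-frame sitting exactly halfway between its
references would already be predicted perfectly by the plain average, so
the weights only start to matter with two or more consecutive B-frames.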

Unfortunately, the current adaptive B-frame decision algorithm has a
strong tendency to avoid B-frames during fades. Until this changes, it
may be a good idea to add nob_adapt to your x264encopts, if you expect
fades to have a significant effect in your particular video clip.

deblockalpha, deblockbeta:

This topic is going to be a bit controversial.

H.264 defines a simple deblocking procedure on I-blocks that uses
pre-set strengths and thresholds depending on the QP of the block in
question. By default, high QP blocks are filtered heavily, and low
QP blocks are not deblocked at all. The pre-set strengths defined by
the standard are well-chosen and the odds are very good that they're
PSNR-optimal for whatever video you're trying to encode. The deblockalpha
and deblockbeta parameters allow you to specify offsets to the preset
deblocking thresholds.

Many people seem to think it's a good idea to lower the deblocking filter
strength by large amounts (say, -3). I think this is almost never a good
idea, and in my experience, people who are doing this don't understand
very well how deblocking works by default.

The first and most important thing to know about the in-loop deblocking
filter is that the default thresholds are almost always PSNR-optimal. In
the rare cases that they aren't optimal, the ideal offset is plus or
minus 1. Adjusting deblocking parameters by a larger amount is almost
guaranteed to hurt PSNR. Strengthening the filter will smear more details;
weakening the filter will increase the appearance of blockiness.

It's definitely a bad idea to lower the deblocking thresholds if your
source is mainly low in spatial complexity (i.e., not a lot of detail
or noise). The in-loop filter does a rather excellent job of concealing
the artifacts that occur. If the source is high in spatial complexity,
however, artifacts are less noticeable. This is because the ringing tends
to look like detail or noise. Human visual perception easily notices
when detail is removed, but it doesn't so easily notice when the noise is
wrongly represented. When it comes to subjective quality, noise and detail
are somewhat interchangeable. By lowering the deblocking filter strength,
you are most likely increasing error by adding ringing artifacts, but
the eye doesn't notice because it confuses the artifacts with detail.

This STILL doesn't justify lowering the deblocking filter
strength, however. You can generally get better-quality noise from
postprocessing. If your H.264 encodes look too blurry or smeared,
try playing with -vf noise. "-vf noise=8a:4a" should conceal most mild
artifacting. It will almost certainly look better than the results you
would have gotten just by fiddling with the deblocking filter.