[MPlayer-dev-eng] [RFC] preliminary x264 encoding help text

Sun Apr 10 01:00:10 CEST 2005

Hi, I'm doing some preliminary work on a document that will provide
encoding tips for users of x264. Here are my ideas for topics that
should be addressed:

1) Intro stuff
2) Getting x264 and getting it to work with mencoder
3) Suggestions about what options to use, and why
4) muxing issues (what container format to use, and why)

In the attached document, I think I have items 1 and 2 fairly well
covered. 3 is partly covered. I'm disappointed by how long this section
is, but I'm also unhappy with how incomplete it is. Frankly I'm beginning
to think my rant against the deblocking filter tinkerers might belong
elsewhere. As for section 4, I don't feel very committed to it and I would
appreciate it if someone else would offer to address it.

I know there are some people around here who have done a lot of benchmarks
that I haven't, which is why my write-up below only covers the frameref
and subq options in any kind of serious detail. I would be grateful if
someone could submit more benchmarks involving the partition-related
options and the bframe-related options. And a special prize is deserved
by anyone who can show a PSNR gain by adjusting the deblocking parameters
by -2 or more.

When this document is in decent shape, I will adapt it for inclusion
in DOCS/xml/en/mencoder.xml (which, btw, rocks, and should probably
be referenced multiple times in the man page). Meanwhile I'd be very
grateful for any comments or contributions.

And on a semi-related note, can anyone give a convincing justification for
why the man page still states that increasing frameref is more effective
on anime vs. live material? I fear this comment may have its origins in
some poorly-considered and half-assed tests I did way back last fall.
-------------- next part --------------
     What is x264?

x264 is a library for creating H.264 video streams. It is not 100%
complete, but currently it has at least some kind of support for most of
the H.264 features which impact quality. There are also many advanced
features in the H.264 specification which have nothing to do with video
quality per se; many of these are not yet implemented in x264.

     What is H.264?

H.264 is one name for a new digital video codec jointly developed by
the ITU and MPEG. It can also be correctly referred to by the cumbersome
names of "ISO/IEC 14496-10" or "MPEG-4 Part 10." More frequently, it is
referred to as "MPEG-4 AVC" or just "AVC," but I happen to dislike this
nomenclature because I consider it to be a stupid name in the same way
that "ASF" and "AAC" are stupid names (AVC stands for "Advanced Video
Coding"). So far, I'm on the losing side of this battle.

Whatever you call it, H.264 may be worth trying because it can typically
match the quality of MPEG-4 ASP with 5%-30% less bitrate. Actual results
will depend on both the source material and the encoder. The gains
from using H.264 don't come for free: decoding H.264 streams seems to
have steep CPU and memory requirements. On my CPU (1733 MHz Athlon),
a 1500kbps H.264 video uses around 50% CPU to decode. By comparison,
decoding a 1500kbps MPEG-4 ASP stream requires around 10% CPU. This means
that decoding high-definition streams is out of the question for users of
software decoders. It also means that even a decent DVD rip may sometimes
stutter on processors slower than 2.0 GHz or so.

At least with x264, encoding requirements aren't much worse than what
you're used to with MPEG-4 ASP. For me, typical DVD encodes run at 5-15fps.

This document is not intended to explain the details of H.264,
but if you are interested in a brief overview, you may want to read
http://www.fastvdo.com/spie04/spie04-h264OverviewPaper.pdf

     How can I play back H.264 videos with MPlayer?

MPlayer uses libavcodec's H.264 decoder. libavcodec has had at least
minimally usable H.264 decoding since around July 2004, but major
changes and improvements have been implemented since that time. Just to
be certain, it's always a good idea to use a recent cvs checkout.

If you want a quick and easy way to know whether there have been recent
changes to libavcodec's H.264 decoding, you might keep an eye on this URL:
http://mplayerhq.hu/cgi-bin/cvsweb.cgi/ffmpeg/libavcodec/h264.c?cvsroot=FFMpeg

     How can I encode videos using mencoder and x264?

If you have the subversion client installed, the latest x264 sources
can be gotten with this command:

svn co svn://svn.videolan.org/x264/trunk x264

MPlayer sources are updated whenever an x264 API change occurs, so it
is always suggested to use CVS MPlayer as well. Perhaps this situation
will change when and if an x264 "release" occurs.

x264 comes with no configure script. To build, just type "make" in the
source tree. Once this is done, you can run "make install" as root,
which will install the library libx264.a in /usr/local/lib and will
copy x264.h into /usr/local/include. If you don't like these install
locations, you can edit the Makefile - just like in the good old days.

With the x264 library and header placed in the standard locations,
building MPlayer with x264 support is easy. Just run the standard
"./configure && make && sudo make install". The configure script will
autodetect that you have satisfied the requirements for x264.

     What options should I use to get the best results?

The x264 section of MPlayer's man page is excellent; please begin
by reviewing it. This document is intended to be a supplement to the
man page.

There are mainly three types of considerations when choosing encoding
options:

1) Trading off encoding time vs. quality
2) frame type decision options
3) ratecontrol and quantization decision options

This guide is mostly concerned with the first class of considerations. The
other two types often have more to do with personal preferences and
individual requirements.

Before continuing, I should explain that I'm using only one quality
metric: global PSNR. This is the last PSNR number printed
when you include the "psnr" option in x264encopts. Any time I make a
claim about PSNR, one of the assumptions behind the claim is that equal
bitrates are used.
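
For reference, here is a sketch of how a global PSNR figure is commonly
defined for 8-bit video, next to the plain average of per-frame PSNR
values. The symbols N (number of frames) and MSE_i (mean squared error
of frame i against the source) are introduced here for illustration only,
and exactly how x264 pools the Y, U and V planes is not covered here.

```latex
\mathrm{PSNR}_{\text{global}} = 10 \log_{10} \frac{255^2}{\frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}_i}
\qquad
\mathrm{PSNR}_{\text{mean}} = \frac{1}{N}\sum_{i=1}^{N} 10 \log_{10} \frac{255^2}{\mathrm{MSE}_i}
```

Because the logarithm is taken after pooling the error, a handful of
badly coded frames pull the global figure down more than they pull down
the per-frame average.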

Nearly all of my comments assume you're using 2-pass. When comparing
options, there are two major reasons for using 2-pass encoding. First,
using 2-pass usually gains around 1.5dB PSNR, which is a very big
difference. Secondly, testing options by doing direct quality comparisons
with 1-pass encodes is a dubious proposition because bitrate usually
varies significantly with each encode. It's not always easy to tell
whether quality changes are due mainly to changed options, or if they
mostly reflect differences in the achieved bitrate.
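
One possible way to set up such an equal-bitrate comparison is sketched
below. It only prints the mencoder commands (a dry run), so they can be
inspected before anything is encoded. The function name, the file names,
the 1000 kbps bitrate, and the use of -nosound and -o /dev/null are
illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# Print (do not run) a 2-pass command pair for one set of x264 options, with
# "psnr" enabled so the final global PSNR can be compared at equal bitrate.
# SRC, BITRATE and the output naming are hypothetical placeholders.
SRC=${SRC:-video.avi}
BITRATE=${BITRATE:-1000}

print_two_pass() {
  # $1 = label used in the output file name, $2 = extra x264encopts
  label=$1
  opts=$2
  for pass in 1 2; do
    # The first pass only gathers statistics, so its video output is discarded.
    if [ "$pass" = 1 ]; then out=/dev/null; else out="test_$label.avi"; fi
    echo "mencoder $SRC -nosound -ovc x264 -x264encopts bitrate=$BITRATE:$opts:psnr:pass=$pass -o $out"
  done
}

# Same bitrate, one option changed at a time.
print_two_pass subq1 frameref=3:subq=1
print_two_pass subq5 frameref=3:subq=5
```

Piping the output to sh would run the encodes; the number to compare is
the last PSNR value each second pass reports.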

Of the options which allow you to trade off speed for quality, subq and
frameref are usually by far the most important. If you are interested
in tweaking either speed or quality, these are the first options you
should consider.

On the speed dimension, the frameref and subq options interact with
each other fairly strongly. In my experience, with one reference frame,
subq=5 takes about 35% more time than subq=1. With 6 reference frames,
the penalty grows to over 60%. subq's effect on PSNR seems fairly constant
regardless of the number of reference frames. I find that typically,
subq=5 gains 0.2-0.5dB global PSNR over subq=1. This is usually enough
to be visible.

frameref is set to 1 by default, but this shouldn't be taken to imply
that it's reasonable to set it to 1. Merely raising frameref to 2 gains
around 0.15dB PSNR with a 5-10% speed penalty; this seems like a good
tradeoff. frameref=3 gains around 0.25dB PSNR over frameref=1, which
usually is a perceptible difference. frameref=3 is around 15% slower than
frameref=1. Unfortunately, diminishing returns set in rapidly. frameref=6
can be expected to gain only 0.05-0.1dB over frameref=3, at an additional
15% speed penalty. Above frameref=6, the quality gains are usually very
small (although you should keep in mind throughout this whole discussion
that it can vary quite a lot depending on your source). In a fairly
typical case, I find frameref=12 gains a tiny 0.02dB global PSNR over
frameref=6, at a speed cost of 15%-20%. At such high frameref values,
the only really good thing that can be said is that increasing even
further will almost certainly never *harm* PSNR, but the additional
quality benefits are barely even measurable, let alone perceptible.
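
The diminishing returns can be made concrete by dividing each step's PSNR
gain by its extra encoding time. The sketch below does that with awk,
using the rough figures quoted above. Where the text gives a range the
midpoint is used, and the time penalties are treated as simply additive;
both are simplifying assumptions, and the function names are made up for
this example.

```shell
#!/bin/sh
# Columns: frameref step, extra PSNR (dB) for that step, extra time (%) for
# that step. Ranges from the text are replaced by their midpoints (assumption).
frameref_steps() {
cat <<'EOF'
1->2 0.150 7.5
2->3 0.100 7.5
3->6 0.075 15.0
6->12 0.020 17.5
EOF
}

# Print dB gained per percent of extra encoding time for each step.
frameref_marginal() {
  frameref_steps | awk '{ printf "%s %.4f\n", $1, $2 / $3 }'
}

frameref_marginal
```

With these inputs the ratio falls from about 0.02 dB per percent for the
1->2 step to about 0.001 dB per percent for the 6->12 step.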

     (Note: Raising frameref to unnecessarily high values *can* and
     *usually does* hurt coding efficiency if you turn CABAC off. With
     CABAC on (the default behavior), the possibility of setting frameref
     "too high" currently seems too remote to even worry about, and in
     the future, optimizations may remove the possibility altogether).

If you care about speed, a reasonable compromise is to use low subq and
frameref values on the first pass, and then raise them on the second
pass. Typically, this has a negligible negative effect on the final
quality: you will probably lose well under 0.1dB PSNR, which should
be much too small of a difference to see. However, different values of
frameref can occasionally affect frametype decision. Most likely, these
are rare outlying cases, but if you want to be pretty sure, consider
whether your video has either full-screen repetitive flashing patterns
or very large temporary occlusions which might force an I-frame. Adjust
the first-pass frameref so it is large enough to contain the duration of
the flashing cycle (or occlusion). For example, if the scene flashes
back and forth between two images over a duration of three frames, set
the first pass frameref to 3 or higher. This issue is probably extremely
rare in live action video material, but it does sometimes come up in
video game captures.
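
A minimal sketch of that rule of thumb follows. It picks the first-pass
frameref as the larger of your usual cheap value and the length of the
flashing cycle, then prints (without running) the two passes. The helper
names, the 1000 kbps bitrate, the second-pass values and the use of
-o /dev/null on the first pass are assumptions made for illustration.

```shell
#!/bin/sh
# Pick a first-pass frameref that covers a known flashing/occlusion cycle.
# $1 = frameref you would otherwise use on the first pass
# $2 = length of the flashing cycle in frames (0 if none is known)
first_pass_frameref() {
  if [ "$2" -gt "$1" ]; then echo "$2"; else echo "$1"; fi
}

# Print (do not run) both passes: cheap first pass, expensive second pass.
# $1 = flashing cycle length in frames. File names are placeholders.
print_passes() {
  ref1=$(first_pass_frameref 1 "$1")
  echo "mencoder video.avi -ovc x264 -x264encopts bitrate=$BITRATE:frameref=$ref1:subq=1:pass=1 -o /dev/null"
  echo "mencoder video.avi -ovc x264 -x264encopts bitrate=$BITRATE:frameref=6:subq=5:pass=2 -o finished.avi"
}

BITRATE=${BITRATE:-1000}
# A scene that alternates between two images over three frames.
print_passes 3
```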

bframes:

The usefulness of b-frames is questionable in most other codecs you
may be used to. In H.264, this has changed: there are new techniques
and block types that are possible in b-frames. Usually, even a naive
bframe choice algorithm can have a significant PSNR benefit. It is also
interesting to note that if you turn off the adaptive b-frame decision
(nob_adapt), encoding with bframes usually speeds up encoding
somewhat. *** REAL WORLD NUMBERS MIGHT BE NICE ***

If you're going to use bframes at all, consider setting the maximum
number of bframes to 2 or higher in order to take advantage of weighted
prediction.

b_adapt:

Note: this is on by default.

With this option enabled, the encoder will use some simple heuristics
to reduce the number of bframes used in scenes that might not benefit
from them as much. You can use b_bias to tweak how b-frame-happy the
encoder is. The speed penalty of adaptive bframes is currently rather
modest, but so is the potential quality gain. It usually doesn't hurt,
however. Note that this only affects speed and frametype decision on
the first pass. b_adapt and b_bias have no effect on subsequent passes.

b_pyramid:

You might as well enable this option if you're using >=2 bframes; as the
man page says, you get a little quality improvement with no speed cost.
Note that these videos can't be read by libavcodec-based decoders older
than about March 12, 2005.

weight_b:

In typical cases, there isn't much gain with this option. *** EXAMPLE
RESULTS PLEASE *** However, in crossfades or fade-to-black scenes,
weighted prediction gives rather large bitrate savings. In MPEG-4 ASP,
a fade-to-black is usually best coded as a series of expensive I-frames;
using weighted prediction in bframes makes it possible to turn at least
some of these into much more reasonably-sized b-frames. Encoding time
cost seems to be minimal, if there is any. Also, contrary to what some
people seem to guess, the decoder CPU requirements aren't much affected
by weighted prediction, all else being equal. *** ANYONE HAVE NUMBERS? ***

deblockalpha, deblockbeta:

This topic is going to be a bit controversial.

H.264 defines a simple deblocking procedure on I-blocks that uses
pre-set strengths and thresholds depending on the QP of the block in
question. By default, high QP blocks are filtered heavily, and low
QP blocks are not deblocked at all. The pre-set strengths defined by
the standard are well-chosen and the odds are very good that they're
PSNR-optimal for whatever video you're trying to encode. The deblockalpha
and deblockbeta parameters allow you to specify offsets to the preset
deblocking thresholds.

Many people seem to think it's a good idea to lower the deblocking filter
strength by large amounts (say, -3). I think this is almost never a good
idea, and in my experience, people who are doing this don't understand
very well how deblocking works by default.

It's definitely a bad idea to lower the deblocking thresholds if your
source is mainly low in spatial complexity (i.e., not a lot of detail
or noise). The in-loop filter does a rather excellent job of masking
the ringing and blocking artifacts that occur. If the source is high in
spatial complexity, however, ringing artifacts are less noticeable. This
is because the ringing tends to look like detail or noise. Human visual
perception easily notices when detail is removed, but it doesn't so
easily notice when the noise is wrongly represented. This is because
noise and detail are somewhat interchangeable. In other words, by
lowering the deblocking filter strength, you are most likely increasing
error by adding ringing artifacts, but the eye doesn't notice because it
confuses the artifacts with detail. If you still don't get it, try this
explanation: quantization destroys some detail from the original picture,
but artifacts add new, different details back in.

This STILL doesn't justify lowering the deblocking filter
strength, however. You can generally get better quality noise from
postprocessing. If your H.264 encodes look too blurry or smeared,
try playing with -vf noise. "-vf noise=7a:3a" should conceal most mild
artifacting. It will almost certainly look better than the results you
would have gotten just by fiddling with the deblocking filter.

----

     Putting it all together

For me, a typical encode looks something like this:

mencoder video.avi -ovc x264 -x264encopts bitrate=XXX:weight_b:bframes=2:b_pyramid:b_adapt:keyint=600:frameref=3:subq=1:pass=1 -o finished.avi

mencoder video.avi -ovc x264 -x264encopts bitrate=XXX:weight_b:bframes=2:b_pyramid:b_adapt:keyint=600:frameref=10:subq=5:pass=2 -o finished.avi