7.5. Encoding with the x264 codec

x264 is a free library for encoding H.264/AVC video streams. Before starting to encode, you need to set up MEncoder to support it.

7.5.1. Encoding options of x264

Please begin by reviewing the x264 section of MPlayer's man page. This section is intended to be a supplement to the man page. Here you will find quick hints about which options are most likely to interest most people. The man page is more terse, but also more exhaustive, and it sometimes offers much better technical detail. Introduction

This guide considers two major categories of encoding options:

  1. Options which mainly trade off encoding time vs. quality

  2. Options which may be useful for fulfilling various personal preferences and special requirements

Ultimately, only you can decide which options are best for your purposes. The decision for the first class of options is the simplest: you only have to decide whether you think the quality differences justify the speed differences. For the second class of options, preferences may be far more subjective, and more factors may be involved. Note that some of the "personal preferences and special requirements" options can still have large impacts on speed or quality, but that is not what they are primarily useful for. A couple of the "personal preference" options may even cause changes that look better to some people, but look worse to others.

Before continuing, you need to understand that this guide uses only one quality metric: global PSNR. For a brief explanation of what PSNR is, see the Wikipedia article on PSNR. Global PSNR is the last PSNR number reported when you include the psnr option in x264encopts. Any time you read a claim about PSNR, one of the assumptions behind the claim is that equal bitrates are used.

Nearly all of this guide's comments assume you are using two pass. When comparing options, there are two major reasons for using two pass encoding. First, using two pass often gains around 1dB PSNR, which is a very big difference. Secondly, testing options by doing direct quality comparisons with one pass encodes introduces a major confounding factor: bitrate often varies significantly with each encode. It is not always easy to tell whether quality changes are due mainly to changed options, or if they mostly reflect essentially random differences in the achieved bitrate. Options which primarily affect speed and quality

  • subq: Of the options which allow you to trade off speed for quality, subq and frameref (see below) are usually by far the most important. If you are interested in tweaking either speed or quality, these are the first options you should consider. On the speed dimension, the frameref and subq options interact with each other fairly strongly. Experience shows that, with one reference frame, subq=5 (the default setting) takes about 35% more time than subq=1. With 6 reference frames, the penalty grows to over 60%. subq's effect on PSNR seems fairly constant regardless of the number of reference frames. Typically, subq=5 achieves 0.2-0.5 dB higher global PSNR in comparison subq=1. This is usually enough to be visible.

    subq=6 is slower and yields better quality at a reasonable cost. In comparison to subq=5, it usually gains 0.1-0.4 dB global PSNR with speed costs varying from 25%-100%. Unlike other levels of subq, the behavior of subq=6 does not depend much on frameref and me. Instead, the effectiveness of subq=6 depends mostly upon the number of B-frames used. In normal usage, this means subq=6 has a large impact on both speed and quality in complex, high motion scenes, but it may not have much effect in low-motion scenes. Note that it is still recommended to always set bframes to something other than zero (see below).

    subq=7 is the slowest, highest quality mode. In comparison to subq=6, it usually gains 0.01-0.05 dB global PSNR with speed costs varying from 15%-33%. Since the tradeoff encoding time vs. quality is quite low, you should only use it if you are after every bit saving and if encoding time is not an issue.

  • frameref: frameref is set to 1 by default, but this should not be taken to imply that it is reasonable to set it to 1. Merely raising frameref to 2 gains around 0.15dB PSNR with a 5-10% speed penalty; this seems like a good tradeoff. frameref=3 gains around 0.25dB PSNR over frameref=1, which should be a visible difference. frameref=3 is around 15% slower than frameref=1. Unfortunately, diminishing returns set in rapidly. frameref=6 can be expected to gain only 0.05-0.1 dB over frameref=3 at an additional 15% speed penalty. Above frameref=6, the quality gains are usually very small (although you should keep in mind throughout this whole discussion that it can vary quite a lot depending on your source). In a fairly typical case, frameref=12 will improve global PSNR by a tiny 0.02dB over frameref=6, at a speed cost of 15%-20%. At such high frameref values, the only really good thing that can be said is that increasing it even further will almost certainly never harm PSNR, but the additional quality benefits are barely even measurable, let alone perceptible.


    Raising frameref to unnecessarily high values can and usually does hurt coding efficiency if you turn CABAC off. With CABAC on (the default behavior), the possibility of setting frameref "too high" currently seems too remote to even worry about, and in the future, optimizations may remove the possibility altogether.

    If you care about speed, a reasonable compromise is to use low subq and frameref values on the first pass, and then raise them on the second pass. Typically, this has a negligible negative effect on the final quality: You will probably lose well under 0.1dB PSNR, which should be much too small of a difference to see. However, different values of frameref can occasionally affect frame type decision. Most likely, these are rare outlying cases, but if you want to be pretty sure, consider whether your video has either fullscreen repetitive flashing patterns or very large temporary occlusions which might force an I-frame. Adjust the first-pass frameref so it is large enough to contain the duration of the flashing cycle (or occlusion). For example, if the scene flashes back and forth between two images over a duration of three frames, set the first pass frameref to 3 or higher. This issue is probably extremely rare in live action video material, but it does sometimes come up in video game captures.

  • me: This option is for choosing the motion estimation search method. Altering this option provides a straightforward quality-vs-speed tradeoff. me=dia is only a few percent faster than the default search, at a cost of under 0.1dB global PSNR. The default setting (me=hex) is a reasonable tradeoff between speed and quality. me=umh gains a little under 0.1dB global PSNR, with a speed penalty that varies depending on frameref. At high values of frameref (e.g. 12 or so), me=umh is about 40% slower than the default me=hex. With frameref=3, the speed penalty incurred drops to 25%-30%.

    me=esa uses an exhaustive search that is too slow for practical use.

  • partitions=all: This option enables the use of 8x4, 4x8 and 4x4 subpartitions in predicted macroblocks (in addition to the default partitions). Enabling it results in a fairly consistent 10%-15% loss of speed. This option is rather useless in source containing only low motion, however in some high-motion source, particularly source with lots of small moving objects, gains of about 0.1dB can be expected.

  • bframes: If you are used to encoding with other codecs, you may have found that B-frames are not always useful. In H.264, this has changed: there are new techniques and block types that are possible in B-frames. Usually, even a naive B-frame choice algorithm can have a significant PSNR benefit. It is interesting to note that using B-frames usually speeds up the second pass somewhat, and may also speed up a single pass encode if adaptive B-frame decision is turned off.

    With adaptive B-frame decision turned off (x264encopts's nob_adapt), the optimal value for this setting is usually no more than bframes=1, or else high-motion scenes can suffer. With adaptive B-frame decision on (the default behavior), it is safe to use higher values; the encoder will reduce the use of B-frames in scenes where they would hurt compression. The encoder rarely chooses to use more than 3 or 4 B-frames; setting this option any higher will have little effect.

  • b_adapt: Note: This is on by default.

    With this option enabled, the encoder will use a reasonably fast decision process to reduce the number of B-frames used in scenes that might not benefit from them as much. You can use b_bias to tweak how B-frame-happy the encoder is. The speed penalty of adaptive B-frames is currently rather modest, but so is the potential quality gain. It usually does not hurt, however. Note that this only affects speed and frame type decision on the first pass. b_adapt and b_bias have no effect on subsequent passes.

  • b_pyramid: You might as well enable this option if you are using >=2 B-frames; as the man page says, you get a little quality improvement at no speed cost. Note that these videos cannot be read by libavcodec-based decoders older than about March 5, 2005.

  • weight_b: In typical cases, there is not much gain with this option. However, in crossfades or fade-to-black scenes, weighted prediction gives rather large bitrate savings. In MPEG-4 ASP, a fade-to-black is usually best coded as a series of expensive I-frames; using weighted prediction in B-frames makes it possible to turn at least some of these into much smaller B-frames. Encoding time cost is minimal, as no extra decisions need to be made. Also, contrary to what some people seem to guess, the decoder CPU requirements are not much affected by weighted prediction, all else being equal.

    Unfortunately, the current adaptive B-frame decision algorithm has a strong tendency to avoid B-frames during fades. Until this changes, it may be a good idea to add nob_adapt to your x264encopts, if you expect fades to have a large effect in your particular video clip.

  • threads: This option allows to spawn threads to encode in parallel on multiple CPUs. You can manually select the number of threads to be created or, better, set threads=auto and let x264 detect how many CPUs are available and pick an appropriate number of threads. If you have a multi-processor machine, you should really consider using it as it can to increase encoding speed linearly with the number of CPU cores (about 94% per CPU core), with very little quality reduction (about 0.005dB for dual processor, about 0.01dB for a quad processor machine). Options pertaining to miscellaneous preferences

  • Two pass encoding: Above, it was suggested to always use two pass encoding, but there are still reasons for not using it. For instance, if you are capturing live TV and encoding in realtime, you are forced to use single-pass. Also, one pass is obviously faster than two passes; if you use the exact same set of options on both passes, two pass encoding is almost twice as slow.

    Still, there are very good reasons for using two pass encoding. For one thing, single pass ratecontrol is not psychic, and it often makes unreasonable choices because it cannot see the big picture. For example, suppose you have a two minute long video consisting of two distinct halves. The first half is a very high-motion scene lasting 60 seconds which, in isolation, requires about 2500kbps in order to look decent. Immediately following it is a much less demanding 60-second scene that looks good at 300kbps. Suppose you ask for 1400kbps on the theory that this is enough to accommodate both scenes. Single pass ratecontrol will make a couple of "mistakes" in such a case. First of all, it will target 1400kbps in both segments. The first segment may end up heavily overquantized, causing it to look unacceptably and unreasonably blocky. The second segment will be heavily underquantized; it may look perfect, but the bitrate cost of that perfection will be completely unreasonable. What is even harder to avoid is the problem at the transition between the two scenes. The first seconds of the low motion half will be hugely over-quantized, because the ratecontrol is still expecting the kind of bitrate requirements it met in the first half of the video. This "error period" of heavily over-quantized low motion will look jarringly bad, and will actually use less than the 300kbps it would have taken to make it look decent. There are ways to mitigate the pitfalls of single-pass encoding, but they may tend to increase bitrate misprediction.

    Multipass ratecontrol can offer huge advantages over a single pass. Using the statistics gathered from the first pass encode, the encoder can estimate, with reasonable accuracy, the "cost" (in bits) of encoding any given frame, at any given quantizer. This allows for a much more rational, better planned allocation of bits between the expensive (high-motion) and cheap (low-motion) scenes. See qcomp below for some ideas on how to tweak this allocation to your liking.

    Moreover, two passes need not take twice as long as one pass. You can tweak the options in the first pass for higher speed and lower quality. If you choose your options well, you can get a very fast first pass. The resulting quality in the second pass will be slightly lower because size prediction is less accurate, but the quality difference is normally much too small to be visible. Try, for example, adding subq=1:frameref=1 to the first pass x264encopts. Then, on the second pass, use slower, higher-quality options: subq=6:frameref=15:partitions=all:me=umh

  • Three pass encoding? x264 offers the ability to make an arbitrary number of consecutive passes. If you specify pass=1 on the first pass, then use pass=3 on a subsequent pass, the subsequent pass will both read the statistics from the previous pass, and write its own statistics. An additional pass following this one will have a very good base from which to make highly accurate predictions of frame sizes at a chosen quantizer. In practice, the overall quality gain from this is usually close to zero, and quite possibly a third pass will result in slightly worse global PSNR than the pass before it. In typical usage, three passes help if you get either bad bitrate prediction or bad looking scene transitions when using only two passes. This is somewhat likely to happen on extremely short clips. There are also a few special cases in which three (or more) passes are handy for advanced users, but for brevity, this guide omits discussing those special cases.

  • qcomp: qcomp trades off the number of bits allocated to "expensive" high-motion versus "cheap" low-motion frames. At one extreme, qcomp=0 aims for true constant bitrate. Typically this would make high-motion scenes look completely awful, while low-motion scenes would probably look absolutely perfect, but would also use many times more bitrate than they would need in order to look merely excellent. At the other extreme, qcomp=1 achieves nearly constant quantization parameter (QP). Constant QP does not look bad, but most people think it is more reasonable to shave some bitrate off of the extremely expensive scenes (where the loss of quality is not as noticeable) and reallocate it to the scenes that are easier to encode at excellent quality. qcomp is set to 0.6 by default, which may be slightly low for many peoples' taste (0.7-0.8 are also commonly used).

  • keyint: keyint is solely for trading off file seekability against coding efficiency. By default, keyint is set to 250. In 25fps material, this guarantees the ability to seek to within 10 seconds precision. If you think it would be important and useful to be able to seek within 5 seconds of precision, set keyint=125; this will hurt quality/bitrate slightly. If you care only about quality and not about seekability, you can set it to much higher values (understanding that there are diminishing returns which may become vanishingly low, or even zero). The video stream will still have seekable points as long as there are some scene changes.

  • deblock: This topic is going to be a bit controversial.

    H.264 defines a simple deblocking procedure on I-blocks that uses pre-set strengths and thresholds depending on the QP of the block in question. By default, high QP blocks are filtered heavily, and low QP blocks are not deblocked at all. The pre-set strengths defined by the standard are well-chosen and the odds are very good that they are PSNR-optimal for whatever video you are trying to encode. The deblock allow you to specify offsets to the preset deblocking thresholds.

    Many people seem to think it is a good idea to lower the deblocking filter strength by large amounts (say, -3). This is however almost never a good idea, and in most cases, people who are doing this do not understand very well how deblocking works by default.

    The first and most important thing to know about the in-loop deblocking filter is that the default thresholds are almost always PSNR-optimal. In the rare cases that they are not optimal, the ideal offset is plus or minus 1. Adjusting deblocking parameters by a larger amount is almost guaranteed to hurt PSNR. Strengthening the filter will smear more details; weakening the filter will increase the appearance of blockiness.

    It is definitely a bad idea to lower the deblocking thresholds if your source is mainly low in spacial complexity (i.e., not a lot of detail or noise). The in-loop filter does a rather excellent job of concealing the artifacts that occur. If the source is high in spacial complexity, however, artifacts are less noticeable. This is because the ringing tends to look like detail or noise. Human visual perception easily notices when detail is removed, but it does not so easily notice when the noise is wrongly represented. When it comes to subjective quality, noise and detail are somewhat interchangeable. By lowering the deblocking filter strength, you are most likely increasing error by adding ringing artifacts, but the eye does not notice because it confuses the artifacts with detail.

    This still does not justify lowering the deblocking filter strength, however. You can generally get better quality noise from postprocessing. If your H.264 encodes look too blurry or smeared, try playing with -vf noise when you play your encoded movie. -vf noise=8a:4a should conceal most mild artifacts. It will almost certainly look better than the results you would have gotten just by fiddling with the deblocking filter.

7.5.2. Encoding setting examples

The following settings are examples of different encoding option combinations that affect the speed vs quality tradeoff at the same target bitrate.

All the encoding settings were tested on a 720x448 @30000/1001 fps video sample, the target bitrate was 900kbps, and the machine was an AMD-64 3400+ at 2400 MHz in 64 bits mode. Each encoding setting features the measured encoding speed (in frames per second) and the PSNR loss (in dB) compared to the "very high quality" setting. Please understand that depending on your source, your machine type and development advancements, you may get very different results.

DescriptionEncoding optionsspeed (in fps)Relative PSNR loss (in dB)
Very high qualitysubq=6:partitions=all:8x8dct:me=umh:frameref=5:bframes=3:b_pyramid=normal:weight_b6fps0dB
High qualitysubq=5:8x8dct:frameref=2:bframes=3:b_pyramid=normal:weight_b13fps-0.89dB