[FFmpeg-user] The future of video

Oliver Fromme oliver at fromme.com
Wed Jul 17 19:30:12 EEST 2024


Mark Filipak wrote:
 > The weighting of Y Cr Cb are still based on CRT phosphors, not reality. HSV is reality.

No, that's wrong, HSV is not "reality".  I'm afraid it's much more
complicated than that.

RGB, YCbCr and HSV are just color models (sometimes also called color
spaces, but that's an ambiguous term, so I prefer to avoid it).  They
have nothing to do with reality or CRT phosphors or anything physical.
They don't specify how a color is to be reproduced on a screen.
Basically, they're just different ways to encode a color.
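
Just to illustrate what "different ways to encode a color" means,
here is a rough Python sketch (using the standard colorsys module;
the particular numbers are arbitrary):

    import colorsys

    # An arbitrary color, given as normalized R, G, B components.
    r, g, b = 0.25, 0.50, 0.75

    # The same color, re-encoded as hue, saturation, value.
    h, s, v = colorsys.rgb_to_hsv(r, g, b)

    # Converting back recovers the original RGB triple.
    print(colorsys.hsv_to_rgb(h, s, v))   # approx. (0.25, 0.5, 0.75)

Nothing in either triple says how bright the display is or which
wavelengths its primaries emit; it is the same color written down
in two different coordinate systems.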

In order to actually put a certain color on a display device, you
need three more things:  the color primaries (also called primary
chromaticities; they define the color gamut), the matrix coefficients
(these define the relative weight of the color components), and the
transfer function (it defines the mapping between the encoded
values and the actual scene or display light; that mapping is
usually non-linear).

Together, these parameters (try to) reproduce "reality".
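
To make those three things a bit more tangible, here is a minimal
Python sketch of such a parameter set, filled in with the
well-known BT.709 values (the class and field names are my own,
purely for illustration):

    from dataclasses import dataclass

    @dataclass
    class ColorDescription:
        # (x, y) chromaticities of the R, G, B primaries and the
        # white point; together they define the gamut.
        primaries: dict
        # Luma weights Kr and Kb (Kg = 1 - Kr - Kb); these are the
        # matrix coefficients for the RGB <-> YCbCr conversion.
        kr: float
        kb: float
        # Name of the (non-linear) transfer function.
        transfer: str

    bt709 = ColorDescription(
        primaries={"R": (0.640, 0.330), "G": (0.300, 0.600),
                   "B": (0.150, 0.060), "white": (0.3127, 0.3290)},
        kr=0.2126,
        kb=0.0722,
        transfer="BT.709 piecewise gamma",
    )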

Let's look at two typical examples.

Digital SDR video -- as used in HDTV, DVD and others -- is based on
the properties of old CRT screens, as you mentioned.  It typically
uses the color reproduction that is specified in ITU-R BT.709 (the
current version is BT.709-6).  Its color primaries are very similar
to old-fashioned PAL/NTSC and cover about the same color gamut as
the old BT.601 from 1982 (!), which is the very first
non-proprietary standard for digital video.  Its transfer function
is a "gamma" function; you have probably heard of it before.
(Actually it's not *exactly* a gamma function, but close; see the
sketch below.)  These parameters are aimed at displays with a peak
luminance of about 100 nits, the SDR reference level.
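
For the curious, the BT.709 transfer function (OETF) looks roughly
like this (a sketch in Python; the standard of course defines it
normatively):

    def bt709_oetf(l: float) -> float:
        """Map linear scene light l (0..1) to a non-linear value.

        There is a short linear segment near black and a power-law
        ("gamma-like") segment above it, which is why it is not
        *exactly* a gamma function.
        """
        if l < 0.018:
            return 4.5 * l
        return 1.099 * l ** 0.45 - 0.099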

In recent years, HDR and WCG (wide color gamut) video has become
more and more common.
It is able to reproduce colors beyond what the old standards allow.
Now ITU-R BT.2020 and BT.2100 come into play.  They define a much
wider color gamut, and BT.2100 specifies two HDR transfer functions
called PQ (perceptual quantizer, a.k.a. SMPTE ST 2084; this is the
transfer function used by Dolby Vision, HDR10 and HDR10+) and HLG
(hybrid log-gamma, used for HDR TV broadcast).  PQ can encode
luminance levels of up to 10,000 nits, far beyond what current
consumer displays actually reach.
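
PQ is defined in absolute luminance terms.  Here is a Python
sketch of its EOTF (code value in, nits out), using the constants
from SMPTE ST 2084 / BT.2100:

    M1 = 2610 / 16384          # 0.1593017578125
    M2 = 2523 / 4096 * 128     # 78.84375
    C1 = 3424 / 4096           # 0.8359375
    C2 = 2413 / 4096 * 32      # 18.8515625
    C3 = 2392 / 4096 * 32      # 18.6875

    def pq_eotf(e: float) -> float:
        """Map a normalized PQ code value e (0..1) to nits."""
        p = e ** (1 / M2)
        return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

    print(pq_eotf(1.0))   # 10000.0, the top of the PQ range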

Whether you use RGB or YCbCr (or even HSV if you're so inclined)
doesn't matter for color reproduction.  You can convert from one
to the other without loss of information (provided you have
sufficient precision).  Actually, BT.2100 allows both RGB and
YCbCr.  As explained earlier, YCbCr is much more efficient for
purposes of compression, so this is what is being used for the
vast majority of video.  Not RGB, and not HSV.
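
A small Python sketch of that round trip, using the BT.709 matrix
coefficients and full-range floating-point values (in real video
it is the limited-range integer quantization, not the color
model, that costs a little precision):

    KR, KB = 0.2126, 0.0722        # BT.709 luma coefficients
    KG = 1.0 - KR - KB

    def rgb_to_ycbcr(r, g, b):
        y = KR * r + KG * g + KB * b
        cb = (b - y) / (2.0 * (1.0 - KB))
        cr = (r - y) / (2.0 * (1.0 - KR))
        return y, cb, cr

    def ycbcr_to_rgb(y, cb, cr):
        r = y + 2.0 * (1.0 - KR) * cr
        b = y + 2.0 * (1.0 - KB) * cb
        g = (y - KR * r - KB * b) / KG
        return r, g, b

    # With sufficient precision the conversion loses nothing:
    print(ycbcr_to_rgb(*rgb_to_ycbcr(0.25, 0.50, 0.75)))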

 > Slices with motion vectors are exactly 'software' sprites. Slices are a compromise based on the 
 > necessity of having frames. Frames will be obsolete some day. Video will be 'painted' on displays on 
 > a pel-by-pel, 'need to refresh' basis, not on a frame rate schedule. It is then that compression 
 > will really take off. People will be amazed by the amount of compression achieved, and using so 
 > little bandwidth.

That doesn't make much sense, in my opinion.  You will still need
a time base and PTS values.  Camera sensors deliver video as a
sequence of frames at a certain frame rate.  That's how things
work, and I don't see a good reason for changing that (nor how it
could be changed, from a technical point of view).
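
Even a hypothetical "paint individual pels on demand" scheme
would have to timestamp every update against some clock, and that
is exactly what a time base and PTS are.  Here is a Python sketch
of how PTS values fall out of a time base (the 90 kHz clock is
the one used by MPEG transport streams; the 25 fps source is just
an example):

    from fractions import Fraction

    TIME_BASE = Fraction(1, 90000)   # 90 kHz clock, as in MPEG-TS
    FRAME_RATE = Fraction(25, 1)     # 25 fps source

    def pts_of_frame(n: int) -> int:
        # PTS of frame n = n / frame_rate, in time-base units.
        return int(n / FRAME_RATE / TIME_BASE)

    print([pts_of_frame(n) for n in range(4)])  # [0, 3600, 7200, 10800]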

Compression standards like H.264 and its successors already store
only the things that change from frame to frame.  If a part of a
frame didn't change from the previous frame, it's not stored
again.  Only parts that "need to refresh", as you put it, are
actually stored in the MPEG video stream.
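
Here is a toy Python sketch of that principle.  It is just
conditional replenishment on fixed blocks, a gross simplification
of what H.264 actually does with motion-compensated prediction,
but the "only store what changed" idea is the same:

    def changed_blocks(prev: bytes, cur: bytes, block: int = 16):
        """Yield (offset, data) for blocks that differ from prev."""
        for i in range(0, len(cur), block):
            if cur[i:i + block] != prev[i:i + block]:
                yield i, cur[i:i + block]

    prev = bytes(1024)              # dummy "previous frame"
    cur = bytearray(prev)
    cur[100:110] = b"\xff" * 10     # a few pels change
    print(len(list(changed_blocks(prev, bytes(cur)))))  # 1 block to store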

For example, when you transcode a 25 fps video to 50 fps by simply
duplicating every frame, then the resulting video will compress
to almost the same size (at identical quality settings), because
the duplicated frames take almost zero space.  Similarly, when you
encode a slide show to a video where 99% of frames are identical,
it'll compress down to a tiny file that isn't much larger than the
individual source images of the slide show.
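
You can see the same effect with any compressor that can refer
back to data it has already seen.  A loose analogy in Python
using zlib (obviously not a video codec, but it shows why exact
repeats are nearly free):

    import os, zlib

    frame = os.urandom(16 * 1024)        # one incompressible "frame"
    one = zlib.compress(frame, 9)
    ten = zlib.compress(frame * 10, 9)   # same frame repeated 10 times

    # len(ten) ends up much closer to len(one) than to 10 * len(one).
    print(len(one), len(ten))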

I don't see how obsoleting frames will improve things.
Actually, I don't even see what "obsoleting frames" really means,
because you'll still need a time base, and basically that's what
a frame is: the state of the screen contents at a given time.

Best regards
 -- Oliver

