[FFmpeg-devel] FFV1 Specification

Sat Apr 7 03:14:45 CEST 2012

On Sat, 7 Apr 2012 07:26:47 Michael Niedermayer wrote:
> On Fri, Mar 30, 2012 at 11:53:58AM +0200, Michael Niedermayer wrote:
> > Hi
> > 
> > Just wanted to announce that ive moved the ffv1 spec to github and
> > i am working on cleaning it up and updating it to match the existing
> > implementation.
> > 
> > see: https://github.com/FFmpeg/FFV1
> > 
> > patches, pull requests and comments are like always, welcome
> 
> latest draft at github and at:
> http://ffmpeg.org/~michael/ffv1-draft/ffv1.html
> 
> If someone could read through it and point out where its unclear or
> incomplete, that would be very helpfull!
> I imagine i can easyly miss incompletenesses given that i know the
> codec pretty well ...
> 
> Also spellcheck/grammer/formating tips are welcome too!
> 
> [...]

Michael,

Comments re spelling/grammar/style (I'll leave the technical review to others 
who know what they're talking about). :-)

Section 3:

>"In the case of the JPEG2000-RCT colorspace the lines are interleaved to 
>reduce cache trashing as most likely the RCT will be immedeatly converted to 
>RGB during decoding, the order of the lines in the interleaving is again 
>Y,Cb,Cr. "

In the case of the JPEG2000-RCT colorspace the lines are interleaved to reduce 
cache thrashing since it is most likely that the RCT will immediately be 
converted to RGB during decoding; the interleaved coding order is also 
Y,Cb,Cr.

[Presumably "cache thrashing" is meant rather than "cache trashing". Even so, 
it sounds too "colloquial" for a technical document - is there a better way 
to say this? Perhaps, "to improve caching efficiency"?]

>Samples within a plane are coded in raster scan order (left->right,
>top-bottom), each sample is predicted by the median predictor from samples
>in the same plane and the difference is stored

s/bottom), each/bottom). Each/ OR s/bottom), each/bottom); each/

s/stored/stored./ (Full stops appear to be missing in many other places, too.)

Is this sentence incomplete? How is the difference stored? 

Section 3.1:

> For the purpose of the predictior and context samples above the coded
> picture are assumed to be 0, right of the coded picture are identical to the
> closest left sample. And left of the coded picture are identical to the top
> right one if such exist or 0.

s/0, right/0; samples to the right/

s/left sample. And/left sample; samples to the left/

s/top right one if such exist or 0/top right sample (if there is one), 
otherwise 0/
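
A rough sketch of how I read the two passages above, in Python, in case it 
helps to check the wording. The names sample_at and predict are made up for 
illustration, and taking the median of left, top and left + top - top-left is 
my assumption about what "the median predictor" means, since the quoted text 
does not spell it out:

```python
def sample_at(plane, x, y):
    """Return the sample at (x, y), applying the border rules quoted above."""
    width = len(plane[0])
    if y < 0:
        return 0                          # above the coded picture
    if x < 0:
        # left of the coded picture: same as the top right neighbour,
        # which falls back to 0 once that neighbour is above the picture
        return sample_at(plane, x + 1, y - 1)
    if x >= width:
        return plane[y][width - 1]        # right of the picture: closest left sample
    return plane[y][x]


def median3(a, b, c):
    """Middle value of three integers."""
    return sorted((a, b, c))[1]


def predict(plane, x, y):
    """Median prediction from neighbours in the same plane (assumed formula)."""
    left = sample_at(plane, x - 1, y)
    top = sample_at(plane, x, y - 1)
    top_left = sample_at(plane, x - 1, y - 1)
    return median3(left, top, left + top - top_left)


def differences(plane):
    """Differences between each sample and its prediction, in raster scan order."""
    return [[plane[y][x] - predict(plane, x, y) for x in range(len(plane[0]))]
            for y in range(len(plane))]
```

For the 2x2 plane [[10, 12], [11, 13]] this gives the differences 
[[10, 2], [1, 1]].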

Section 3.6:

>Instead of coding the n+1 (or n+2 in the case of RCT) bits of the sample
>difference with huffman or range coding only the n (or n+1) least significant
>bits are used as thats enough the recover the original sample. bits in the
>equation below is bits_per_raw_sample+1 for RCT and bits_per_raw_sample
>otherwise.

Instead of coding the n+1 bits of the sample difference with Huffman or range 
coding (or n+2 bits, in the case of RCT), only the n (or n+1) least 
significant bits are used, since this is sufficient to recover the original 
sample. 

In the equation below, bits represents bits_per_raw_sample+1 for RCT or 
bits_per_raw_sample otherwise.
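
To illustrate why the n least significant bits are enough, here is a minimal 
Python sketch for the non-RCT case. The names fold and reconstruct are 
hypothetical, and the wrap-around into a signed n-bit range is my reading of 
the paragraph rather than something taken from the draft:

```python
def fold(diff, bits):
    """Wrap a difference into the signed range [-2**(bits-1), 2**(bits-1))."""
    half = 1 << (bits - 1)
    mask = (1 << bits) - 1
    return ((diff + half) & mask) - half


def reconstruct(prediction, folded, bits):
    """Recover the original sample from the prediction and the folded difference."""
    return (prediction + folded) & ((1 << bits) - 1)


# Example with 8-bit samples: the true difference needs 9 bits, the folded one 8.
sample, prediction = 3, 250
folded = fold(sample - prediction, 8)     # -247 wraps to 9
assert reconstruct(prediction, folded, 8) == sample
```

Because the decoder knows the prediction and the sample range, the discarded 
top bit carries no information.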

3.6.1:

s/H.264[2]. But/H.264[2], but/

s/situation as well as its slightly worse performance CABAC/situation (as well 
as its slightly worse performance) CABAC/

Non binary values:

s/integers, we could simply encode/integers, it would be possible to encode/

s/context, but/context, however/ OR s/context, but/context but/ (I like the 
first option better). 

s/symbol which is not only a waste of memory but also requires more past 
data/symbol which requires both more memory and more past data/

s/reasonable/a reasonably/

s/Alternatively simply assuming/Alternatively, assuming/

s/mean like we do in huffman coding mode would be another possibility/mean (as 
in Huffman coding) would also be possible/

s/but due to flexibility and simplicity, another method was chosen, which 
simply/however, for maximum flexibility and simplicity, the chosen method/

s/mantisse and sign, the exact contexts which are used/mantissa and sign. The 
exact contexts used/

s/can probably better be described by the following code then by some english 
text/are best described by the following code, followed by some comments./
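
If comments are added after the code, a small worked example might help too. 
The Python sketch below only shows one way to split a signed integer into 
binary decisions. The split into a zero flag, a unary exponent, mantissa bits 
and a sign is my assumption about what the code in the draft does; the 
context selection and the range coder itself are left out, and the names 
binarize and debinarize are made up:

```python
def binarize(value):
    """Split a signed integer into a list of binary decisions (assumed layout)."""
    if value == 0:
        return [1]                         # "is zero" flag set, nothing else coded
    bits = [0]                             # "is zero" flag clear
    magnitude = abs(value)
    exponent = magnitude.bit_length() - 1
    bits += [1] * exponent + [0]           # exponent in unary, terminated by a 0
    # mantissa: the bits below the implicit leading one, most significant first
    bits += [(magnitude >> i) & 1 for i in range(exponent - 1, -1, -1)]
    bits.append(1 if value < 0 else 0)     # sign
    return bits


def debinarize(bits):
    """Inverse of binarize."""
    it = iter(bits)
    if next(it) == 1:
        return 0
    exponent = 0
    while next(it) == 1:
        exponent += 1
    magnitude = 1
    for _ in range(exponent):
        magnitude = (magnitude << 1) | next(it)
    return -magnitude if next(it) else magnitude
```

With this layout, 5 becomes 0 110 01 0 (zero flag, exponent, mantissa, sign).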

3.6.2:

Need definitions in the definitions/glossary section for VLC and ESC (and, if 
we are to be pedantic, MSB and any other acronyms used, unless they are 
considered to be so commonly used among the target audience as to be 
completely unambiguous - MSB may well fall into this category).

Fix spacing between Suffix/non ESC and non ESC/Examples.

run mode/run length coding/level coding - capitalisation? 

s/mode, and/mode and/

s/difference, on/difference. On/

s/improved the compression rate a bit/slightly improved the compression rate./ 
(unless you meant literally "one bit"). 

4.2 Header:

>version 0 or 1
>coder_type Coder used, 0 (Golomb Rice), 1 (Range coder), 2 (Range coder with
>custom state transition table)
>state_transition_delta The range coder custom state transition table. If it
>is not coded, all its elements are assumed to be 0.
>colorspace_type 0 (YCbCr), 1 (JPEG2000_RCT)
>chroma_planes 1 for color, 0 for grayscale
>bits_per_raw_sample The number of bits for each sample, commonly 8, 9, 10 or
>16
>h_chroma_subsample The subsample factor between luma and chroma width
>(chroma_width = 2^(-log2_h_chroma_subsample) * luma_width)
>v_chroma_subsample The subsample factor between luma and chroma height
>(chroma_height = 2^(-log2_v_chroma_subsample) * luma_height)
>alpha_plane 1 if a transparency plane is stored, 0 otherwise

Need delimiters between value names and descriptions. Might be better in a 
table.
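
The chroma size expressions might also be easier to read as a shift. A small 
Python sketch, where chroma_size is a hypothetical helper and rounding up for 
sizes that do not divide evenly is my assumption (the quoted formula does not 
say how to round):

```python
def chroma_size(luma_size, log2_subsample):
    """Chroma dimension for a luma dimension and a log2 subsampling factor.

    Equals luma_size * 2**(-log2_subsample) when the division is exact;
    otherwise rounds up (assumed, not stated in the quoted header text).
    """
    return -((-luma_size) >> log2_subsample)


# 4:2:0 example: both dimensions are halved.
chroma_width = chroma_size(1920, 1)
chroma_height = chroma_size(1080, 1)
```

Note that the field is listed as h_chroma_subsample but the formula uses 
log2_h_chroma_subsample; one name should be used for both.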

4.3 Quant Table

s/are simply stored/are stored/ 

s/described in 2↑, the/described in 2↑. The/

-- 
==========================================================================
Rodney Baker VK5ZTV
rodney.baker at iinet.net.au
==========================================================================