[FFmpeg-devel] FFV1 Specification
Rodney Baker
rodney.baker at iinet.net.au
Sat Apr 7 03:14:45 CEST 2012
On Sat, 7 Apr 2012 07:26:47 Michael Niedermayer wrote:
> On Fri, Mar 30, 2012 at 11:53:58AM +0200, Michael Niedermayer wrote:
> > Hi
> >
> > Just wanted to announce that ive moved the ffv1 spec to github and
> > i am working on cleaning it up and updating it to match the existing
> > implementation.
> >
> > see: https://github.com/FFmpeg/FFV1
> >
> > patches, pull requests and comments are like always, welcome
>
> latest draft at github and at:
> http://ffmpeg.org/~michael/ffv1-draft/ffv1.html
>
> If someone could read through it and point out where its unclear or
> incomplete, that would be very helpfull!
> I imagine i can easyly miss incompletenesses given that i know the
> codec pretty well ...
>
> Also spellcheck/grammer/formating tips are welcome too!
>
> [...]
Michael,
Comments re spelling/grammar/style (I'll leave the technical review to others
who know what they're talking about). :-)
Section 3:
>"In the case of the JPEG2000-RCT colorspace the lines are interleaved to
>reduce cache trashing as most likely the RCT will be immedeatly converted to
>RGB during decoding, the order of the lines in the interleaving is again
>Y,Cb,Cr. "
In the case of the JPEG2000-RCT colorspace the lines are interleaved to reduce
cache trashing since it is most likely that the RCT will immediately be
converted to RGB during decoding; the interleaved coding order is also
Y,Cb,Cr.
[Not sure about "cache trashing" - sounds too "colloquial" for a technical
document - is there a better way to say this? Perhaps, "to improve caching
efficiency"?]
>Samples within a plane are coded in raster scan order (left->right,
>top->bottom), each sample is predicted by the median predictor from samples
>in the same plane and the difference is stored
s/bottom), each/bottom). Each/ OR s/bottom), each/bottom); each/
s/stored/stored./ (Apparently missing full-stops in many other places, too).
Is this sentence incomplete? How is the difference stored?
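
For readers following along, here is a minimal sketch of what "predicted by
the median predictor" could mean. The quoted sentence does not define it, so
this assumes the usual definition (the median of the left sample, the top
sample, and left + top - top_left); the function and argument names are
illustrative only and are not taken from the draft.

```python
def median_predict(left, top, top_left):
    """Return the median of left, top and the gradient left + top - top_left.

    Assumed definition of the median predictor; not quoted from the draft.
    """
    gradient = left + top - top_left
    return sorted((left, top, gradient))[1]


def sample_difference(sample, left, top, top_left):
    """Difference between the actual sample and its prediction."""
    return sample - median_predict(left, top, top_left)
```

The difference returned by sample_difference() is the value that the sentence
says is "stored"; how it is stored is exactly what the question above asks.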
Section 3.1:
> For the purpose of the predictior and context samples above the coded
> picture are assumed to be 0, right of the coded picture are identical to the
> closest left sample. And left of the coded picture are identical to the top
> right one if such exist or 0.
s/0, right/0; samples to the right/
s/left sample. And/left sample; samples to the left/
s/top right one if such exist or 0/top right sample (if there is one),
otherwise 0./
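
A small sketch of the border rule as I read the quoted paragraph, in case it
helps to check the wording against an implementation. It assumes that "the top
right one" for a position just left of the picture means the first sample of
the row above; that interpretation, the function name, and the list-of-rows
plane layout are all assumptions, not something the draft states.

```python
def sample_at(plane, x, y):
    """Fetch a sample, applying the border rule for positions outside the picture.

    plane is a list of rows; each row is a list of integer samples.
    """
    width = len(plane[0])
    if y < 0:
        # Above the coded picture: assumed to be 0.
        return 0
    if x >= width:
        # Right of the coded picture: identical to the closest left sample.
        return plane[y][width - 1]
    if x < 0:
        # Left of the coded picture: the sample up and to the right if it
        # exists (assumed here to be the first sample of the row above), else 0.
        return plane[y - 1][0] if y >= 1 else 0
    return plane[y][x]
```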
Section 3.6:
>Instead of coding the n+1 (or n+2 in the case of RCT) bits of the sample
>difference with huffman or range coding only the n (or n+1) least significant
>bits are used as thats enough the recover the original sample. bits in the
>equation below is bits_per_raw_sample+1 for RCT and bits_per_raw_sample
>otherwise.
Instead of coding the n+1 bits of the sample difference with huffman or range
coding (or n+2 bits, in the case of RCT), only the n (or n+1) least
significant bits are used, since this is sufficient to recover the original
sample.
In the equation below, bits represents bits_per_raw_sample+1 for RCT or
bits_per_raw_sample otherwise.
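
To illustrate why the least significant bits are enough, here is a sketch of
the wrap-around for the non-RCT case. The equation itself is not quoted in
this mail, so the folding used below (add half the range, mask to "bits" bits,
subtract half the range again) is an assumption about its form, and the
function names are made up for the example.

```python
def fold_difference(sample, pred, bits):
    """Reduce sample - pred to a signed value that fits in `bits` bits."""
    half = 1 << (bits - 1)
    mask = (1 << bits) - 1
    return ((sample - pred + half) & mask) - half


def unfold_sample(pred, coded, bits):
    """Recover the original sample from the prediction and the folded difference."""
    mask = (1 << bits) - 1
    return (pred + coded) & mask
```

With bits = 8 the raw difference spans -255..255, but the folded value always
lies in -128..127, and adding it back to the prediction modulo 256 gives the
original sample.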
3.6.1:
s/H.264[2]. But/H.264[2], but/
s/situation as well as its slightly worse performance CABAC/situation (as well
as its slightly worse performance) CABAC/
Non binary values:
s/integers, we could simply encode/ integers it would be possible to encode/
s/context, but /context, however/ OR s/context, but/context but/ (I like the
first option better).
s/symbol which is not only a waste of memory but also requires more past
data/symbol which requires both more memory and more past data/
s/reasonable/a reasonably/
s/Alternatively simply assuming/Alternatively, assuming/
s/mean like we do in huffman coding mode would be another possibility/mean (as
in huffman coding) would also be possible/
s/but due to flexibility and simplicity, another method was chosen, which
simply/ however, for maximum flexibility and simplicity, the chosen method/
s/mantisse and sign, the exact contexts which are used/mantissa and sign. The
exact contexts used/
s/can probably better be described by the following code then by some english
text/are best described by the following code, followed by some comments./
3.6.2:
Need definitions in the definitions/glossary section for VLC and ESC (and, if
we are to be pedantic, MSB and any other acronyms used, unless they are
considered so common among the target audience as to be completely
unambiguous; MSB may well fall into this category).
Fix spacing between Suffix/non ESC and non ESC/Examples.
run mode/run length coding/level coding - capitalisation?
s/mode, and/mode and/
s/difference, on/ difference. On/
s/improved the compression rate a bit/slightly improved the compression rate./
(unless you meant literally "one bit").
4.2 Header:
>version 0 or 1
>coder_type Coder used, 0 (Golomb Rice), 1 (Range coder), 2 (Range coder with
>custom state transition table)
>state_transition_delta The range coder custom state transition table. If it
>is not coded, all its elements are assumed to be 0.
>colorspace_type 0 (YCbCr), 1 (JPEG2000_RCT)
>chroma_planes 1 for color, 0 for grayscale
>bits_per_raw_sample The number of bits for each sample, commonly 8, 9, 10 or
>16
>h_chroma_subsample The subsample factor between luma and chroma width
>(chroma_width = 2^(-log2_h_chroma_subsample) * luma_width)
>v_chroma_subsample The subsample factor between luma and chroma height
>(chroma_height = 2^(-log2_v_chroma_subsample) * luma_height)
>alpha_plane 1 if a transparency plane is stored, 0 otherwise
Need delimiters between value names and descriptions. Might be better in a
table.
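
As an aside on the two subsample fields, the relation between luma and chroma
size can be sketched as a right shift by the log2 factor. The quoted text does
not say how odd luma dimensions are rounded, so rounding up here is purely an
assumption, and the function name is illustrative.

```python
def chroma_dimensions(luma_width, luma_height, log2_h, log2_v):
    """Chroma plane size for the given log2 subsample factors.

    Computes luma_size * 2^(-log2_factor), rounded up (assumed rounding).
    """
    chroma_width = -((-luma_width) >> log2_h)
    chroma_height = -((-luma_height) >> log2_v)
    return chroma_width, chroma_height
```

For example, a 1920x1080 luma plane with both log2 factors equal to 1 gives a
960x540 chroma plane.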
4.3 Quant Table
s/are simply stored/are stored/
s/described in 2↑, the/described in 2↑. The/
--
==========================================================================
Rodney Baker VK5ZTV
rodney.baker at iinet.net.au
==========================================================================