[FFmpeg-devel] [GSOC] FLIF16 Decoder (and Encoder)

Mon May 4 21:26:44 EEST 2020

Hello,

I have a question regerding the internal decoding/encoding API. There seems
to be two functions that may be alternatively called, on the basis of whether
they are defined or not, both defined in the AVCodec struct:
1. decode()
2. receive_frame()

From the comments above the definition of receive_frame():
"Decode API with decoupled packet/frame dataflow. This function is called to get
one output frame. It should call ff_decode_get_packet() to obtain input data."

From what I can see, if receive_frame() is not defined, a separate function is
called, decode_simple_receive_frame(), which in turn calls 
decode_receive_frame_internal(). 
*   receive_frame() therefore allows more "fine grained" control on how the 
    frame generation and return values are handled, from what I can see.
*   But for most purposes defining decode() is enough.
*   receive_frame() takes priority over decode().

Is this correct?

---

I have gone through theory regarding arithmetic coding and have been able to
build my own arithmetic encoder. Range coding seems to differ 
from an arithmetic coder in only two cases, which is the renormalisation and 
ending the encoding/decoding process. Here's what I have managed to gather:

* The general process of renormalisation in an arithcoder is [2]:
(Here, `n` is initially 0)
While range is too small: (range < Quarter of original)
|    If range is in first half of original range ("Upper" condition)
|        Output a 0, and `n` number of 1's
|    Else If range is in second half of original range: ("Lower" condidtion)
|        Output a 1, and `n` number of 0's
|    Else If range straddles the middle: ("Straddle" condition)
|        Increment `n` by 1
|    
|    Blow up the range by multiplying `range` and `low` by 2 
|    (or shifting left by 1)

The range-low notation (instead of the supposedly conventional high-low 
notation) is used in most of the rangecoder algorithms/implementations I have
seen.

* Whereas, in the range coder implementation of [1], the following happens:
(here, `buffer` is 1 byte in size, and `low`, `range` are 4 bytes in size)
(All arithmetic is in unsigned integers)
Upper bound of the range is 1 << 31, and lower bound is (upper bound >> 8)
(0x80000000 and 0x00800000 respectively)

While range is too small: (range < lower bound)
|   If  `low` is less than 0x7F800000 (0xFF << 23)
|       Output `buffer`, and `n` number of 0xFF's
|       Put MSByte of `low` into `buffer`
|   Else if MSBit of `low` is 1 (The "Carry" bit)
|       Output (`buffer` + 1) and `n` number of 0x0's
|       Put MSByte of `low` into `buffer`
|   Else
|       Increment `n` by 1
|   
|   Left Shift range by 8
|   Left Shift low by 8
|   Clear the carry bit of `low`

So we can make out the limits to be as:
"Lower" half: low < 0x7F800000
"Upper" half: low >= 0x80000000 (Counting in the previous condition)
"Straddle" condition: `low` is greater than or eq. to 0x7F800000, and the
number's carry bit is 0, i.e., 0x7F800000 <= `low` < 0x80000000. There isn't a
definitive "point" for the straddling but a range, from what I can see.

Note that 0x7F800000 + 0x00800000 = 0x80000000.

The renormalisation used in the RAC FLIF uses [3] is much more simpler than the 
one described above. So more or less easily implementable. However, the upper
and lower limits used in ffmpeg's rangecoder implementation cannot be changed 
and would need some rewriting. Consequently, we will have to rewrite/change 
the code in any codec that uses the functions.

I will also add in an explanation regarding the range coder in the FLIF
specification, which is not present, using whatever I have understood.

The rangecoder implementation used in ffmpeg uses a very small range (0xFF00).
I do not know the exact rationale behind it since I think it will require more
frequent renormalisation, but it may have someting to do with the codecs it was
implemented for, or the probability model.

For now, I will implement the RAC in a secondary file without altering 
rangecoder.c/rangecoder.h.

However, in the FLIF sourcecode, src/maniac/chance.cpp [4] uses an almost
exact copy of the function in rangecoder.c for building the probability model
(libavcodec/rangcoder.c, ff_build_rac_states). Therefore we can reuse that.
It might also be possible that the original author copy pasted it there with
some modifications.

Dealing with the second header shouldn't be much of a problem now.

An interesting thing is that the uniform symbol coder (see 
src/maniac/symbol.hpp) [5] is implemented recursively, which might mean that the
recursion was written in mind that a stack overflow will not occur.

---

As for the pixel transformations, it will mostly consist of translating the
existing reference implementation into the ffmpeg API/C. About the same goes for
the MANIAC encoding.

---

In conclusion, I believe writing the decoder will involve a lot of "copying and
pasting" (or closely following) the FLIF reference implementation, which I really
don't want to do. I want to do it with the full knowledge of how the codec
works. Although reading through the reference implementation will also
inevitably include review of the sourcecode and hence chanves.

Regards,
Anamitra Ghorui

---

[1]: http://www.compressconsult.com/rangecoder/ , https://archive.fo/5F1y
[2]: https://web.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf
[3]: https://github.com/FLIF-hub/FLIF/blob/master/src/maniac/rac.hpp
[4]: https://github.com/FLIF-hub/FLIF/blob/master/src/maniac/chance.cpp (function build_table)
[5]: https://github.com/FLIF-hub/FLIF/blob/master/src/maniac/symbol.hpp

(not listed in the order of occurence)