[FFmpeg-devel] GSoC: Regarding Parsing and FLIF16 Frame Encoding

Anamitra Ghorui aghorui at teknik.io
Sat Feb 29 06:50:23 EET 2020


Hello,
I have been reading through the parsing API and related code, and here is
what I've managed to gather (I will be ignoring overreads in these functions
for now). Please tell me if I have this right:

1. As long as the parse function determines next == END_NOT_FOUND,
   ff_combine_frame will keep increasing the ParseContext index by buf_size.
   Once next is no longer END_NOT_FOUND, buf_size is set to index + next.

   During this process, the bytes from the input chunks are copied into the
   ParseContext's buffer.

   While next == END_NOT_FOUND and the thing being decoded is a video, we
   cannot yet determine the end of the frame, so poutbuf and poutbuf_size
   are set to zero by the function. However, this doesn't really matter for
   still images, since they have a single frame.

2. av_parser_parse2 checks whether poutbuf_size is greater than zero. If it
   is, it sets frame_offset in AVCodecParserContext to the previous value of
   next_frame_offset, and advances next_frame_offset (to cur_offset plus the
   number of bytes the parse callback consumed).

3. In https://ffmpeg.org/doxygen/trunk/decode_video_8c-example.html,
   pkt->size stays zero as long as a complete frame has not been returned by
   the parser. Hence decode is not triggered as long as a frame has not been
   found (see the loop sketch right after this list).
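
For point 3, the relevant part of that example boils down to a loop like the
following. This is my condensed restating of it, not the exact example code
(the example's decode() helper is folded into the loop, names are mine):

    #include <libavcodec/avcodec.h>

    /* Feed a chunk of raw bytes through the parser; only hand a packet to
     * the decoder once the parser has assembled a full frame, i.e. once
     * pkt->size != 0. */
    static int parse_and_decode(AVCodecParserContext *parser,
                                AVCodecContext *c, AVPacket *pkt,
                                AVFrame *frame,
                                const uint8_t *data, int data_size)
    {
        while (data_size > 0) {
            int ret = av_parser_parse2(parser, c, &pkt->data, &pkt->size,
                                       data, data_size,
                                       AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
            if (ret < 0)
                return ret;          /* parsing error */
            data      += ret;        /* advance past the consumed bytes */
            data_size -= ret;

            if (pkt->size) {         /* the parser found a complete frame */
                ret = avcodec_send_packet(c, pkt);
                if (ret < 0)
                    return ret;
                while ((ret = avcodec_receive_frame(c, frame)) >= 0)
                    ;                /* the decoded frame would be used here */
                if (ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
                    return ret;
            }
        }
        return 0;
    }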

Now, regarding FLIF16:
1. The pixels of the image are stored in the following format (non-interlaced)
(see https://flif.info/spec.html#_part_4_pixel_data):
      _______________________________________________
     |     _________________________________________ |
     |    |     ___________________________________ ||
all  |    |    |     _____________________________ |||
     |    |    |    |                             ||||
     |    |    | f1 | x1 x2 x3 ..... xw           ||||
     |    |    |    |                             ||||
     |    | y1 |    |_____________________________||||
     | c1 |    |                ...                |||
     |    |    |     _____________________________ |||
     |    |    |    |                             ||||
     |    |    | fn | x1 x2 x3 ..... xw           ||||
     |    |    |    |                             ||||
     |    |    |    |_____________________________||||
     |    |    |                                   |||
     |    |    |___________________________________|||
     |    |                 ...                     ||
     |    |     ___________________________________ ||
     |    |    |     _____________________________ |||
     |    |    |    |                             ||||
     |    |    | f1 | x1 x2 x3 ..... xw           ||||
     |    |    |    |                             ||||
     |    | yh |    |_____________________________||||
     |    |    |               ...                 |||
     |    |    |     _____________________________ |||
     |    |    |    |                             ||||
     |    |    | fn | x1 x2 x3 ..... xw           ||||
     |    |    |    |                             ||||
     |    |    |    |_____________________________||||
     |    |    |                                   |||
     |    |    |___________________________________|||
     |    |_________________________________________||
     |                                               |
     |                      ...                      |
     | cn                                            |
     |_______________________________________________|

where: ci: color channel
       yi: pixel row
       fi: frame number
       xi: individual pixel

As can be seen, the frames are not stored in a contiguous manner. How should I
be getting a frame out of this? It doesn't seem possible without either
putting the whole pixel data chunk in memory, or allocating space for all the
frames at once and then filling them in.

I guess what the parser has to do in that case is either return the whole
file as a single buffer to the decoder, or manage the frames itself through
its own data structures and helper functions (the first option is sketched
below).

What should I be doing here?
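
For the first option, something like the following is what I have in mind: a
hypothetical flif16_parse callback that never finds a frame end mid-stream,
so ff_combine_frame buffers everything and the decoder gets the whole file as
one packet at EOF. This is an untested sketch; it assumes the parser is
registered with priv_data_size = sizeof(ParseContext) and
parser_close = ff_parse_close, like other parsers using the ParseContext
helpers:

    #include "avcodec.h"
    #include "parser.h"  /* ParseContext, ff_combine_frame(), END_NOT_FOUND */

    static int flif16_parse(AVCodecParserContext *s, AVCodecContext *avctx,
                            const uint8_t **poutbuf, int *poutbuf_size,
                            const uint8_t *buf, int buf_size)
    {
        ParseContext *pc = s->priv_data;
        int next = END_NOT_FOUND;

        /* av_parser_parse2() calls us one last time with buf_size == 0 at
         * EOF; only then do we declare the "frame" (the whole file) done. */
        if (!buf_size)
            next = 0;

        if (ff_combine_frame(pc, next, &buf, &buf_size) < 0) {
            /* still buffering: all input consumed, nothing to output yet */
            *poutbuf      = NULL;
            *poutbuf_size = 0;
            return buf_size;
        }

        *poutbuf      = buf;
        *poutbuf_size = buf_size;
        return next;
    }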

2. The FLIF format spec refers to something called the 24 bit RAC. Is that an
   abbreviation for 24 bit RAnge Coding?
   (https://en.wikipedia.org/wiki/Range_encoding)
   What does the "24 bit" mean? Is it the size of each symbol processed by
   the range coder?
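
To make the question concrete, here is my mental model of a binary range
decoder. This is generic and not taken from the FLIF spec; the 12-bit
probability scale and the 24-bit width of the range register are my
assumptions. My current guess is that the "24 bit" refers to the width of
this range register rather than to the symbol size:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct RangeDecoder {
        uint32_t range;           /* current interval width, at most 2^24 */
        uint32_t low;             /* code value offset within the interval */
        const uint8_t *buf, *end;
    } RangeDecoder;

    static void rac_init(RangeDecoder *rd, const uint8_t *buf, size_t size)
    {
        rd->buf   = buf;
        rd->end   = buf + size;
        rd->range = 1u << 24;
        rd->low   = 0;
        for (int i = 0; i < 3; i++) /* prime "low" with the first 24 bits */
            rd->low = (rd->low << 8) | (rd->buf < rd->end ? *rd->buf++ : 0);
    }

    /* prob is P(bit == 0) on a 12-bit scale, i.e. in (0, 4096) */
    static int rac_get_bit(RangeDecoder *rd, uint32_t prob)
    {
        uint32_t split = (uint32_t)(((uint64_t)rd->range * prob) >> 12);
        int bit = rd->low >= split;

        if (bit) {                /* code value is in the upper subinterval */
            rd->low   -= split;
            rd->range -= split;
        } else {                  /* lower subinterval */
            rd->range  = split;
        }
        while (rd->range < (1u << 16)) { /* renormalize back towards 24 bits */
            rd->range <<= 8;
            rd->low = (rd->low << 8) | (rd->buf < rd->end ? *rd->buf++ : 0);
        }
        return bit;
    }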

I started going through the reference implementation of FLIF. I'll see what I
can make out of it. The decoder by itself is under the Apache license, so we
could refer to it or borrow some things from it:
https://github.com/FLIF-hub/FLIF.

Thanks


