[FFmpeg-devel] [PATCH] FLAC parser

Sun Mar 29 04:51:14 CEST 2009

On Sat, Mar 28, 2009 at 10:24:04PM -0400, Justin Ruggles wrote:
> Michael Niedermayer wrote:
> > On Sat, Mar 28, 2009 at 06:41:02PM -0400, Justin Ruggles wrote:
> >> Michael Niedermayer wrote:
> >>> On Fri, Mar 27, 2009 at 01:05:47AM -0400, Justin Ruggles wrote:
> >>>> Hi,
> >>>>
> >>>> I finally got a working FLAC parser without resorting to buffering
> >>>> max_frame_size bytes like the FLAC decoder does.  It requires a slight
> >>>> change to ff_combine_frame() since the header can be up to 16 bytes long
> >>>> and ff_combine_frame() currently only supports up to 8 bytes of overread
> >>>> data (FF_INPUT_BUFFER_PADDING_SIZE).
> >>>>
> >>>> This works with all samples I've tested, but it would be great to have
> >>>> more tested as well.  There are quite a few corner cases, and while I've
> >>>> tried to think of everything I can, I might have missed something.
> >>> If i understand this correctly,
> >>> this is a probabilistic parser, that is it will fail once in 4tb of
> >>> random data at least, but due to the max crc stuff sooner.
> >>> and as data is not random it could fail more frequently
> >> Yes. The only alternatives I can see would be decoding twice (although
> >> inverse prediction, and interleaving could be skipped in the parser) or
> >> not having a parser, which prevents stream copy to most other containers.
> > 
> > You can do better without decoding twice
> > first find all valid headers (sync + header crc + a few bits in the header)
> 
> When you say "all valid headers" do you mean to buffer the whole file?

no

> Would it be ok to check sequences in a fixed number of headers or a
> fixed number of bytes (this would have to be like pretty large to be useful)

it would be possible for example to add to the buffer until enough headers
are detected. or until a long enough connected chain is found ...
(probably the one fitting better in the code ....)

> 
> > that would give you something like:
> > H          H      H                      H       H                    H
> > 
> > then calculate the frame crc between each 2 headers
> > H          H      H                      H       H                    H
> > --CRC!=0-->
> > ---------CRC==0-->
> >           -CRC!=0->
> >           ----------------------CRC==0-->
> >                   --------------CRC!=0-->
> >                   ----------------------CRC!=0-->
> >                                          -CRC==0->
> >                                          ------------CRC==0---------->
> >                                                   ------CRC==0------->
> > 
> > Then find the longest connected chain of valid CRCs, if there is another
> > non overlapping chain of valid crcs before it pick it instead.
> > Return the first frame from the chain.
> 
> I just want to make sure I understand you correctly.
> 
> H0-->H2 = 1
> H1-->H3-->H4-->H5 = 3
> H2 = 0
> H3-->H4-->H5 = 2
> H4-->H5 = 1
> 
> This would select the chain starting with H1, and the data between the
> start of the buffer and H1 should be returned in this case.

yes

> 
> Then the CRCs starting at H0 can be tossed out, and after the next
> header is found, calculate 2 new CRCs, H4-->H6 and H5-->H6 and
> re-evaluate the sequences.

yes, also i just used n->n+1 / n->n+2 for sake of readability, one
should prossibly consider more

> 
> > also you should favor chains which dont change samplerate or other such
> > things.
> 
> Do you mean to pick a sequence that doesn't change values over an
> earlier sequence of the same size that does change values?  Or something
> with more weight than that?

I think this is not that important ATM, but i would tend to give
things like samplerate changes more weight.
i mean, for example:
h0----->h1------->h2
  --------------->

and assuming all crcs match
but h1 has a different sample rate than h0 and h2 
here the shorter chain seems more likely correct, but this depends on
my assumtation that a single lonely packet with different samplerate is
very unlikely.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you really think that XML is the answer, then you definitly missunderstood
the question -- Attila Kinali
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090329/c191ab70/attachment.pgp>