[FFmpeg-devel] [PATCH] Set correct frame_size for Speex decoding

Michael Niedermayer michaelni
Thu Aug 27 03:56:55 CEST 2009


On Wed, Aug 26, 2009 at 08:20:05PM -0400, Justin Ruggles wrote:
> Michael Niedermayer wrote:
> 
> > On Sun, Aug 23, 2009 at 07:16:21PM -0400, Justin Ruggles wrote:
> >> Justin Ruggles wrote:
> >>
> >>> Michael Niedermayer wrote:
> >>>
> >>>> On Sun, Aug 16, 2009 at 10:01:53PM -0400, Justin Ruggles wrote:
> >>>>> Michael Niedermayer wrote:
> >>>>>
> >>>>>> On Sun, Aug 16, 2009 at 12:26:32PM -0400, Justin Ruggles wrote:
> >>>>>>> Michael Niedermayer wrote:
> >>>>>>>
> >>>>>>>> On Sun, Aug 16, 2009 at 05:07:18PM +0200, Michael Niedermayer wrote:
> >>>>>>>>> On Sat, Aug 15, 2009 at 07:55:46PM -0400, Justin Ruggles wrote:
> >>>>>>>>>> Michael Niedermayer wrote:
> >>>>>>>> [...]
> >>>>>>>>>>>>> what exactly is the argument you have that speex should not be handled like
> >>>>>>>>>>>>> every other codec?!
> >>>>>>>>>>>>> split it in a parser, the muxer muxes ONLY a single speex packet per
> >>>>>>>>>>>>> container packet. Any extension from that is low priority and "patch welcome"
> >>>>>>>>>>>>> IMHO ...
> >>>>>>>>>>>> The downside for Speex is the container overhead since individual frames
> >>>>>>>>>>>> are very small.
> >>>>>>>>>>> this is true for many (most?) speech codecs
> >>>>>>>>>>>
> >>>>>>>>>>> also if we write our own speex encoder, it will only return one frame at a
> >>>>>>>>>>> time.
> >>>>>>>>>> Why would it have to?  
> >>>>>>>>> because the API requires it, both encoders and decoders have to implement the
> >>>>>>>>> API, a video encoder also cannot return 5 frames in one packet.
> >>>>>>>>> APIs can be changed but you arent arguing for a API change you argue for
> >>>>>>>>> ignoring the API and just doing something else.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> If the user sets frame_size before initialization
> >>>>>>>>>> to a number of samples that is a multiple of a single frame, it could
> >>>>>>>>>> return multiple speex frames at once, properly joined together and
> >>>>>>>>>> padded at the end.  With libspeex this is very easy to do because the
> >>>>>>>>>> functionality is built into the API.
> >>>>>>>>>>
> >>>>>>>>>> I understand the desire to keep what are called frames as separate
> >>>>>>>>>> entities, but in the case of Speex I see it as more advantagous to allow
> >>>>>>>>>> the user more control when encoding.  If frames are always split up,
> >>>>>>>>>> this limits the users options for both stream copy *and* for encoding to
> >>>>>>>>>>  just 1 frame per packet.
> >>>>>>>>>>
> >>>>>>>>>> If you're dead-set against this idea, then I will finish the parser that
> >>>>>>>>>> splits packets in order to continue with my other Speex patches, but I
> >>>>>>>>>> don't like how limiting it would be for the user.
> >>>>>>>>> i am againt speex handling things different than other speech codecs
> >>>>>>>>> based on arguments that apply to other speech codecs as well.
> >>>>>>>>> Also iam against passing data between muxer and codec layers in a way
> >>>>>>>>> that violates the API.
> >>>>>>>> ffmpeg seperates muxer and codec layers, writing a demuxer & decoder
> >>>>>>>> that depend on things beyond the API (frames per frame) is going to
> >>>>>>>> break things. We had similar great code (passing structs in AVPacket.data
> >>>>>>>> because it was convenient) that also didnt turn out to be that great
> >>>>>>>> and required a complete rewrite ...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ive alraedy said nut doesnt allow multiple frames per packet, but its
> >>>>>>>> not just nut, avi as well doesnt allow multiple frames per packet
> >>>>>>>> for VBR and either way avi needs to have its headers set up properly,
> >>>>>>>> not with fake frame size and such and flv as we alaredy know has a 
> >>>>>>>> issue with >8 frames per frame. All that is just what we know of will
> >>>>>>>> break if you implement your hack, what else will break is something we
> >>>>>>>> only would learn after some time.
> >>>>>>>>
> >>>>>>>> IMHO, demuxer->parser->splited frames [unless it is not possible to split]
> >>>>>>>> if a muxer can store multiple frames it can merge several depending on its
> >>>>>>>> abilities and user provided parameters, that merging could also be done
> >>>>>>>> as a bitstream filter.
> >>>>>>>> But just skiping the spliting and merging and hoping that every container
> >>>>>>>> would allow anyting that any other container allowed is just not going to
> >>>>>>>> work. And even more so as we already know of many combinations that would
> >>>>>>>> noz work
> >>>>>>> I do understand your point.  There is a subtle difference with speex
> >>>>>>> though.  The process of merging of frames into groups of frames is
> >>>>>>> something that is specified by the codec itself, not the container.  To
> >>>>>>> the container, it would be as transparent as for an audio codec that
> >>>>>>> allows different numbers of samples/duration for a frame.  Nut would
> >>>>>>> support it just fine, as it does with FLAC with different numbers of
> >>>>>>> samples per frame.  As for FLV, it would be the same as if it doesn't
> >>>>>>> allow over a certain number of samples per frame before getting choppy
> >>>>>>> and/or crashing.
> >>>>>> hmmm, could you quote the part of the speex spec that supports your view?
> >>>>>> and provide a link to that spec. (iam asking for a quote instead of just
> >>>>>> url cuz of the ML archive ...)
> >>>>> unfortunately there is not a speex specification per-se, but a speex
> >>>>> manual that gives some hints and guidance.  the libspeex code is
> >>>>> definitive reference.
> >>>>>
> >>>>> http://speex.org/docs/manual/speex-manual/node7.html
> >>>>>
> >>>>> "Sometimes it is desirable to pack more than one frame per packet (or
> >>>>> other basic unit of storage). The proper way to do it is to call
> >>>>> speex_encode $ N$  times before writing the stream with speex_bits_write.
> >>>>>
> >>>>> In cases where the number of frames is not determined by an out-of-band
> >>>>> mechanism, it is possible to include a terminator code. That terminator
> >>>>> consists of the code 15 (decimal) encoded with 5 bits, as shown in Table
> >>>>> 4. Note that as of version 1.0.2, calling speex_bits_write automatically
> >>>>> inserts the terminator so as to fill the last byte. This doesn't
> >>>>> involves any overhead and makes sure Speex can always detect when there
> >>>>> is no more frame in a packet."
> >>>>>
> >>>>> The end of the packet is able detected because it looks like the start
> >>>>> of another frame, but the mode id is the terminator code.  This is
> >>>>> needed because a full valid speex frame can be only 5 bits.
> >>>>>
> >>>>> What happens when libspeex writes frames is this:
> >>>>> If all the combined frames end on a byte boundary, no terminator code is
> >>>>> used because the packet data will end at the end of a frame.  If the
> >>>>> combined frames do not end on a byte boundary, a 0-bit followed by
> >>>>> enough 1's to pad to the next byte are appended.  The result is that if
> >>>>> there are 5 or more bits, the terminator code will be detected.
> >>>> iam not truely happy about all that but i agree that its overall alot
> >>>> more convenient and simpler to keep these bit packed pieces of bull shit
> >>>> together but in that case it has to be considered that this is a whole
> >>>> frame and all variables have to be set accordingly.
> >>> It is definitely a lot simpler.  I agree it's ugly, but it all works out
> >>> ok if we're consistent in the encoder, decoder, and all muxers and
> >>> demuxers supporting speex.  I'll send a new patch sometime soon with
> >>> more documentation about how speex frames are to be handled throughout
> >>> FFmpeg.
> >> New patch attached.
> > 
> > any volunteers who want to maintain libspeexdec.c ?
> > oggparsespeex.c is alraedy maintained by david
> > (no i dont want to maintain codecs that are desgned like that)
> 
> I volunteer to maintain libspeexdec.c.  David, does the ogg part look ok

then please add yoursef to MAINTAINERS, unless you alraedy did


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is not what we do, but why we do it that matters.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090827/1445954d/attachment.pgp>



More information about the ffmpeg-devel mailing list