[FFmpeg-devel] [PATCH] Set correct frame_size for Speex decoding

Mon Aug 17 04:01:53 CEST 2009

Michael Niedermayer wrote:

> On Sun, Aug 16, 2009 at 12:26:32PM -0400, Justin Ruggles wrote:
>> Michael Niedermayer wrote:
>>
>>> On Sun, Aug 16, 2009 at 05:07:18PM +0200, Michael Niedermayer wrote:
>>>> On Sat, Aug 15, 2009 at 07:55:46PM -0400, Justin Ruggles wrote:
>>>>> Michael Niedermayer wrote:
>>> [...]
>>>>>>>> what exactly is the argument you have that speex should not be handled like
>>>>>>>> every other codec?!
>>>>>>>> split it in a parser, the muxer muxes ONLY a single speex packet per
>>>>>>>> container packet. Any extension from that is low priority and "patch welcome"
>>>>>>>> IMHO ...
>>>>>>> The downside for Speex is the container overhead since individual frames
>>>>>>> are very small.
>>>>>> this is true for many (most?) speech codecs
>>>>>>
>>>>>> also if we write our own speex encoder, it will only return one frame at a
>>>>>> time.
>>>>> Why would it have to?  
>>>> because the API requires it, both encoders and decoders have to implement the
>>>> API, a video encoder also cannot return 5 frames in one packet.
>>>> APIs can be changed but you arent arguing for a API change you argue for
>>>> ignoring the API and just doing something else.
>>>>
>>>>
>>>>> If the user sets frame_size before initialization
>>>>> to a number of samples that is a multiple of a single frame, it could
>>>>> return multiple speex frames at once, properly joined together and
>>>>> padded at the end.  With libspeex this is very easy to do because the
>>>>> functionality is built into the API.
>>>>>
>>>>> I understand the desire to keep what are called frames as separate
>>>>> entities, but in the case of Speex I see it as more advantagous to allow
>>>>> the user more control when encoding.  If frames are always split up,
>>>>> this limits the users options for both stream copy *and* for encoding to
>>>>>  just 1 frame per packet.
>>>>>
>>>>> If you're dead-set against this idea, then I will finish the parser that
>>>>> splits packets in order to continue with my other Speex patches, but I
>>>>> don't like how limiting it would be for the user.
>>>> i am againt speex handling things different than other speech codecs
>>>> based on arguments that apply to other speech codecs as well.
>>>> Also iam against passing data between muxer and codec layers in a way
>>>> that violates the API.
>>> ffmpeg seperates muxer and codec layers, writing a demuxer & decoder
>>> that depend on things beyond the API (frames per frame) is going to
>>> break things. We had similar great code (passing structs in AVPacket.data
>>> because it was convenient) that also didnt turn out to be that great
>>> and required a complete rewrite ...
>>>
>>>
>>> ive alraedy said nut doesnt allow multiple frames per packet, but its
>>> not just nut, avi as well doesnt allow multiple frames per packet
>>> for VBR and either way avi needs to have its headers set up properly,
>>> not with fake frame size and such and flv as we alaredy know has a 
>>> issue with >8 frames per frame. All that is just what we know of will
>>> break if you implement your hack, what else will break is something we
>>> only would learn after some time.
>>>
>>> IMHO, demuxer->parser->splited frames [unless it is not possible to split]
>>> if a muxer can store multiple frames it can merge several depending on its
>>> abilities and user provided parameters, that merging could also be done
>>> as a bitstream filter.
>>> But just skiping the spliting and merging and hoping that every container
>>> would allow anyting that any other container allowed is just not going to
>>> work. And even more so as we already know of many combinations that would
>>> noz work
>> I do understand your point.  There is a subtle difference with speex
>> though.  The process of merging of frames into groups of frames is
>> something that is specified by the codec itself, not the container.  To
>> the container, it would be as transparent as for an audio codec that
>> allows different numbers of samples/duration for a frame.  Nut would
>> support it just fine, as it does with FLAC with different numbers of
>> samples per frame.  As for FLV, it would be the same as if it doesn't
>> allow over a certain number of samples per frame before getting choppy
>> and/or crashing.
> 
> hmmm, could you quote the part of the speex spec that supports your view?
> and provide a link to that spec. (iam asking for a quote instead of just
> url cuz of the ML archive ...)

unfortunately there is not a speex specification per-se, but a speex
manual that gives some hints and guidance.  the libspeex code is
definitive reference.

http://speex.org/docs/manual/speex-manual/node7.html

"Sometimes it is desirable to pack more than one frame per packet (or
other basic unit of storage). The proper way to do it is to call
speex_encode $ N$  times before writing the stream with speex_bits_write.

In cases where the number of frames is not determined by an out-of-band
mechanism, it is possible to include a terminator code. That terminator
consists of the code 15 (decimal) encoded with 5 bits, as shown in Table
4. Note that as of version 1.0.2, calling speex_bits_write automatically
inserts the terminator so as to fill the last byte. This doesn't
involves any overhead and makes sure Speex can always detect when there
is no more frame in a packet."

The end of the packet is able detected because it looks like the start
of another frame, but the mode id is the terminator code.  This is
needed because a full valid speex frame can be only 5 bits.

What happens when libspeex writes frames is this:
If all the combined frames end on a byte boundary, no terminator code is
used because the packet data will end at the end of a frame.  If the
combined frames do not end on a byte boundary, a 0-bit followed by
enough 1's to pad to the next byte are appended.  The result is that if
there are 5 or more bits, the terminator code will be detected.

> 
>> That said, I think a bitstream filter could work too, but I don't know
>> how options are passed to it, if that is even possible.
> 
> see FIXME:
> static int opt_bsf(const char *opt, const char *arg)
> {
>     AVBitStreamFilterContext *bsfc= av_bitstream_filter_init(arg); //FIXME split name and args for filter at '='
> 
> and
> 
> int a= av_bitstream_filter_filter(bsfc, avctx, NULL,
>                                                ^^^^
> put parameters in there

oh fun. thanks. :)

-Justin