[FFmpeg-devel] [PATCH] speex in ogg muxer

Sun Sep 6 01:58:22 CEST 2009

Justin Ruggles wrote:

> Justin Ruggles wrote:
> 
>> Justin Ruggles wrote:
>>
>>> Justin Ruggles wrote:
>>>
>>>> Baptiste Coudurier wrote:
>>>>> Justin Ruggles wrote:
>>>>>> Baptiste Coudurier wrote:
>>>>>>> Hi Justin,
>>>>>>>
>>>>>>> Justin Ruggles wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This patch adds speex support to the ogg muxer.  It basically does the
>>>>>>>> same thing as Ogg/FLAC, in that the 1st packet is a global header from
>>>>>>>> extradata and the 2nd packet is vorbiscomment metadata.
>>>>>>>>
>>>>>>>> This seems to work just fine for speex-to-speex stream copy, but
>>>>>>>> probably would not work for flv-to-speex because flv doesn't seem to have any
>>>>>>>> speex extradata from what I can tell.  I guess a header could be
>>>>>>>> constructed, but that would be a separate patch to the flv demuxer.
>>>>>>>>
>>>>>>>> This patch is a precursor to libspeex encoding support, which I'll be
>>>>>>>> sending shortly.
>>>>>>>>
>>>>>>>> -Justin
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Index: libavformat/oggenc.c
>>>>>>>> ===================================================================
>>>>>>>> --- libavformat/oggenc.c	(revision 19244)
>>>>>>>> +++ libavformat/oggenc.c	(working copy)
>>>>>>>> @@ -104,17 +125,39 @@
>>>>>>>>      bytestream_put_byte(&p, 0x00); // streaminfo
>>>>>>>>      bytestream_put_be24(&p, 34);
>>>>>>>>      bytestream_put_buffer(&p, streaminfo, FLAC_STREAMINFO_SIZE);
>>>>>>>> -    oggstream->header_len[1] = 1+3+4+strlen(vendor)+4;
>>>>>>>> -    oggstream->header[1] = av_mallocz(oggstream->header_len[1]);
>>>>>>>> -    p = oggstream->header[1];
>>>>>>>> +    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1]);
>>>>>>>> +    if (!p)
>>>>>>>> +        return -1;
>>>>>>> AVERROR(ENOMEM)
>>>>>> fixed.
>>>>>>
>>>>>>>> @@ -144,6 +188,12 @@
>>>>>>>>                  av_log(s, AV_LOG_ERROR, "Extradata corrupted\n");
>>>>>>>>                  av_freep(&st->priv_data);
>>>>>>>>              }
>>>>>>>> +        } else if (st->codec->codec_id == CODEC_ID_SPEEX) {
>>>>>>>> +            if (ogg_build_speex_headers(st->codec, oggstream,
>>>>>>>> +                                        st->codec->flags & CODEC_FLAG_BITEXACT) < 0) {
>>>>>>>> +                av_log(s, AV_LOG_ERROR, "error writing Speex headers\n");
>>>>>>>> +                av_freep(&st->priv_data);
>>>>>>>> +            }
>>>>>>> return error here with the return code of the func :>
>>>>>>> Yes, it seems flac misses it too; this needs a fix.
>>>>>>>
>>>>>>> patch fine otherwise, maybe a micro bump for avformat would be nice.
>>>>>> fixed. new patch attached. the new patch also differs in that it
>>>>>> overrides the extra_headers field in the Speex header to be 0 since only
>>>>>> the 2 required headers are written.
>>>>>>
>>>>> patch ok if it works :>
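
For illustration, the extra_headers override mentioned above could look
roughly like the sketch below.  It assumes the usual 80-byte Speex header
layout, in which extra_headers is a 32-bit little-endian field at byte
offset 68 (8 + 20 + 10 * 4).  The helper names put_le32, get_le32 and
speex_clear_extra_headers are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPEEX_HEADER_SIZE          80  /* assumed size of the Speex header packet */
#define SPEEX_EXTRA_HEADERS_OFFSET 68  /* assumed offset: 8 + 20 + 10 * 4 bytes */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: force extra_headers to 0 in a copy of the Speex
 * header, since only the 2 required header packets are written.
 * Returns 0 on success, -1 if the buffer does not look like a Speex header.
 */
static int speex_clear_extra_headers(uint8_t *hdr, size_t size)
{
    if (!hdr || size < SPEEX_HEADER_SIZE || memcmp(hdr, "Speex   ", 8) != 0)
        return -1;
    put_le32(hdr + SPEEX_EXTRA_HEADERS_OFFSET, 0);
    return 0;
}
```

The signature check is only a cheap guard against patching a buffer that is
not a Speex header; the rest of the header is left untouched.
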
>>> Ok, back to square one.
>>>
>>>> Hmm... I've done several more tests and it does not quite work as-is for
>>>> all samples.  Here is what I have run into.  The tests so far are for
>>>> ogg-to-ogg stream copy.
>>>>
>>>> - When the source has more than 1 frame per packet, the resulting copy
>>>> plays fine with ffmpeg/ffplay but is quick and choppy with speexdec.  I
>>>> was able to fix this by modifying the ogg/speex demuxer to set
>>>> avctx->frame_size to the number of samples in a packet instead of in a
>>>> frame.  I also had to update the libspeex decoder accordingly.  Maybe
>>>> this is the wrong way to go about it though.  I'm guessing it is a
>>>> timestamp/granulepos issue, but I don't know enough about Ogg to tell
>>>> more than that.
>>> This is now corrected after much discussion. :)
>>>
>>>> - Even with the fix and even with 1 frame per packet, 2 short samples
>>>> I've tested so far have a single soft pop when the stream-copied file is
>>>> decoded with speexdec, but it's fine with ffmpeg/ffplay.
>>>>
>>>> Maybe someone else might have an idea of what could be going wrong?
>>> Now I think I know what is going wrong, and there is nothing we can do
>>> about it I think.  speexenc does some weird things with granule
>>> positions.  It starts out for a long time with granulepos=0 even though
>>> it is encoding audio, then when it starts writing granule positions it
>>> is not always in sync with the start of the stream.  Below is a little
>>> snippet from a comparison of an original spx file to a copied spx file.
>>>  Each packet should be 320 samples.
>>>
>>> [...]
>>>
>>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 57
>>> +00:00:01.120: serialno 0000000000, granulepos 17920, packetno 57
>>>
>>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 58
>>> +00:00:01.140: serialno 0000000000, granulepos 18240, packetno 58
>>>
>>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 59
>>> +00:00:01.160: serialno 0000000000, granulepos 18560, packetno 59
>>>
>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>>
>>> -00:00:01.191: serialno 1626088319, calc. gpos 19057, packetno 61
>>> +00:00:01.191: serialno 0000000000, granulepos 19057, packetno 61
>>>
>>> -00:00:01.211: serialno 1626088319, calc. gpos 19377, packetno 62
>>> +00:00:01.211: serialno 0000000000, granulepos 19377, packetno 62
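
The "+" lines for packets 57 to 60 are consistent with counting every data
packet as a full 320 samples.  The sketch below reproduces those numbers
under assumptions read off the listing rather than stated in it: packetno
is zero-based and includes 2 header packets, and the sample rate is
16000 Hz (17920 samples at 00:00:01.120).  The function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Granulepos at the end of a data packet when every data packet is
 * counted as samples_per_packet samples.  packetno is assumed to be
 * zero-based and to include the header packets.
 */
static int64_t nominal_granulepos(int64_t packetno, int header_packets,
                                  int samples_per_packet)
{
    assert(packetno >= header_packets);
    return (packetno - header_packets + 1) * samples_per_packet;
}

/* Convert a granulepos to milliseconds, truncating toward zero. */
static int64_t granulepos_to_ms(int64_t granulepos, int sample_rate)
{
    assert(sample_rate > 0);
    return granulepos * 1000 / sample_rate;
}
```

With these assumptions, nominal_granulepos(57, 2, 320) gives 17920 and
granulepos_to_ms(17920, 16000) gives 1120, matching the copied stream,
while the original stream's 18737 at packet 60 maps to 1171 ms.
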
>> So... I figured it out, but you may not want to know the answer. ;)
>>
>> The first packet is supposed to be interpreted as shorter than the
>> full frame size.  The delay is found by calculating what the
>> granulepos of the first page would normally be, then subtracting the
>> actual granulepos of that page from it.
>>
>> From above, this is the last packet in the first page. There are 59
>> packets per page in this stream (the first 2 packets are headers, hence
>> the packetno of 60).
>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>> speexdec interprets the first packet as having a delay of
>> 18880-18737=143 samples.  So the first packet should be 320-143=177
>> samples long, and the decoder discards the first 143 samples of the
>> first frame.
>>
>> None of this is documented except for in the speexenc and speexdec
>> source code.  From analyzing a Speex-in-FLV sample, it appears that the
>> way Adobe handles this in Flash Media Server is to do as our ogg
>> demuxer does and interpret the first page as if each frame is 320
>> samples, then resync timestamps with the source after the first page,
>> causing a skip in timestamps after the first page instead of at the
>> beginning of the stream.
>>
>> I'm still not sure what to do about this though...
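
The arithmetic described above can be written down as a small sketch.  It
only restates the calculation from this thread (nominal granulepos of the
first page minus the actual one) and is not taken from the speexdec source;
the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Result of interpreting the first page's granulepos. */
typedef struct SpeexFirstPage {
    int64_t delay;                /* samples to discard from the first frame */
    int64_t first_packet_samples; /* effective length of the first packet */
} SpeexFirstPage;

/*
 * Hypothetical helper: derive the initial delay from the granulepos of
 * the first page that carries one.
 * Returns 0 on success, -1 if the result is not a plausible delay
 * (negative, or a whole packet or more).
 */
static int speex_first_page_delay(int64_t page_granulepos, int packets_in_page,
                                  int samples_per_packet, SpeexFirstPage *out)
{
    int64_t nominal, delay;

    assert(out != 0);
    if (packets_in_page <= 0 || samples_per_packet <= 0)
        return -1;

    /* What the granulepos would be if every packet were full length. */
    nominal = (int64_t)packets_in_page * samples_per_packet;
    delay   = nominal - page_granulepos;
    if (delay < 0 || delay >= samples_per_packet)
        return -1;

    out->delay                = delay;
    out->first_packet_samples = samples_per_packet - delay;
    return 0;
}
```

For the stream above, speex_first_page_delay(18737, 59, 320, &r) yields a
delay of 143 and a first packet of 177 samples.  A muxer or demuxer using
something like this would still have to decide where to apply the shortened
duration, which is the open question here.
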
> 
> This patch makes it so that all the pts and durations are correct for
> Ogg/Speex.  It basically just changes the durations of the first and
> last packets.

Never mind, this doesn't quite work.  I'm still working on it.  Damn ogg
and its craziness!

-Justin