[FFmpeg-devel] [PATCH] speex in ogg muxer

Justin Ruggles justin.ruggles
Sun Sep 6 01:29:57 CEST 2009


Justin Ruggles wrote:

> Justin Ruggles wrote:
> 
>> Justin Ruggles wrote:
>>
>>> Baptiste Coudurier wrote:
>>>> Justin Ruggles wrote:
>>>>> Baptiste Coudurier wrote:
>>>>>> Hi Justin,
>>>>>>
>>>>>> Justin Ruggles wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> This patch adds speex support to the ogg muxer.  It basically does the
>>>>>>> same thing as Ogg/FLAC, in that the 1st packet is a global header from
>>>>>>> extradata and the 2nd packet is vorbiscomment metadata.
>>>>>>>
>>>>>>> This seems to work just fine for speex-to-speex stream copy, but
>>>>>>> probably would not work for flv-to-speex because flv doesn't seem to have any
>>>>>>> speex extradata from what I can tell.  I guess a header could be
>>>>>>> constructed, but that would be a separate patch to the flv demuxer.
>>>>>>>
>>>>>>> This patch is a precursor to libspeex encoding support, which I'll be
>>>>>>> sending shortly.
>>>>>>>
>>>>>>> -Justin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> Index: libavformat/oggenc.c
>>>>>>> ===================================================================
>>>>>>> --- libavformat/oggenc.c	(revision 19244)
>>>>>>> +++ libavformat/oggenc.c	(working copy)
>>>>>>> @@ -104,17 +125,39 @@
>>>>>>>      bytestream_put_byte(&p, 0x00); // streaminfo
>>>>>>>      bytestream_put_be24(&p, 34);
>>>>>>>      bytestream_put_buffer(&p, streaminfo, FLAC_STREAMINFO_SIZE);
>>>>>>> -    oggstream->header_len[1] = 1+3+4+strlen(vendor)+4;
>>>>>>> -    oggstream->header[1] = av_mallocz(oggstream->header_len[1]);
>>>>>>> -    p = oggstream->header[1];
>>>>>>> +    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1]);
>>>>>>> +    if (!p)
>>>>>>> +        return -1;
>>>>>> AVERROR(ENOMEM)
>>>>> fixed.
>>>>>
>>>>>>> @@ -144,6 +188,12 @@
>>>>>>>                  av_log(s, AV_LOG_ERROR, "Extradata corrupted\n");
>>>>>>>                  av_freep(&st->priv_data);
>>>>>>>              }
>>>>>>> +        } else if (st->codec->codec_id == CODEC_ID_SPEEX) {
>>>>>>> +            if (ogg_build_speex_headers(st->codec, oggstream,
>>>>>>> +                                        st->codec->flags & CODEC_FLAG_BITEXACT) < 0) {
>>>>>>> +                av_log(s, AV_LOG_ERROR, "error writing Speex headers\n");
>>>>>>> +                av_freep(&st->priv_data);
>>>>>>> +            }
>>>>>> return error here with the return code of the func :>
>>>>>> Yes, it seems flac misses it too, this needs a fix.
>>>>>>
>>>>>> patch fine otherwise, maybe a micro bump for avformat would be nice.
>>>>> fixed. new patch attached. the new patch also differs in that it
>>>>> overrides the extra_headers field in the Speex header to be 0 since only
>>>>> the 2 required headers are written.
>>>>>
>>>> patch ok if it works :>
>> Ok, back to square one.
>>
>>> Hmm... I've done several more tests and it does not quite work as-is for
>>> all samples.  Here is what I have run into.  The tests so far are for
>>> ogg-to-ogg stream copy.
>>>
>>> - When the source has more than 1 frame per packet, the resulting copy
>>> plays fine with ffmpeg/ffplay but is quick and choppy with speexdec.  I
>>> was able to fix this by modifying the ogg/speex demuxer to set
>>> avctx->frame_size to the number of samples in a packet instead of in a
>>> frame.  I also had to update the libspeex decoder accordingly.  Maybe
>>> this is the wrong way to go about it though.  I'm guessing it is a
>>> timestamp/granulepos issue, but I don't know enough about Ogg to tell
>>> more than that.
>> This is now corrected after much discussion. :)
>>
>>> - Even with the fix and even with 1 frame per packet, 2 short samples
>>> I've tested so far have a single soft pop when the stream-copied file is
>>> decoded with speexdec, but it's fine with ffmpeg/ffplay.
>>>
>>> Maybe someone else might have an idea of what could be going wrong?
>> Now I think I know what is going wrong, and I don't think there is
>> anything we can do about it.  speexenc does some weird things with granule
>> positions.  It starts out for a long time with granulepos=0 even though
>> it is encoding audio, then when it starts writing granule positions it
>> is not always in sync with the start of the stream.  Below is a little
>> snippet from a comparison of an original spx file to a copied spx file.
>>  Each packet should be 320 samples.
>>
>> [...]
>>
>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 57
>> +00:00:01.120: serialno 0000000000, granulepos 17920, packetno 57
>>
>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 58
>> +00:00:01.140: serialno 0000000000, granulepos 18240, packetno 58
>>
>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 59
>> +00:00:01.160: serialno 0000000000, granulepos 18560, packetno 59
>>
>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>
>> -00:00:01.191: serialno 1626088319, calc. gpos 19057, packetno 61
>> +00:00:01.191: serialno 0000000000, granulepos 19057, packetno 61
>>
>> -00:00:01.211: serialno 1626088319, calc. gpos 19377, packetno 62
>> +00:00:01.211: serialno 0000000000, granulepos 19377, packetno 62
> 
> So... I figured it out, but you may not want to know the answer. ;)
> 
> The first packet is supposed to be interpreted as shorter than the full
> frame size.  The delay is found by calculating what the granulepos of
> the first page would normally be, then subtracting what it really is
> from that value.
> 
> From above, this is the last packet in the first page. There are 59
> packets per page in this stream (the first 2 packets are headers, hence
> the packetno of 60).
>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
> 
> speexdec interprets the first packet as having a delay of
> 18880-18737=143 samples.  So the first packet should be 320-143=177
> samples long, and the decoder discards the first 143 samples of the
> first frame.
> 
> None of this is documented except for in the speexenc and speexdec
> source code.  From analyzing a Speex-in-FLV sample, it appears that the
> way Adobe handles this in Flash Media Server is to do what our ogg
> demuxer does and interpret the first page as if each frame is 320
> samples, then resync timestamps with the source after the first page,
> causing a skip in timestamps after the first page instead of at the
> beginning of the stream.
> 
> I'm still not sure what to do about this though...
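
For reference, the frames-per-packet fix and the extra_headers override quoted above both come down to fields of the 80-byte Speex identification header. The following is a minimal sketch, not FFmpeg code: the byte offsets assume the SpeexHeader struct layout from libspeex (little-endian 32-bit fields), and the helper names speex_samples_per_packet() and speex_clear_extra_headers() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed offsets into the 80-byte Speex identification header
 * (SpeexHeader layout from libspeex, little-endian 32-bit fields). */
#define SPEEX_HDR_SIZE            80
#define SPEEX_OFF_FRAME_SIZE      56
#define SPEEX_OFF_FRAMES_PER_PKT  64
#define SPEEX_OFF_EXTRA_HEADERS   68

static uint32_t rl32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void wl32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Hypothetical helper: samples carried by one Ogg packet, i.e.
 * frame_size * frames_per_packet. Returns -1 if the header is too short. */
int64_t speex_samples_per_packet(const uint8_t *hdr, size_t len)
{
    if (len < SPEEX_HDR_SIZE)
        return -1;
    uint32_t frame_size = rl32(hdr + SPEEX_OFF_FRAME_SIZE);
    uint32_t fpp        = rl32(hdr + SPEEX_OFF_FRAMES_PER_PKT);
    if (fpp == 0)       /* assumption: 0 is treated as 1 frame per packet */
        fpp = 1;
    return (int64_t)frame_size * fpp;
}

/* Hypothetical helper: only the 2 mandatory header packets are written,
 * so the copied header must not announce any extra header packets. */
int speex_clear_extra_headers(uint8_t *hdr, size_t len)
{
    if (len < SPEEX_HDR_SIZE)
        return -1;
    wl32(hdr + SPEEX_OFF_EXTRA_HEADERS, 0);
    return 0;
}
```

Using samples per packet rather than per frame is what keeps a source with several frames per packet from playing back "quick and choppy".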

This patch makes it so that all the pts and durations are correct for
Ogg/Speex.  It basically just changes the durations of the first and
last packets.
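
Roughly, the first/last packet duration adjustment can be sketched as below. This is not the attached patch (it was scrubbed from the archive); it is a hypothetical illustration of the speexenc/speexdec convention described in the quoted analysis. It assumes samples_per_packet is frame_size * frames_per_packet and that the granulepos of the last page marks the final valid sample. The function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not FFmpeg API). If the first audio page completes
 * nb_packets packets, its "expected" granulepos is
 * nb_packets * samples_per_packet. The shortfall against the stored
 * granulepos is the start delay, which is taken out of the first packet.
 * Returns -1 if the delay does not fit in one packet (not handled here). */
int64_t speex_first_packet_duration(int64_t nb_packets,
                                    int64_t samples_per_packet,
                                    int64_t first_page_gpos)
{
    int64_t delay = nb_packets * samples_per_packet - first_page_gpos;
    if (delay < 0 || delay > samples_per_packet)
        return -1;
    return samples_per_packet - delay;
}

/* Hypothetical helper. Assumes the final granulepos marks the last valid
 * sample, so the last packet is cut to whatever remains after the end
 * position of the packet before it, clamped to [0, samples_per_packet]. */
int64_t speex_last_packet_duration(int64_t prev_end_gpos, int64_t final_gpos,
                                   int64_t samples_per_packet)
{
    int64_t d = final_gpos - prev_end_gpos;
    if (d < 0)
        d = 0;
    if (d > samples_per_packet)
        d = samples_per_packet;
    return d;
}
```

With the numbers from the comparison above, speex_first_packet_duration(59, 320, 18737) computes a delay of 18880 - 18737 = 143 and returns 320 - 143 = 177.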

I don't like that such a hack is needed for proper handling of Speex
transmission delay, but it is required to mirror the hack used by
speexenc and speexdec, which are the only official references for how to
handle this issue properly.

This patch would only fix demuxing and stream copy, not decoding.  The
next step after this would be to change the libspeex decoder to use the
packet durations to chop off first samples from the first frame and last
samples from the last frame.
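
On the decoder side, that next step amounts to discarding decoded samples that fall outside each packet's duration. A minimal sketch, assuming interleaved 16-bit output and a per-packet skip count and duration supplied by the demuxer; trim_decoded_frame() is a hypothetical helper, not part of the libspeex decoder wrapper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical post-decode step: drop 'skip' leading samples (start delay on
 * the first packet) and cap the result at 'duration' (short last packet).
 * pcm:     interleaved int16 samples
 * decoded: samples per channel produced by the decoder for this packet
 * Returns the number of samples per channel kept; kept data is moved to the
 * front of pcm. */
size_t trim_decoded_frame(int16_t *pcm, size_t decoded, int channels,
                          size_t skip, size_t duration)
{
    if (channels <= 0 || skip >= decoded)
        return 0;
    size_t avail = decoded - skip;
    size_t keep = duration < avail ? duration : avail;
    memmove(pcm, pcm + skip * (size_t)channels,
            keep * (size_t)channels * sizeof(*pcm));
    return keep;
}
```

For the first packet of the sample discussed in this thread, skip would be 143 and duration 177; for an ordinary packet, skip is 0 and duration is the full 320.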

-Justin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speex_granulepos_delay.patch
Type: text/x-diff
Size: 3534 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090905/799a59b4/attachment.patch>


