[FFmpeg-devel] [PATCH] speex in ogg muxer
Justin Ruggles
justin.ruggles
Sat Oct 10 15:36:52 CEST 2009
Justin Ruggles wrote:
> David Conrad wrote:
>
>> On Sep 5, 2009, at 8:20 PM, Justin Ruggles wrote:
>>
>>> Justin Ruggles wrote:
>>>
>>>> Justin Ruggles wrote:
>>>>
>>>>> Justin Ruggles wrote:
>>>>>
>>>>>> Justin Ruggles wrote:
>>>>>>
>>>>>>> Justin Ruggles wrote:
>>>>>>>
>>>>>>> Now I think I know what is going wrong, and there is nothing we
>>>>>>> can do
>>>>>>> about it I think. speexenc does some weird things with granule
>>>>>>> positions. It starts out for a long time with granulepos=0 even
>>>>>>> though
>>>>>>> it is encoding audio, then when it starts writing granule
>>>>>>> positions it
>>>>>>> is not always in sync with the start of the stream. Below is a
>>>>>>> little
>>>>>>> snippet from a comparison of an original spx file to a copied
>>>>>>> spx file.
>>>>>>> Each packet should be 320 samples.
>>>>>>>
>>>>>>> [...]
>>>>>> So... I figured it out, but you may not want to know the answer. ;)
>>>>>>
>>>>>> The granulepos of the first packet is supposed to be interpreted as
>>>>>> smaller than the full frame size by calculating what the
>>>>>> granulepos of
>>>>>> the first page would normally be, then subtracting it from what it
>>>>>> really is to get the delay.
>>>>>>
>>>>>>>> From above, this is the last packet in the first page. There
>>>>>>>> are 59
>>>>>> packets per page in this stream (the first 2 packets are headers,
>>>>>> hence
>>>>>> the packetno of 60).
>>>>>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>>>>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>>>>> speexdec interprets the first packet as having a delay of
>>>>>> 18880-18737=143 samples. So the first packet should be 320-143=177
>>>>>> samples long, and the decoder discards the first 143 samples of the
>>>>>> first frame.
>>>>>>
>>>>>> None of this is documented except for in the speexenc and speexdec
>>>>>> source code. From analyzing a Speex-in-FLV sample, it appears
>>>>>> that the
>>>>>> way Adobe handles this in Flash Media Server is to do like our ogg
>>>>>> demuxer does and interpret the first page as if each frame is 320
>>>>>> samples, then resync timestamps with the source after the first
>>>>>> page,
>>>>>> causing a skip in timestamps after the first page instead of at the
>>>>>> beginning of the stream.
>>>>>>
>>>>>> I'm still not sure what to do about this though...
>>>>> This patch makes it so that all the pts and durations are correct
>>>>> for
>>>>> Ogg/Speex. It basically just changes the durations of the first and
>>>>> last packets.
>>>> nevermind. this doesn't quite work. i'm still working on it. damn ogg
>>>> and its craziness!
>>> Ok, now this patch should work correctly.
>> After some discussion with xiph people, apparently vorbis does this
>> exact same thing. The reasoning behind it is that libvorbis/libspeex
>> generate additional samples to prime the lapped transform. There is
>> apparently nothing in the vorbis/speex bitstream to indicate how many
>> samples this is, so instead ogg granulepos is used to figure out how
>> many samples to skip at the beginning.
>
> Ouch. Is there a way to pre-parse the first page packets before decoding
> to determine the correct packet durations?
>
>> This is probably why there's such a high PSNR difference between our
>> decoder and libvorbis despite the output sounding fine. However, I'm
>> not sure how to fix this: since vorbis has no fixed frame_size it
>> requires decoding every packet in the first ogg page, then subtracting
>> the page's granulepos from how many samples were decoded. Which would
>> break other containers.
>
> Then does ogg/vorbis have the same timestamp problem as ogg/speex when
> doing stream copy?
>
>>> diff --git a/libavformat/oggparsespeex.c b/libavformat/oggparsespeex.c
>>> index cc00dd2..c295970 100644
>>> --- a/libavformat/oggparsespeex.c
>>> +++ b/libavformat/oggparsespeex.c
>>> @@ -53,6 +64,7 @@ static int speex_header(AVFormatContext *s, int
>>> idx) {
>>> byte-aligned. */
>>> st->codec->frame_size = AV_RL32(p + 56);
>>> frames_per_packet = AV_RL32(p + 64);
>>> + spxp->frame_size = st->codec->frame_size;
>>> if (frames_per_packet)
>>> st->codec->frame_size *= frames_per_packet;
>> [...]
>>
>>> +static int speex_packet(AVFormatContext *s, int idx)
>>> +{
>>> + struct ogg *ogg = s->priv_data;
>>> + struct ogg_stream *os = ogg->streams + idx;
>>> + struct speex_params *spxp = os->private;
>>> + int frames_per_packet = s->streams[idx]->codec->frame_size /
>>> spxp->frame_size;
>> This frame_size / frame_size is confusing; one should be renamed. spxp-
>> >single_frame_size maybe with a comment reminding that the codec-
>> >frame_size is per packet with multiple frames?
>
> I've simplified it now.
>
>>> +
>>> + if (os->flags & OGG_FLAG_EOS && os->lastgp != -1 && os->granule
>>>> 0) {
>>> + /* first packet of last page. we have to calculate the last
>>> packet
>>> + duration here because it is the only place we know the
>>> next-to-last
>>> + granule position. */
>>> + spxp->last_packet_duration = os->granule - os->lastgp +
>>> + spxp->frame_size *
>>> frames_per_packet *
>>> + (1 - ogg_page_packets(os));
>>> + }
>>> +
>>> + if (!os->lastgp && os->granule > 0)
>> This will set this duration for all speex packets in the first page;
>> shouldn't it only be the first (os->seq == 2)?
>
>>From what I can tell, the first packet in a page will have os->lastgp
> equal to the last granulepos (0 in the case of the first page), and any
> subsequent packets in the page will have os->lastgp == -1.
>
>>> + /* first packet */
>>> + os->pduration = os->granule + spxp->frame_size *
>>> frames_per_packet *
>>> + (1 - ogg_page_packets(os));
>> granule - frame_size * frames_per_packet * (ogg_page_packets(os) - 1)
>> and likewise for last_packet_duration is more clear IMO, since it's
>> equivalent to (stream duration including packet) - (stream duration
>> excluding packet).
>
> ok. fixed.
>
> new patch attached.
I updated the patch again. Since the private context is not used in
speex_header(), I moved the allocation to speex_packet().
-Justin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speex_granulepos_delay_4.patch
Type: text/x-diff
Size: 3077 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091010/d65130ff/attachment.patch>
More information about the ffmpeg-devel
mailing list