[FFmpeg-devel] [PATCH] speex in ogg muxer

Justin Ruggles justin.ruggles
Sat Oct 10 15:46:43 CEST 2009


Justin Ruggles wrote:

> Justin Ruggles wrote:
> 
>> David Conrad wrote:
>>
>>> On Sep 5, 2009, at 8:20 PM, Justin Ruggles wrote:
>>>
>>>> Justin Ruggles wrote:
>>>>
>>>>> Justin Ruggles wrote:
>>>>>
>>>>>> Justin Ruggles wrote:
>>>>>>
>>>>>>> Justin Ruggles wrote:
>>>>>>>
>>>>>>>> Justin Ruggles wrote:
>>>>>>>>
>>>>>>>> Now I think I know what is going wrong, and there is nothing we  
>>>>>>>> can do
>>>>>>>> about it I think.  speexenc does some weird things with granule
>>>>>>>> positions.  It starts out for a long time with granulepos=0 even  
>>>>>>>> though
>>>>>>>> it is encoding audio, then when it starts writing granule  
>>>>>>>> positions it
>>>>>>>> is not always in sync with the start of the stream.  Below is a  
>>>>>>>> little
>>>>>>>> snippet from a comparison of an original spx file to a copied  
>>>>>>>> spx file.
>>>>>>>> Each packet should be 320 samples.
>>>>>>>>
>>>>>>>> [...]
>>>>>>> So... I figured it out, but you may not want to know the answer. ;)
>>>>>>>
>>>>>>> The granulepos of the first packet is supposed to be interpreted as
>>>>>>> smaller than the full frame size by calculating what the  
>>>>>>> granulepos of
>>>>>>> the first page would normally be, then subtracting it from what it
>>>>>>> really is to get the delay.
>>>>>>>
>>>>>>>>> From above, this is the last packet in the first page. There  
>>>>>>>>> are 59
>>>>>>> packets per page in this stream (the first 2 packets are headers,  
>>>>>>> hence
>>>>>>> the packetno of 60).
>>>>>>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>>>>>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>>>>>> speexdec interprets the first packet as having a delay of
>>>>>>> 18880-18737=143 samples.  So the first packet should be 320-143=177
>>>>>>> samples long, and the decoder discards the first 143 samples of the
>>>>>>> first frame.
>>>>>>>
>>>>>>> None of this is documented except for in the speexenc and speexdec
>>>>>>> source code.  From analyzing a Speex-in-FLV sample, it appears  
>>>>>>> that the
>>>>>>> way Adobe handles this in Flash Media Server is to do like our ogg
>>>>>>> demuxer does and interpret the first page as if each frame is 320
>>>>>>> samples, then resync timestamps with the source after the first  
>>>>>>> page,
>>>>>>> causing a skip in timestamps after the first page instead of at the
>>>>>>> beginning of the stream.
>>>>>>>
>>>>>>> I'm still not sure what to do about this though...
>>>>>> This patch makes it so that all the pts and durations are correct  
>>>>>> for
>>>>>> Ogg/Speex.  It basically just changes the durations of the first and
>>>>>> last packets.
>>>>> nevermind. this doesn't quite work. i'm still working on it. damn ogg
>>>>> and its craziness!
>>>> Ok, now this patch should work correctly.
>>> After some discussion with xiph people, apparently vorbis does this  
>>> exact same thing. The reasoning behind it is that libvorbis/libspeex  
>>> generate additional samples to prime the lapped transform. There is  
>>> apparently nothing in the vorbis/speex bitstream to indicate how many  
>>> samples this is, so instead ogg granulepos is used to figure out how  
>>> many samples to skip at the beginning.
>> Ouch. Is there a way to pre-parse the first page packets before decoding
>> to determine the correct packet durations?
>>
>>> This is probably why there's such a high PSNR difference between our  
>>> decoder and libvorbis despite the output sounding fine. However, I'm  
>>> not sure how to fix this: since vorbis has no fixed frame_size it  
>>> requires decoding every packet in the first ogg page, then subtracting  
>>> the page's granulepos from how many samples were decoded. Which would  
>>> break other containers.
>> Then does ogg/vorbis have the same timestamp problem as ogg/speex when
>> doing stream copy?
>>
>>>> diff --git a/libavformat/oggparsespeex.c b/libavformat/oggparsespeex.c
>>>> index cc00dd2..c295970 100644
>>>> --- a/libavformat/oggparsespeex.c
>>>> +++ b/libavformat/oggparsespeex.c
>>>> @@ -53,6 +64,7 @@ static int speex_header(AVFormatContext *s, int  
>>>> idx) {
>>>>            byte-aligned. */
>>>>         st->codec->frame_size = AV_RL32(p + 56);
>>>>         frames_per_packet     = AV_RL32(p + 64);
>>>> +        spxp->frame_size      = st->codec->frame_size;
>>>>         if (frames_per_packet)
>>>>             st->codec->frame_size *= frames_per_packet;
>>> [...]
>>>
>>>> +static int speex_packet(AVFormatContext *s, int idx)
>>>> +{
>>>> +    struct ogg *ogg = s->priv_data;
>>>> +    struct ogg_stream *os = ogg->streams + idx;
>>>> +    struct speex_params *spxp = os->private;
>>>> +    int frames_per_packet = s->streams[idx]->codec->frame_size /  
>>>> spxp->frame_size;
>>> This frame_size / frame_size is confusing; one should be renamed. spxp- 
>>>  >single_frame_size maybe with a comment reminding that the codec- 
>>>  >frame_size is per packet with multiple frames?
>> I've simplified it now.
>>
>>>> +
>>>> +    if (os->flags & OGG_FLAG_EOS && os->lastgp != -1 && os->granule  
>>>>> 0) {
>>>> +        /* first packet of last page. we have to calculate the last  
>>>> packet
>>>> +           duration here because it is the only place we know the  
>>>> next-to-last
>>>> +           granule position. */
>>>> +        spxp->last_packet_duration = os->granule - os->lastgp +
>>>> +                                     spxp->frame_size *  
>>>> frames_per_packet *
>>>> +                                     (1 - ogg_page_packets(os));
>>>> +    }
>>>> +
>>>> +    if (!os->lastgp && os->granule > 0)
>>> This will set this duration for all speex packets in the first page;  
>>> shouldn't it only be the first (os->seq == 2)?
>>> >From what I can tell, the first packet in a page will have os->lastgp
>> equal to the last granulepos (0 in the case of the first page), and any
>> subsequent packets in the page will have os->lastgp == -1.
>>
>>>> +        /* first packet */
>>>> +        os->pduration = os->granule + spxp->frame_size *  
>>>> frames_per_packet *
>>>> +                        (1 - ogg_page_packets(os));
>>> granule - frame_size * frames_per_packet * (ogg_page_packets(os) - 1)  
>>> and likewise for last_packet_duration is more clear IMO, since it's  
>>> equivalent to (stream duration including packet) - (stream duration  
>>> excluding packet).
>> ok. fixed.
>>
>> new patch attached.
> 
> I updated the patch again.  Since the private context is not used in
> speex_header(), I moved the allocation to speex_packet().

dang. I knew I should've tested more before resending. Here is a new
patch again with the correct calcuation for the last packet duration.

-Justin


-------------- next part --------------
A non-text attachment was scrubbed...
Name: speex_granulepos_delay_5.patch
Type: text/x-diff
Size: 3075 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20091010/a8c322ea/attachment.patch>



More information about the ffmpeg-devel mailing list