[FFmpeg-devel] [PATCH] rtpdec: Emit timestamps for packets before the first RTCP packet, too

Thu Dec 30 10:03:59 CET 2010

Hi Martin,

On 28 December 2010 14:22, Martin Storsjö <martin at martin.st> wrote:
> Hi Josh,
>
> On Mon, 27 Dec 2010, Josh Allmann wrote:
>
>> On 27 December 2010 00:48, Martin Storsjo <martin at martin.st> wrote:
>> > Timestamps in each stream start from 0, for the first received
>> > RTP packet. Once an RTCP packet is received, that one is used for
>> > sync, emitting timestamps that fit seamlessly into the earlier ones.
>>
>> Just a small comment: RTP timestamps do not necessarily start from
>> zero (RFC 3550, section 5.1: "The initial value of the timestamp
>> SHOULD be random") and the RTCP wallclock is used for syncing between
>> streams that otherwise have different starting timestamps.
>
> Yes, yes of course :-)
>
>> I am not familiar enough with that particular piece of code to make
>> judgements about the patch, though.
>
> Verbally, here's what the code does, after my patch:
>
> Prior to the first RTCP packet, the timestamp returned is [RTP timestamp]
> - [RTP timestamp of first packet], so regardless of at what value they
> start, the ones we emit start at 0. The RTP timestamp of the first packet
> is named base_timestamp in the code.
>
> Once the other patchset for modifying the header parsing is applied, we
> could parse the RTP-Info header, too, and use the timestamp specified
> there instead of the RTP timestamp of the first packet.
>
> When we get the first RTCP packet, we calculate the offset from the first
> RTCP packet to the base RTP timestamp, and store this in rtcp_ts_offset.
> At this point, timestamps emitted are: [RTP timestamp] - [RTP timestamp of
> last RTCP] + [diff between latest RTCP packet and first RTCP packet] +
> [rtcp_ts_offset]. Proper rescaling between values expressed in different
> units is done, of course.
>
> Thus, all streams are synced together via the NTP timestamps once an RTCP
> packet has been received in that stream; before that, the timestamps are
> simple diffs against the first packet.
>
> Actually, on top of all this, we add a variable named range_start_offset.
> This is used for emitting sensible timestamps after seeking. If we seek to
> e.g. 42.0, and the response to the PLAY request had a Range: 42.0- header,
> we add this on top of all timestamps, so that the emitted timestamps start
> at 42.
>
>
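
The two cases described above (before and after the first RTCP packet) can be sketched roughly as below. This is only an illustration in Python, not the rtpdec code: the function name emitted_timestamp, the dict layout for the RTCP state and the choice of seconds as the output unit are assumptions made for the sketch; only the variable names in the comments come from the description. RTP timestamp wraparound is ignored.

```python
from fractions import Fraction


def emitted_timestamp(rtp_ts, clock_rate, base_timestamp,
                      range_start_offset=0, rtcp=None):
    """Return the emitted timestamp in seconds as an exact Fraction.

    rtcp is None until an RTCP packet has been seen in the stream.
    Afterwards it is a dict (hypothetical layout) with the keys
    first_ntp and last_ntp (seconds), last_rtp_ts and ts_offset
    (both in RTP timestamp units).
    """
    start = Fraction(range_start_offset)
    if rtcp is None:
        # RTCP-less case: plain diff against the first RTP packet.
        return start + Fraction(rtp_ts - base_timestamp, clock_rate)
    # NTP time elapsed between the first and the latest RTCP packet.
    addend = Fraction(rtcp["last_ntp"]) - Fraction(rtcp["first_ntp"])
    # RTP time elapsed since the latest RTCP packet, rescaled to seconds.
    delta_timestamp = Fraction(rtp_ts - rtcp["last_rtp_ts"], clock_rate)
    # Offset from the base RTP timestamp to the first RTCP packet.
    rtcp_ts_offset = Fraction(rtcp["ts_offset"], clock_rate)
    return start + rtcp_ts_offset + addend + delta_timestamp


# Before any RTCP packet: 42.0 + (1100 - 1000) / 1000
print(float(emitted_timestamp(1100, 1000, 1000, 42)))  # 42.1
```

Exact fractions are used instead of floats so that sums such as 42 + 1/10 compare cleanly; a real demuxer would instead rescale integers into the stream time base.
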
> A full example might be useful:
>
> We start playing with a seek to 42.0. We don't get any RTCP packets
> initially. We have both an audio and video stream, both having the
> timebase 1000 for simplicity.
>
> We receive video packets with timestamps 1000, 1100, 1200. The first
> packet gives base_timestamp 1000. The diff to the initial timestamp thus
> is 0.0, 0.1, 0.2, and we add range_start_offset 42.0 so we return 42.0,
> 42.1, 42.2. Similarly, for the audio stream, we get packets with the
> timestamps 5000, 5100, 5200. Thus, base_timestamp for this stream is 5000,
> and we return the summed timestamps 42.0, 42.1, 42.2.
>
> After one second, we receive an RTCP packet in the video stream, but none
> in the audio stream. This RTCP packet has the NTP timestamp 501 seconds
> and RTP timestamp 2000. The diff in RTP timestamp units to base_timestamp
> is 1000, 1.0 seconds, stored in rtcp_ts_offset. We set first_rtcp_ntp_time
> and last_rtcp_ntp_time to 501, last_rtcp_timestamp to 2000. Following RTP
> packets with RTP timestamps 2100, 2200 and 2300 get the timestamps 43.1,
> 43.2 and 43.3 like this: range_start_offset (42.0) + rtcp_ts_offset (1.0)
> + addend (0, diff between last_rtcp_ntp_time and first_rtcp_ntp_time) +
> delta_timestamp (0.1, 0.2, 0.3, the diff between last_rtcp_timestamp and
> the RTP timestamps).
>
> A while later, we get another RTCP packet, with the NTP timestamp 502 and
> RTP timestamp 3000. Following packets with RTP timestamps 3100 etc get
> their timestamps like this: range_start_offset (42.0) + rtcp_ts_offset
> (1.0) + addend (1.0, last_rtcp_ntp_time - first_rtcp_ntp_time) +
> delta_timestamp (0.1).
>
> When we got the first RTCP packet, the values for that stream are
> propagated to all other streams, namely first_rtcp_ntp_time and
> rtcp_ts_offset. Since we haven't gotten any RTCP packets in the audio
> stream (last_rtcp_ntp_time isn't set), the RTCP-less calculation is still
> used.
>
> A bit later, we get the first RTCP packet for the audio stream, with NTP
> time 504 seconds, RTP timestamp 9000. An audio packet with RTP timestamp
> 9100 gets its final timestamp calculated like this: 46.1 =
> range_start_offset (42.0) + rtcp_ts_offset (1.0, propagated from the
> stream with the first RTCP packet) + addend (3.0, 504 - 501, where 501 was
> propagated from the stream with the first RTCP packet) + delta (0.1, 9100
> - 9000)
>
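
The full example above can be replayed with a small stateful sketch. Again, this is an assumption-laden illustration in Python rather than the actual rtpdec code: the Stream class, its on_rtp/on_rtcp methods and the others parameter are hypothetical names, rtcp_ts_offset is kept in seconds so it can be shared between streams with different clock rates, and wraparound is ignored. Summing the four terms listed for the last audio packet (42.0 + 1.0 + 3.0 + 0.1) gives 46.1.

```python
from fractions import Fraction


class Stream:
    """Per-stream state, with fields named after the description."""

    def __init__(self, clock_rate, range_start_offset):
        self.clock_rate = clock_rate
        self.range_start_offset = Fraction(range_start_offset)
        self.base_timestamp = None
        self.first_rtcp_ntp_time = None
        self.last_rtcp_ntp_time = None
        self.last_rtcp_timestamp = None
        self.rtcp_ts_offset = None  # seconds (assumption of this sketch)

    def on_rtcp(self, ntp_time, rtp_ts, others=()):
        self.last_rtcp_ntp_time = Fraction(ntp_time)
        self.last_rtcp_timestamp = rtp_ts
        if self.first_rtcp_ntp_time is None:
            if self.base_timestamp is None:
                # Assumption: no RTP packet seen yet, so start here.
                self.base_timestamp = rtp_ts
            # First RTCP packet overall: record the sync point and
            # propagate it to the other streams.
            self.first_rtcp_ntp_time = Fraction(ntp_time)
            self.rtcp_ts_offset = Fraction(rtp_ts - self.base_timestamp,
                                           self.clock_rate)
            for other in others:
                other.first_rtcp_ntp_time = self.first_rtcp_ntp_time
                other.rtcp_ts_offset = self.rtcp_ts_offset

    def on_rtp(self, rtp_ts):
        if self.base_timestamp is None:
            self.base_timestamp = rtp_ts
        if self.last_rtcp_ntp_time is None:
            # No RTCP packet in this stream yet: RTCP-less calculation.
            diff = Fraction(rtp_ts - self.base_timestamp, self.clock_rate)
            return self.range_start_offset + diff
        addend = self.last_rtcp_ntp_time - self.first_rtcp_ntp_time
        delta = Fraction(rtp_ts - self.last_rtcp_timestamp, self.clock_rate)
        return self.range_start_offset + self.rtcp_ts_offset + addend + delta


video, audio = Stream(1000, 42), Stream(1000, 42)
for ts in (1000, 1100, 1200):
    video.on_rtp(ts)
for ts in (5000, 5100, 5200):
    audio.on_rtp(ts)
video.on_rtcp(501, 2000, others=[audio])
print(float(video.on_rtp(2100)))  # 43.1
video.on_rtcp(502, 3000)
print(float(video.on_rtp(3100)))  # 44.1
audio.on_rtcp(504, 9000)
print(float(audio.on_rtp(9100)))  # 46.1
```
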

Much clearer now, indeed.

I wasn't sure whether the start-from-zero behavior was a baked-in
assumption, but it doesn't appear to be, if only to handle seeking,
now that I think of it.

Nice patch, especially with wrapping your head around the intended
behavior, and thanks for explaining it.

Josh