[FFmpeg-devel] [PATCH] A/V Synchronization in the RTP demuxer

Luca Abeni lucabe72
Mon Jun 23 08:28:42 CEST 2008

Hi Michael,

Michael Niedermayer wrote:
>>                  int delta_timestamp;
>> -                /* XXX: is it really necessary to unify the timestamp base ? */
>>                  /* compute pts from timestamp with received ntp_time */
>>                  delta_timestamp = timestamp - s->last_rtcp_timestamp;
>> -                /* convert to 90 kHz without overflow */
>> -                addend = (s->last_rtcp_ntp_time - s->first_rtcp_ntp_time) >> 14;
>> -                addend = (addend * 5625) >> 14;
>> +        /* convert to the PTS timebase */
>> +        addend = av_rescale_q(s->last_rtcp_ntp_time - s->first_rtcp_ntp_time, AV_TIME_BASE_Q, s->st->time_base);
> This looks wrong, last_rtcp_ntp_time is being read from somewhere with
> AV_RB64() while AV_TIME_BASE_Q is a lav* specific constant and can hardly
> match the timebase of what is read from somewhere.

last_rtcp_ntp_time is an NTP time, expressed in usecs, and I need to convert
it in time_base units. I saw that AV_TIME_BASE_Q corresponds to 1us, so
I thought I could use it... I did not know that it is a lav* specific constant,
Is it ok if I change this line in
addend = av_rescale_q(s->last_rtcp_ntp_time - s->first_rtcp_ntp_time, (AVRational){1, 1000000), s->st->time_base);
(BTW, I notice now that the RTP muxer has the same problem... I'll fix it
as soon as I know if the proposal above is ok, or if I should use a #define,
or there is some better way).

>> -        case CODEC_ID_AAC:
>> -        case CODEC_ID_H264:
>> -        case CODEC_ID_MPEG4:
>> -            pkt->pts = timestamp;
> so this was wrong?

Yes (see the explanation below). It generated video timestamps that were not
comparable with the audio timestamps (and vice versa).

> and s->last_rtcp_ntp_time != AV_NOPTS_VALUE is required
> now if one wants timestamps?

s->last_rtcp_ntp_time != AV_NOPTS_VALUE means that at least an RTCP Sender
Report (SR) packet has been received. In order to synchronise audio and
video, such a packet is needed (see below). Anyway, in case of RTSP an SR
packet is received immediately (so, this is not an issue). In case of
multicast "broadcast like" streams, an SR packet will arrive in about 5
seconds. Some video players (such as vlc) starts playing with wrong
timestamps (unsynched audio and video), and "correct" this fact when the
first SR arrives (so, when you receive multicast RTP you can see a "jump"
in the video after few seconds). Some other players buffer packets until
an SR packet arrives.
Anyway, yes, I believe that if s->last_rtcp_ntp_time == AV_NOPTS_VALUE
we should not return a timestamp. But I am willing to change this if
people want.

> no i dunno rtp*.c but i think your explanations are a little terse
> i mean what is wrong, why is it wrong and how does your fix improve it?

Opss... Sorry about that. I realize now that my email was a little bit
cryptic (I was leaving for the weekend, and I realized that I still had
some patches to send... So I did not spend too much time with explanations.
Note to myself: never send patches when I am in hurry ;-)

Anyway, here is an informal description of the situation:
RTP timestamps from different streams are not directly comparable, because
they start from different offsets. So, if I directly get the timestamps
from (for example) an MPEG4 stream and an AAC stream, I cannot synchronise
audio and video (unless the server uses the same initial offset for the two
streams... But this can only happen by accident, since the RFC requires to
use random offsets). This is what currently happens.
To get proper timestamps which allow to synchronise the audio and video

stream, I must receive an RTCP SR packet from the server. Such packet
contains an NTP time and the corresponding RTP timestamp. Using these
values, it is possible to convert RTP timestamps in presentation times,
which have the same offset for all the streams, so that audio presentation
times are comparable with the video presentation times. This is whay the
finalize_packet() function tries to do... Unfortunately, it only converts
the timestamps in case of MPEG{1,2} video and mp{2,3} audio. Do not ask me
why it works like this (probably, historic reasons?), but to me this
behaviour is wrong.
So, my patch fixes the problem by converting the RTP timestamp for all
the payload types, and not only for MPEG (the additional changes are due
to the fact that different payloads use different time bases... The current
function hardcodes a 1/90k time_base, and I made it generic).


More information about the ffmpeg-devel mailing list