[MPlayer-G2-dev] mpeg container's timing (PTS values)

Thu May 8 01:41:23 CEST 2003

Hi,

I've "developed" a new A-V sync engine in the g2 code, which produces A-V: 0.0000
for most of the mpeg1/vob streams I have.

The video part is relatively easy, but a bit tricky: when a PS packet has a
PTS timestamp, that timestamp belongs to the next complete frame.
(not to the one which ends in that packet!)
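
A minimal C sketch of this video rule, for illustration only (it is not the g2
code). It assumes we know the byte offset at which each frame starts and the
offset of the first payload byte of the packet carrying the PTS; the name
frame_for_pts and both offsets are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MPlayer.
 * frame_start[i] = byte offset in the elementary stream where frame i begins,
 *                  sorted in ascending order.
 * packet_start   = offset of the first payload byte of the packet whose
 *                  header carries the PTS.
 * Returns the index of the frame the PTS belongs to: the first frame that
 * begins at or after packet_start. A frame that merely ends in this packet
 * began earlier, so it is skipped. Returns -1 if no such frame is known yet. */
static int frame_for_pts(const long *frame_start, size_t n, long packet_start)
{
    size_t i;
    assert(frame_start != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (frame_start[i] >= packet_start)
            return (int)i;
    }
    return -1;
}
```

For example, with frames starting at offsets 0, 100 and 250, a PTS in a packet
whose payload starts at offset 60 goes to the frame at 100, not to the frame
at 0 that ends inside that packet.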

The audio, however, is very tricky.
The old audio PTS calculation method (used in mplayer-g1) assumed that the
timestamps received from the demuxer belong to the first byte of that packet.
So, after decoding an audio frame/block, it increased the PTS by the compressed
frame size divided by the compressed byterate.
This is very inaccurate for mpeg. Now I've found out why: in mpeg containers, the
audio timestamps behave like the video ones: they belong to the _next_ complete
frame/block. As AC3 frames are big and usually span multiple
packets, this error may be big.
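
The old calculation can be sketched roughly like this in C. This is a
simplified illustration, not the actual mplayer-g1 source; the struct and
function names are made up for the example.

```c
#include <assert.h>

/* Hypothetical state, not MPlayer's real structures. */
typedef struct {
    double pts;       /* running audio PTS in seconds */
    double byterate;  /* compressed bytes per second */
} old_audio_clock;

/* A new packet arrived: the old method treats its PTS as the time of the
 * packet's first byte, whatever frame that byte belongs to. */
static void old_clock_on_packet(old_audio_clock *c, double packet_pts)
{
    c->pts = packet_pts;
}

/* An audio frame/block was decoded: advance the PTS by the compressed
 * frame size divided by the compressed byterate. */
static void old_clock_on_frame(old_audio_clock *c, int frame_bytes)
{
    assert(c->byterate > 0.0);
    c->pts += frame_bytes / c->byterate;
}
```

With illustrative numbers (a 1536-byte frame at 48000 bytes/s), each decoded
frame advances the PTS by 32 ms. If the packet's PTS actually refers to a later
frame, everything derived from it is shifted by up to that much.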
But after fixing this, I got a stable A-V but a non-zero ct (correction total).
After experimenting with several streams, I've found that the ct: value is the
time length of one audio frame. Strange, isn't it?
It means that the PTS doesn't even belong to the next audio frame, but to
the one after that. Or in other words: it's the timestamp for the last
byte/sample of the next frame, instead of the first one:

                                               v-- the PTS belongs to the
                                              PTS  end of f2 (or start of f3)
                                               v
frames:          [.....f1......][......f2......][.....f3......]
packets:           |   p1   |   p2   |  p3   |   p4  |  p5  |  
                               ^PTS^
                                 ^- the PTS is coded in p2's header

Using this logic I got <5ms ct: times for all mpeg streams!
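
The logic in the diagram could be sketched as follows in C. This is only an
illustration under the assumptions described above, not the g2 implementation;
all names are hypothetical. A PTS seen in a packet header is bound to the next
frame that starts, and is taken as the time of the end of that frame. The
frame_duration helper shows where the one-frame offset comes from: an AC3
frame holds 1536 samples, which is 32 ms at 48 kHz.

```c
#include <assert.h>

/* Hypothetical audio clock, for illustration only. */
typedef struct {
    double end_time;      /* time (s) at the end of the last finished frame */
    double pending_pts;   /* PTS seen in a packet header, not yet bound */
    int    has_pending;
    double frame_pts;     /* PTS bound to the frame currently being read */
    int    frame_has_pts;
} audio_clock;

/* Duration of one audio frame in seconds. */
static double frame_duration(int samples_per_frame, int sample_rate)
{
    assert(samples_per_frame > 0 && sample_rate > 0);
    return (double)samples_per_frame / sample_rate;
}

/* A packet header carrying a PTS was parsed. */
static void clock_on_pts(audio_clock *c, double pts)
{
    c->pending_pts = pts;  /* a newer PTS replaces an unbound older one */
    c->has_pending = 1;
}

/* The first byte of a new frame was reached: bind the pending PTS to it. */
static void clock_on_frame_start(audio_clock *c)
{
    c->frame_has_pts = c->has_pending;
    c->frame_pts = c->pending_pts;
    c->has_pending = 0;
}

/* The frame was read completely. */
static void clock_on_frame_end(audio_clock *c, double duration)
{
    if (c->frame_has_pts)
        c->end_time = c->frame_pts;  /* PTS = end of this frame */
    else
        c->end_time += duration;     /* no PTS: extrapolate */
    c->frame_has_pts = 0;
}
```

In terms of the diagram: f1 starts, the PTS arrives with p2, f1 ends (the clock
is extrapolated), f2 starts and takes the PTS, and when f2 ends the clock is
set to that PTS, so f3 starts exactly at the PTS value.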

But quoting the mpeg-system.pdf:
"
In the T-STD in figure 2-6 on page 11 the display of a video presentation
unit (a picture) occurs instantaneously at its presentation time, tp_n(k).

In the T-STD the output of an audio presentation unit starts at its presentation
time, tp_n(k), when the decoder instantaneously presents the first sample.
Subsequent samples in the presentation unit are presented in sequence at the
audio sampling rate.
"

This suggests that the PTS is the beginning of that audio block, not the end!

Does anyone have accurate info about the meaning/calculation of the audio PTS
for mpeg containers?

A'rpi / Astral & ESP-team

--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu