[FFmpeg-user] PTS resolution[s]

Mark Filipak (ffmpeg) markfilipak at bog.us
Tue Feb 23 19:51:51 EET 2021


On 2021-02-23 03:58, Jim DeLaHunt wrote:
> On 2021-02-22 21:35, Mark Filipak (ffmpeg) wrote:
> 
>> On 2021-02-23 00:01, Jim DeLaHunt wrote:
>>> The Presentation Time Stamp (PTS) value which FFmpeg associates with video frames and audio data 
>>> is a 64-bit integer. There is an associated time base attribute for each video or audio stream, 
>>> which gives the number of seconds between successive values of PTS. This time base might be 
>>> thought of as the resolution of PTS. Thus if you have two PTS values pts1 and pts2, then the 
>>> difference in seconds between them is (pts2-pts1)*time_base.
>>
>>
>> MPEG PES (Packetized Elementary Stream) uses a 27MHz (exact) clock divided by 300 (exact), so 
>> that timebase is 1/(90000Hz)…
>
> I've read something similar. My understanding is that MPEG PES encodes Presentation Time Stamp 
> values as integer tick counts in the data stream. Is the timebase of 1/(90,000Hz) encoded in the 
> data stream, or is it only defined in the spec?

The decoder is *assumed* to have a 90KHz timebase. Otherwise, the decode will not be correct. The 
idea of manipulating the timebase is a creation of programmers. The timebase is hardware: a 27MHz 
oscillator, followed by more hardware: a by-300 divider, followed by more hardware: a 33-bit counter 
that produces a 33-bit binary number that's called PTS. The 33-bit counter is zeroed at the 
beginning of the stream and is copied out at the beginning of each frame. Making the timebase 
something different is like saying, "Let's make the wall clock's second a half a second." Doing that 
doesn't make actual time speed up except maybe in "THE MATRIX".
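
To put numbers on that hardware chain, here is a minimal Python sketch (for illustration only; the 
constant names are made up). It derives the 90KHz tick from the 27MHz clock and the by-300 divider, 
and works out how long a 33-bit counter runs before it wraps:

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 27_000_000  # 27MHz oscillator
DIVIDER = 300                 # by-300 divider
PTS_BITS = 33                 # width of the PTS counter

# Exact rational arithmetic, so no decimal rounding creeps in.
pts_clock_hz = Fraction(SYSTEM_CLOCK_HZ, DIVIDER)  # 90000 ticks per second
tick_seconds = 1 / pts_clock_hz                    # exactly 1/90000 second
wrap_seconds = (2 ** PTS_BITS) * tick_seconds      # time until the counter wraps

print("PTS clock (Hz):  ", pts_clock_hz)
print("tick (s):        ", tick_seconds)
print("tick (ms):       ", float(tick_seconds) * 1e3)
print("wrap after (h):  ", float(wrap_seconds) / 3600)
```

The wrap time comes out at roughly 26.5 hours of continuous counting.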

>> …(which is 0.01[1..]ms between ticks, exactly).

> Actually, for this discussion I think it's fair to say that 0.01[1..]ms is not exactly 1/90 ms, it 
> is just an approximation. ...

No, it's exactly 1/(90000Hz) == 0.011[1..] milliseconds.

>... Finite decimal numbers will never get you the exact value. ...

The 90KHz is as accurate as the 27MHz oscillator that drives it. A 27MHz crystal oscillator 
oscillates (1 cycle) every 37.037[037..] nanoseconds. To give you an idea of how little it jitters, 
consider this: The receiving decoder has its own 27MHz oscillator that is synchronized to the 
encoder's recorded 27MHz oscillator count just 10 times per second -- *that* number (i.e. the count 
at which synchronization occurs) is called "PCR". Between synchronization times, the decoder's 
oscillator free runs. That it's sync'ed every 1/10th second means that it free runs for 2.7 million 
oscillations -- in other words, it free runs for 2699999 parts out of 2700000 parts (i.e. 
+/-0.0000185[185..] percent). During that free run, it drifts a bit based on the Quality Factor and 
cut of the crystal, the crystal trimming, the temperature, air pressure, the phase of the moon 
...just kidding on that last one (or maybe not).
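
The figures in that paragraph can be checked with a few lines of Python. This is only a sketch of 
the arithmetic; the 10-per-second sync rate is the figure I gave above, taken as an assumption, and 
the variable names are made up:

```python
from fractions import Fraction

CLOCK_HZ = 27_000_000  # 27MHz crystal oscillator
SYNCS_PER_SECOND = 10  # PCR sync rate assumed in the text above

period_ns = Fraction(10**9, CLOCK_HZ)               # one cycle, in nanoseconds
cycles_between_syncs = CLOCK_HZ // SYNCS_PER_SECOND  # free-run length, in cycles

# One cycle as a percentage of the free-run interval, and half of that
# (the +/- figure quoted above).
one_cycle_percent = Fraction(100, cycles_between_syncs)
half_cycle_percent = one_cycle_percent / 2

print("period (ns):          ", float(period_ns))
print("cycles between syncs: ", cycles_between_syncs)
print("+/- half cycle (%):   ", float(half_cycle_percent))
```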

>... The rational 
> number is exact. For this discussion, it will be clearer to use exact rational numbers.

Compared to the silken accuracy of a phase locked crystal oscillator system, your rational numbers 
are stone hammers.

>> …my best information so far is that, at least out of the encoder, ffmpeg encodes frames with PTS 
>> resolution = 1ms.
> 
> My impression from reading the FPS filter source code is that it is incomplete to talk about ffmpeg 
> PTS values without also giving the corresponding timebase value. ...

Except in "THE MATRIX", the timebase is 90KHz +/-0.0000185%. Changing the timebase to something 
else is a programming construct and is not real. That number should not have been called a time 
base. It should have been called a time base divider. Thinking that by changing that number the 
programmers are changing the actual time base is delusional. What they're doing is introducing 
fractional errors in their calculations that occur when their "time base" isn't a whole multiple of 
the real time base. You know, just because you can poke numbers into a 'C' struct doesn't mean those 
numbers are real.

>... It looks to me like the FPS filter 
> does not attempt to preserve the incoming PTS values or timebase. ...

Yes, that's what 'fps=<a number>' does. But remember, changing the actual time base is something 
that's solely in "THE MATRIX".

Let me explain it like this:
What's the difference between 30fps and 30/1.001fps?
The difference is:
for 30fps, the current time base count is updated every 33.33[3..]ms, while
for 30/1.001fps, it is updated every 33.36[6..]ms.
(Note that the difference between them is smaller than 1ms.)
For a 90KHz time base, those intervals, counted in ticks (i.e. the delta-PTSs), are,
for 30fps, '3000', and
for 30/1.001fps, '3003'.
Those are the only real numbers that matter.
In other words, the real time base (90KHz) and the real PTS differences ('3000' vs. '3003') are 
producing effects that ffmpeg's bogus time base (1KHz) can't resolve because it can't 'see' a 
difference of less than 1ms (i.e. the difference between 30fps and 30/1.001fps).
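
Here is that comparison as a short Python sketch. The helper name frame_ticks is made up for the 
example, and the 1KHz time base is the figure I am assuming for ffmpeg in this thread, not something 
I have verified in the source:

```python
from fractions import Fraction

def frame_ticks(fps, clock_hz):
    """Exact number of clock ticks per frame for a given frame rate."""
    return Fraction(clock_hz) / fps

rates = {
    "30fps": Fraction(30),
    "30/1.001fps": Fraction(30000, 1001),
}

for name, fps in rates.items():
    ticks_90k = frame_ticks(fps, 90_000)  # delta-PTS at 90KHz
    ticks_1k = frame_ticks(fps, 1_000)    # same interval at 1KHz (i.e. in ms)
    print(name, "| 90KHz:", ticks_90k,
          "| 1KHz exact:", float(ticks_1k),
          "| 1KHz rounded:", round(ticks_1k))
```

At 90KHz the two rates give different whole numbers; at 1KHz both intervals round to the same whole 
number of ticks.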

Do you 'get it'? I know it's hard, but that's the reality of life outside "THE MATRIX".

>... It sets a new time base of 
> 1/frame_rate, and generates successive integer values for PTS. However, and this is crucial, it does 
> seem to value being exact about the value of PTS*time_base.
> 
> So, that seems to say that your statement "at least out of the encoder, ffmpeg encodes frames with 
> PTS resolution = 1ms" is not complete without stating the time base value ffmpeg sets out of the 
> encoder.

Software cannot change the real time base.

>> To put this into perspective, a 24fps video has delta-PTS = 41.[6..]ms whereas a 24/1.001fps video 
>> has delta-PTS = 41.708[3..]milliseconds. That means that the difference between the two is less 
>> than the resolution of the ffmpeg timebase (at least, for the encoder -- I don't know about the 
>> decoder and the pipeline). That essentially means that ffmpeg can't differentiate between them 
>> based on the working PTSs that it keeps.
> 
> But what are the time base values which ffmpeg uses for these two cases?  If the time base is 1/24 
> in the first case, and 1,001/24,000 in the second case, then the same integer PTS values result in 
> PTS*time_base products being exactly the correct time offsets from the first frame of the video in 
> each of the two cases.

First, remember this:
For 30fps, delta-PTS = '3000', and
For 30/1.001fps, delta-PTS = '3003'.
Both are whole numbers.

For 24fps, delta-PTS = '3750'.
For 24/1.001fps, delta-PTS would be '3753.75' except that PTS must be a whole number, so I'd say 
that 24/1.001fps is a bogus frame rate that exists solely in "THE MATRIX", not in real life.
So what happens if PTS is forced to '3754' to produce pseudo 24/1.001fps? I'd say the answer is: 
playback and/or processing errors.
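
For anyone who wants to run the numbers, here is a minimal Python sketch. It computes the exact 
ticks per frame for 24/1.001fps at 90KHz, then shows what the timestamps look like if each frame's 
ideal time is rounded to the nearest whole tick. That per-frame rounding is purely an assumption 
made for illustration; I don't know whether any particular encoder or muxer does it that way, and 
the sketch says nothing about how a player reacts:

```python
from fractions import Fraction

CLOCK_HZ = 90_000
fps = Fraction(24000, 1001)                  # 24/1.001fps as an exact fraction
ticks_per_frame = Fraction(CLOCK_HZ) / fps   # exact, not a whole number

# Ideal (fractional) PTS of the first few frames, and the same values
# rounded to whole ticks, each frame rounded independently (assumption).
ideal = [n * ticks_per_frame for n in range(9)]
rounded = [round(t) for t in ideal]
deltas = [b - a for a, b in zip(rounded, rounded[1:])]
worst_error = max(abs(r - i) for r, i in zip(rounded, ideal))

print("ticks per frame:", ticks_per_frame, "=", float(ticks_per_frame))
print("rounded PTS:    ", rounded)
print("delta-PTS:      ", deltas)
print("worst error:    ", float(worst_error), "tick")
```

Under that assumption the delta-PTS is not constant, but every fourth frame falls on a whole tick 
again, because 4 x 3753.75 = 15015.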

ASIDE: Frankly, the unreality of 24/1.001fps comes as a surprise to me. I had assumed (until a 
couple of lines back) that 24/1.001fps was a real possibility in real video systems. I even have it 
in my working documentation, i.e. that 24/1.001fps plays movies 0.1% slow, analogous to 24fps played 
back at 25fps (PAL TV) being 4% fast (which I know is real). The difference is that a 4% PAL speed-up 
is real but a 0.1% slowdown isn't real. The reason I didn't realize that 24/1.001fps is not real is 
that, until a few lines back, I'd never run the numbers on it. I will correct my working 
documentation to reflect that fact.

Yes, I know I can do this: '-vf fps=24000/1001' but that doesn't make it real.

Bottom line: The number or fraction that's poked into an 'fps' filter has to be real, i.e. it has to 
come from the limited set of frame rates that fit the MPEG encoder-decoder model in the H.262 
specification and that produce integer delta-PTSs. Otherwise, there will be problems.

The added requirement is that the ffmpeg decoder and pipeline have the resolution needed to reproduce 
the MPEG encoder-decoder model -- and that ain't 1ms, it's 0.011[1..]ms.

>> I seek someone who can either, 1, confirm what I think, or 2, tell me what the resolution of the 
>> decoder and pipeline actually is.

I think I've shown that the ffmpeg decoder and pipeline probably have a resolution finer than 1ms, 
but what it actually is remains a mystery.

> Implicit in your use of the definite article "the" is an apparent assumption that FFmpeg has only 
> one resolution for the decoder and the pipeline. It looks to me like FFmpeg could well take the 
> liberty of changing resolution at each stage of decoder and pipeline, as long as it preserves the 
> values for PTS*time_base at each frame (or modifies them intentionally, as the FPS filter does).

Your observation is sound. What's important is how fine the pipeline timing can be set, not how many 
digits PTS contains. From a general engineering standpoint, ffmpeg should use a time base that's 
0.011[1..]ms or finer. Delta-PTS resolution that's coarser than 0.011[1..]ms will absolutely result in 
playback and/or processing errors. How to find the delta-PTS resolution in the source code, I don't know.
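
To show why the resolution matters when timestamps move from one time base to another, here is a 
small Python sketch. The rescale helper is hypothetical and written just for this example (FFmpeg's 
libavutil has av_rescale_q for this kind of conversion, but this is not its implementation), and 
the 1KHz intermediate time base is again an assumption:

```python
from fractions import Fraction

def rescale(pts, src_tb, dst_tb):
    """Convert a tick count from src_tb to dst_tb, rounding to the nearest tick.

    Hypothetical helper: pts * src_tb is the time in seconds, and dividing
    by dst_tb expresses that time in destination ticks.
    """
    return round(pts * src_tb / dst_tb)

TB_90K = Fraction(1, 90_000)  # 90KHz time base
TB_1K = Fraction(1, 1_000)    # 1KHz time base (assumed)

for delta_pts in (3000, 3003):            # 30fps and 30/1.001fps at 90KHz
    coarse = rescale(delta_pts, TB_90K, TB_1K)
    back = rescale(coarse, TB_1K, TB_90K)
    print(delta_pts, "->", coarse, "->", back)
```

Both delta-PTS values collapse to the same coarse value, so converting back to 90KHz cannot recover 
either of the originals.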

> Best regards,
>       —Jim DeLaHunt
> 



