[FFmpeg-user] Why are PTS values different from what's expected?

Thu Apr 1 19:33:09 EEST 2021

On 2021-04-01 11:41, pdr0 wrote:
> Mark Filipak (ffmpeg) wrote
>> On 2021-04-01 07:13, Mark Filipak (ffmpeg) wrote:
>>> The source is MKV. MKV has a 1/1000 TB, so any PTS variance should be
>>> less than 0.1%.
>>>
>>> The filter complex is thinned down to just this: settb=1/720000,showinfo
>>>
>>> Here are selected lines from the showinfo report (with   ...comments):
>>>
>>> [Parsed_showinfo_1 @ 00000247d719ef00] config in time_base: 1/720000,
>>> frame_rate: 24000/1001
>>>      ...So, deltaPTS (calculated: 1/TB/FR) should be 30030.
>>> [Parsed_showinfo_1 @ 00000247d719ef00] n:   1 pts:  30240   ...should be
>>> 30030
>>> [Parsed_showinfo_1 @ 00000247d719ef00] n:   2 pts:  59760   ...should be
>>> 60060
>>> [Parsed_showinfo_1 @ 00000247d719ef00] n:   3 pts:  90000   ...should be
>>> 90090
>>> [Parsed_showinfo_1 @ 00000247d719ef00] n:   4 pts: 120240   ...should be
>>> 120120
>>>
>>> The PTS variance is 0.7%.
>>>
>>> Why are PTS values different from what's expected?
>>>
>>> Note: If I force deltaPTS via setpts=N*30030, then of course I get what's
>>> expected.
>>>
>>> Thanks. This is critical and your explanation is greatly appreciated!
>>> Mark.
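
One possible reading of the showinfo numbers quoted above, offered as an assumption rather than something the log states: if the timestamps reach the filter chain already rounded to whole milliseconds (which would be consistent with the 1/1000 TB mentioned for MKV), then rescaling them to 1/720000 reproduces the reported values. The sketch below is plain arithmetic, not ffmpeg's actual code path; the function names are made up for illustration.

```python
from fractions import Fraction

TB_DEN = 720000                      # denominator from settb=1/720000
FRAME_RATE = Fraction(24000, 1001)   # frame_rate reported by showinfo

def expected_pts(n):
    """Ideal PTS of frame n, in 1/720000 ticks, for exact 24000/1001 fps CFR."""
    return int(n * TB_DEN / FRAME_RATE)

def ms_rounded_pts(n):
    """PTS of frame n if its timestamp was first rounded to whole ms (assumed)."""
    ms = round(n * 1000 / FRAME_RATE)   # nearest millisecond
    return ms * TB_DEN // 1000          # 1 ms = 720 ticks at 1/720000

for n in range(1, 5):
    print(n, expected_pts(n), ms_rounded_pts(n))
```

For n = 1 to 4 the millisecond-rounded column comes out as 30240, 59760, 90000, 120240, the same values showinfo printed, while the ideal column is 30030, 60060, 90090, 120120. Under that assumption the error is bounded by half a millisecond per frame (up to 360 ticks), which is about 1.2% of one 30030-tick frame period rather than 0.1%.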
>>
>> UPDATE
>>
>> If I change the filter complex to this:
>>
>> settb=1/720000,setpts=N*30030,fps=fps=48000/1001,showinfo
>>
>> all my follow-on processing goes straight into the toilet.
>>
>> Explanation of the factors in the filter complex:
>> settb=1/720000   ...mandate 1.3[8..] µs time resolution
>> setpts=N*30030   ...force the input to exactly 24000/1001fps cfr
>> fps=fps=48000/1001   ...frame double
>>
>> However, fps=fps=48000/1001 does more than just frame double. It resets TB
>> to 20.8541[6..] ms time
>> resolution. Look:
>>
>> [Parsed_showinfo_3 @ 000001413bf0ef00] config in time_base: 1001/48000,
>> frame_rate: 48000/1001
>> [Parsed_showinfo_3 @ 000001413bf0ef00] n:   0 pts:      0
>> [Parsed_showinfo_3 @ 000001413bf0ef00] n:   1 pts:      1
>> [Parsed_showinfo_3 @ 000001413bf0ef00] n:   2 pts:      2
>> [Parsed_showinfo_3 @ 000001413bf0ef00] n:   3 pts:      3
>>
>> Gee, I wish the fps filter documentation said that it changes TB and sets
>> deltaPTS to '1'.
>>
>> My follow-on frame processing can't tolerate 20.8541[6..] ms time
>> resolution -- that explains why my
>> mechanical frame gymnastics have been failing!
>>
>> Explanation: My follow-on processing does fractional frame adjustment that
>> requires at least
>> 8.341[6..] ms resolution.
>>
>> Workaround: I can frame double by another method that's somewhat ugly but
>> that I know works and
>> doesn't trash time resolution.
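
To put numbers on the resolution complaint quoted above, here is a small arithmetic sketch. It only converts the two time bases seen in the showinfo output into tick durations and checks how many ticks fit in one 120000/1001 fps frame period (the 8.341[6..] ms figure). The helper names are illustrative and not part of ffmpeg.

```python
from fractions import Fraction

TB_SETTB = Fraction(1, 720000)      # time base after settb=1/720000
TB_FPS = Fraction(1001, 48000)      # time base showinfo reports after fps=fps=48000/1001
STEP_120 = Fraction(1001, 120000)   # one frame period at 120000/1001 fps, in seconds

def tick_ms(time_base):
    """Duration of one time-base tick, in milliseconds."""
    return float(time_base * 1000)

def ticks_per_step(time_base):
    """Number of time_base ticks in one 120000/1001 fps frame period."""
    return STEP_120 / time_base

for name, tb in (("settb", TB_SETTB), ("fps", TB_FPS)):
    print(name, tick_ms(tb), "ms per tick,", ticks_per_step(tb), "ticks per step")
```

With 1/720000 the step is exactly 6006 ticks. With 1001/48000 it is 2/5 of a tick, so an integer PTS cannot represent it.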
> 
> Did you try changing the order? i.e., -vf fps first?

Before the 'settb=1/720000,setpts=N*30030'? That wouldn't be appropriate because I need to guarantee 
that the input is forced to 24000/1001fps cfr, first. Only then will fps=fps=48000/1001 actually 
double each frame without dropping any -- without such assurance, if any particular frame happens to 
have a PTS that's 'faster' than 24000/1001fps, then the shift to 48000/1001fps would drop it because 
the fps filter works solely at the frame level.

What I'm trying to do is make a 120000/1001fps cfr in which each frame is a proportionally weighted 
pixel mix of the 24 picture-per-second original:
AAAAA AAAAB AAABB AABBB ABBBB.
I'm sure it would be way better than standard telecine -- zero judder -- and I'm pretty sure it 
would be so close to motion vector interpolation that any difference would be imperceptible. I'm 
also sure that it would be a much faster process than minterpolate. The only question would be 
the resulting file size (though I think the sizes would be very close).
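
A minimal sketch of the weighting in the AAAAA AAAAB AAABB AABBB ABBBB pattern, assuming a plain linear mix of two neighbouring source pictures A and B. It only illustrates the arithmetic and is not an ffmpeg filter graph; the function names are hypothetical.

```python
from fractions import Fraction

STEPS = 5   # (120000/1001) / (24000/1001) output frames per source picture

def blend_weights(k):
    """Weights (A, B) for the k-th output frame between pictures A and B, k = 0..4."""
    return Fraction(STEPS - k, STEPS), Fraction(k, STEPS)

def pattern(k):
    """Letter pattern for step k, e.g. k=2 -> 'AAABB'."""
    return "A" * (STEPS - k) + "B" * k

def blend_pixel(a, b, k):
    """Linear mix of one pixel value from A and one from B at step k."""
    wa, wb = blend_weights(k)
    return float(wa * a + wb * b)

for k in range(STEPS):
    wa, wb = blend_weights(k)
    print(pattern(k), wa, wb, blend_pixel(100, 200, k))
```

For a pixel that is 100 in A and 200 in B, the five output frames carry 100, 120, 140, 160 and 180, and the next group starts at 200 with B as the new A.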

ffmpeg minterpolate is very slow (2+ days for a 2 hour movie) and it induces some fairly distracting 
twinkling artifacts. Piping InterFrame to ffmpeg (for finishing and encoding) is much better but 
even slower (3+ days for a 2 hour movie). My trials indicate that a 5-level pixel interpolation 
would process a 2 hour movie in 6 hours.

If I were to write code, of course buffering, flipping around, and combining frames would be easy. 
Using what ffmpeg gives me involves 'buffering' via secondary streams in a filter complex with the 
constraint that each of the secondary streams is (and remains) a completely unified and continuous 
stream (i.e. is not individual frames 'hanging in time'). It's an unfortunate complication. The 
complication would be somewhat alleviated if ffmpeg had an actual multiplexer driven by a 
programmable sequencer but it doesn't.