[FFmpeg-devel] [PATCH] avformat/matroskaenc: Allow changing the time stamp precision via option

Sat Jun 5 08:17:45 EEST 2021

Michael Fabian 'Xaymar' Dirks:
> On 2021-05-24 22:15, Andreas Rheinhardt wrote:
>> michael.dirks at xaymar.com:
>>> From: Michael Fabian 'Xaymar' Dirks <michael.dirks at xaymar.com>
>>>
>>> Adds "timestamp_precision" to the available options for Matroska muxing.
>>> The option enables users and developers to change the precision of the
>>> time stamps in the Matroska container up to 1 nanosecond, which can aid
>>> with the proper detection of constant and variable rate content.
>>>
>>> Work-around fix for: 259, 6406, 7927, 8909 and 9124.
>>>
>>> Signed-off-by: Michael Fabian 'Xaymar' Dirks <michael.dirks at xaymar.com>
>>> ---
>>>   doc/muxers.texi           |  8 ++++++++
>>>   libavformat/matroskaenc.c | 33 ++++++++++++++++++++++++++-------
>>>   2 files changed, 34 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/doc/muxers.texi b/doc/muxers.texi
>>> index e1c6ad0829..8655be94ff 100644
>>> --- a/doc/muxers.texi
>>> +++ b/doc/muxers.texi
>>> @@ -1583,6 +1583,14 @@ bitmap is stored bottom-up. Note that this
>>> option does not flip the bitmap
>>>   which has to be done manually beforehand, e.g. by using the vflip
>>> filter.
>>>   Default is @var{false} and indicates bitmap is stored top down.
>>>   + at item timestamp_precision
>>> +Sets the timestamp precision up to 1 nanosecond for Matroska/WebM,
>>> which can
>>> +improve detection of constant rate content in demuxers. Note that
>>> some poorly
>>> +implemented demuxers may require a timestamp precision of 1
>>> millisecond, so
>>> +increasing it past that point may result in playback issues. Higher
>>> precision
>>> +also reduces the maximum possible timestamp significantly.
>> This should mention that a too high precision will increase container
>> overhead.
> Good point.
>>
>>> +Default is @var{1/1000} (1 millisecond).
>>> +
>>>   @end table
>>>     @anchor{md5}
>>> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
>>> index 186a25d920..1b911a648c 100644
>>> --- a/libavformat/matroskaenc.c
>>> +++ b/libavformat/matroskaenc.c
>>> @@ -158,6 +158,8 @@ typedef struct MatroskaMuxContext {
>>>       int                 default_mode;
>>>         uint32_t            segment_uid[4];
>>> +
>>> +    AVRational          time_base;
>>>   } MatroskaMuxContext;
>>>     /** 2 bytes * 7 for EBML IDs, 7 1-byte EBML lengths, 6 1-byte uint,
>>> @@ -1814,6 +1816,7 @@ static int mkv_write_header(AVFormatContext *s)
>>>       const AVDictionaryEntry *tag;
>>>       int ret, i, version = 2;
>>>       int64_t creation_time;
>>> +    int64_t time_base = 1;
>>>         if (mkv->mode != MODE_WEBM ||
>>>           av_dict_get(s->metadata, "stereo_mode", NULL, 0) ||
>>> @@ -1850,7 +1853,10 @@ static int mkv_write_header(AVFormatContext *s)
>>>           return ret;
>>>       pb = mkv->info.bc;
>>>   -    put_ebml_uint(pb, MATROSKA_ID_TIMECODESCALE, 1000000);
>>> +    time_base = av_rescale_q(time_base, mkv->time_base,
>>> (AVRational){1, 1000000000});
>>> +    av_log(s, AV_LOG_DEBUG, "TimestampScale is: %" PRId64 " ns\n",
>>> time_base);
>>> +    put_ebml_uint(pb, MATROSKA_ID_TIMECODESCALE, time_base);
>> There is a serious problem here: mkv->time_base is the time base that
>> the muxer uses; yet if mkv->time_base is not a multiple of 1/1000000000,
>> then av_rescale_q will have to round a bit and then the demuxer will
>> read a different time base, leading to a drift of all timestamps. This
>> is not acceptable.
> This issue is already present in the current version of FFmpeg.

Matroska's timestamp imprecision currently lead to jitter, but not to
drift (this of course presumes that one actually uses the timestamps
as-is (instead of summing the durations)). But your patch as-is can lead
to drift (when using a wrong timebase), because it adds a whole new type
of error: One in which the demuxer and the muxer disagree about the
timebase that is in use. Consider a user using 1/750000000 as timebase.
48kHz audio (with a timebase of 1/48000) can be converted accurately to
it, so that the muxer receives precise timestamps. Yet the muxer
actually writes that the timebase is 1/1000000000 and that is what a
demuxer sees. This means that all timestamps (except those for which
Matroska uses "absolute timestamps" (Matroska-speak for a time/duration
that is always in 1/1000000000 like default duration)) are skewed as-if
multiplied by 3/4.

> Unfortunately even if you tell me this, this is not something I can
> change: Matroska uses nanosecond precision, and nobody has agreed on
> what the way forward for timestamps is. You will have to bring up such
> nanosecond-precision problems with the Matroska specification
> maintainers itself, which are currently seeking IETF standardization:
> https://github.com/ietf-wg-cellar/matroska-specification/issues/422
>>

Already did so (I am mkver).

>>> +
>>>       if ((tag = av_dict_get(s->metadata, "title", NULL, 0)))
>>>           put_ebml_string(pb, MATROSKA_ID_TITLE, tag->value);
>>>       if (!(s->flags & AVFMT_FLAG_BITEXACT)) {
>>> @@ -1883,11 +1889,11 @@ static int mkv_write_header(AVFormatContext *s)
>>>           int64_t metadata_duration = get_metadata_duration(s);
>>>             if (s->duration > 0) {
>>> -            int64_t scaledDuration = av_rescale(s->duration, 1000,
>>> AV_TIME_BASE);
>>> +            int64_t scaledDuration = av_rescale_q(s->duration,
>>> AV_TIME_BASE_Q, mkv->time_base);
>>>               put_ebml_float(pb, MATROSKA_ID_DURATION, scaledDuration);
>>>               av_log(s, AV_LOG_DEBUG, "Write early duration from
>>> recording time = %" PRIu64 "\n", scaledDuration);
>>>           } else if (metadata_duration > 0) {
>>> -            int64_t scaledDuration = av_rescale(metadata_duration,
>>> 1000, AV_TIME_BASE);
>>> +            int64_t scaledDuration = av_rescale_q(metadata_duration,
>>> AV_TIME_BASE_Q, mkv->time_base);
>>>               put_ebml_float(pb, MATROSKA_ID_DURATION, scaledDuration);
>>>               av_log(s, AV_LOG_DEBUG, "Write early duration from
>>> metadata = %" PRIu64 "\n", scaledDuration);
>>>           } else if (s->pb->seekable & AVIO_SEEKABLE_NORMAL) {
>>> @@ -1948,12 +1954,12 @@ static int mkv_write_header(AVFormatContext *s)
>>>       // after 4k and on a keyframe
>>>       if (IS_SEEKABLE(pb, mkv)) {
>>>           if (mkv->cluster_time_limit < 0)
>>> -            mkv->cluster_time_limit = 5000;
>>> +            mkv->cluster_time_limit = av_rescale_q(5000,
>>> (AVRational){1, 1000}, mkv->time_base);
>>>           if (mkv->cluster_size_limit < 0)
>>>               mkv->cluster_size_limit = 5 * 1024 * 1024;
>>>       } else {
>>>           if (mkv->cluster_time_limit < 0)
>>> -            mkv->cluster_time_limit = 1000;
>>> +            mkv->cluster_time_limit = av_rescale_q(1000,
>>> (AVRational){1, 1000}, mkv->time_base);
>> There are three places where you rescale this; it would be better if you
>> did it unconditionally in one place (namely after this block here).
>>
> I did not want to cut into existing code too much, merely adjust it to
> the change without changing the main parts.
>>>           if (mkv->cluster_size_limit < 0)
>>>               mkv->cluster_size_limit = 32 * 1024;
>>>       }
>>> @@ -2713,6 +2719,14 @@ static int mkv_init(struct AVFormatContext *s)
>>>       } else
>>>           mkv->mode = MODE_MATROSKAv2;
>>>   +    // WebM requires a timestamp precision of 1ms.
>>> +    if (mkv->mode == MODE_WEBM) {
>>> +        if (av_cmp_q(mkv->time_base, (AVRational){1, 1000}) != 0) {
>>> +            av_log(s, AV_LOG_ERROR, "WebM requires 1ms timestamp
>>> precision\n");
>>> +            return AVERROR(EINVAL);
>>> +        }
>> (This could be put into the same block as the one just above that sets
>> the mode.)
> Fair point.
>>> +    }
>>> +
>>>       mkv->cur_audio_pkt = av_packet_alloc();
>>>       if (!mkv->cur_audio_pkt)
>>>           return AVERROR(ENOMEM);
>>> @@ -2738,8 +2752,8 @@ static int mkv_init(struct AVFormatContext *s)
>>>               track->uid = mkv_get_uid(mkv->tracks, i, &c);
>>>           }
>>>   -        // ms precision is the de-facto standard timescale for mkv
>>> files
>>> -        avpriv_set_pts_info(st, 64, 1, 1000);
>>> +        // Use user-defined timescale.
>>> +        avpriv_set_pts_info(st, 64, mkv->time_base.num,
>>> mkv->time_base.den);
>>>             if (st->codecpar->codec_type == AVMEDIA_TYPE_ATTACHMENT) {
>>>               if (mkv->mode == MODE_WEBM) {
>>> @@ -2759,6 +2773,10 @@ static int mkv_init(struct AVFormatContext *s)
>>>           track->track_num_size = ebml_num_size(track->track_num);
>>>       }
>>>   +    // Scale the configured cluster_time_limit.
>>> +    if (mkv->cluster_time_limit >= 0)
>>> +        mkv->cluster_time_limit =
>>> av_rescale_q(mkv->cluster_time_limit, (AVRational){1, 1000},
>>> mkv->time_base);
>>> +
>>>       if (mkv->is_dash && nb_tracks != 1)
>>>           return AVERROR(EINVAL);
>>>   @@ -2826,6 +2844,7 @@ static const AVOption options[] = {
>>>       { "infer", "For each track type, mark the first track of
>>> disposition default as default; if none exists, mark the first track
>>> as default.", 0, AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_INFER }, 0,
>>> 0, FLAGS, "default_mode" },
>>>       { "infer_no_subs", "For each track type, mark the first track
>>> of disposition default as default; for audio and video: if none
>>> exists, mark the first track as default.", 0, AV_OPT_TYPE_CONST, {
>>> .i64 = DEFAULT_MODE_INFER_NO_SUBS }, 0, 0, FLAGS, "default_mode" },
>>>       { "passthrough", "Use the disposition flag as-is", 0,
>>> AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_PASSTHROUGH }, 0, 0, FLAGS,
>>> "default_mode" },
>>> +    { "timestamp_precision", "Timestamp precision to use for the
>>> entire container", OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl =
>>> 0.001 }, 0.000000001, INT_MAX, FLAGS},
>> Using a default value that is a valid value makes it impossible to
>> distinguish whether the user set a value at all; but I'd like to
>> preserve that possibility.
> Sorry, could you explain why you want to do this further? I see no code
> path that would even need to check if the user/caller has set precision,
> if it's left at default we get the default of 1/1000 - just as previous
> FFmpeg versions.
>>

The default value of 1/1000 is good for most use-cases, but there might
be scenarios where it is not (e.g. TrueHD packets have a duration < 1ms,
so different packets will end up with the same timestamp when using
1/1000). If the user has set a timestamp precision, then that should be
honoured; but if not, then we might use a different value than 1/1000
(in the future; I know that currently no code path that makes use of
this exists).

- Andreas