[FFmpeg-devel] [PATCH] avformat/matroskaenc: Allow changing the time stamp precision via option

Tue Jun 15 02:04:36 EEST 2021

On Mon, Jun 14, 2021 at 12:09 PM Andreas Rheinhardt <
andreas.rheinhardt at outlook.com> wrote:

> Thierry Foucu:
> > On Fri, Jun 4, 2021 at 10:18 PM Andreas Rheinhardt <
> > andreas.rheinhardt at outlook.com> wrote:
> >
> >> Michael Fabian 'Xaymar' Dirks:
> >>> On 2021-05-24 22:15, Andreas Rheinhardt wrote:
> >>>> michael.dirks at xaymar.com:
> >>>>> From: Michael Fabian 'Xaymar' Dirks <michael.dirks at xaymar.com>
> >>>>>
> >>>>> Adds "timestamp_precision" to the available options for Matroska
> >> muxing.
> >>>>> The option enables users and developers to change the precision of
> the
> >>>>> time stamps in the Matroska container up to 1 nanosecond, which can
> aid
> >>>>> with the proper detection of constant and variable rate content.
> >>>>>
> >>>>> Work-around fix for: 259, 6406, 7927, 8909 and 9124.
> >>>>>
> >>>>> Signed-off-by: Michael Fabian 'Xaymar' Dirks <
> michael.dirks at xaymar.com
> >>>
> >>>>> ---
> >>>>>   doc/muxers.texi           |  8 ++++++++
> >>>>>   libavformat/matroskaenc.c | 33 ++++++++++++++++++++++++++-------
> >>>>>   2 files changed, 34 insertions(+), 7 deletions(-)
> >>>>>
> >>>>> diff --git a/doc/muxers.texi b/doc/muxers.texi
> >>>>> index e1c6ad0829..8655be94ff 100644
> >>>>> --- a/doc/muxers.texi
> >>>>> +++ b/doc/muxers.texi
> >>>>> @@ -1583,6 +1583,14 @@ bitmap is stored bottom-up. Note that this
> >>>>> option does not flip the bitmap
> >>>>>   which has to be done manually beforehand, e.g. by using the vflip
> >>>>> filter.
> >>>>>   Default is @var{false} and indicates bitmap is stored top down.
> >>>>>   + at item timestamp_precision
> >>>>> +Sets the timestamp precision up to 1 nanosecond for Matroska/WebM,
> >>>>> which can
> >>>>> +improve detection of constant rate content in demuxers. Note that
> >>>>> some poorly
> >>>>> +implemented demuxers may require a timestamp precision of 1
> >>>>> millisecond, so
> >>>>> +increasing it past that point may result in playback issues. Higher
> >>>>> precision
> >>>>> +also reduces the maximum possible timestamp significantly.
> >>>> This should mention that a too high precision will increase container
> >>>> overhead.
> >>> Good point.
> >>>>
> >>>>> +Default is @var{1/1000} (1 millisecond).
> >>>>> +
> >>>>>   @end table
> >>>>>     @anchor{md5}
> >>>>> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> >>>>> index 186a25d920..1b911a648c 100644
> >>>>> --- a/libavformat/matroskaenc.c
> >>>>> +++ b/libavformat/matroskaenc.c
> >>>>> @@ -158,6 +158,8 @@ typedef struct MatroskaMuxContext {
> >>>>>       int                 default_mode;
> >>>>>         uint32_t            segment_uid[4];
> >>>>> +
> >>>>> +    AVRational          time_base;
> >>>>>   } MatroskaMuxContext;
> >>>>>     /** 2 bytes * 7 for EBML IDs, 7 1-byte EBML lengths, 6 1-byte
> uint,
> >>>>> @@ -1814,6 +1816,7 @@ static int mkv_write_header(AVFormatContext *s)
> >>>>>       const AVDictionaryEntry *tag;
> >>>>>       int ret, i, version = 2;
> >>>>>       int64_t creation_time;
> >>>>> +    int64_t time_base = 1;
> >>>>>         if (mkv->mode != MODE_WEBM ||
> >>>>>           av_dict_get(s->metadata, "stereo_mode", NULL, 0) ||
> >>>>> @@ -1850,7 +1853,10 @@ static int mkv_write_header(AVFormatContext
> *s)
> >>>>>           return ret;
> >>>>>       pb = mkv->info.bc;
> >>>>>   -    put_ebml_uint(pb, MATROSKA_ID_TIMECODESCALE, 1000000);
> >>>>> +    time_base = av_rescale_q(time_base, mkv->time_base,
> >>>>> (AVRational){1, 1000000000});
> >>>>> +    av_log(s, AV_LOG_DEBUG, "TimestampScale is: %" PRId64 " ns\n",
> >>>>> time_base);
> >>>>> +    put_ebml_uint(pb, MATROSKA_ID_TIMECODESCALE, time_base);
> >>>> There is a serious problem here: mkv->time_base is the time base that
> >>>> the muxer uses; yet if mkv->time_base is not a multiple of
> 1/1000000000,
> >>>> then av_rescale_q will have to round a bit and then the demuxer will
> >>>> read a different time base, leading to a drift of all timestamps. This
> >>>> is not acceptable.
> >>> This issue is already present in the current version of FFmpeg.
> >>
> >> Matroska's timestamp imprecision currently lead to jitter, but not to
> >> drift (this of course presumes that one actually uses the timestamps
> >> as-is (instead of summing the durations)). But your patch as-is can lead
> >> to drift (when using a wrong timebase), because it adds a whole new type
> >> of error: One in which the demuxer and the muxer disagree about the
> >> timebase that is in use. Consider a user using 1/750000000 as timebase.
> >> 48kHz audio (with a timebase of 1/48000) can be converted accurately to
> >> it, so that the muxer receives precise timestamps. Yet the muxer
> >> actually writes that the timebase is 1/1000000000 and that is what a
> >> demuxer sees. This means that all timestamps (except those for which
> >> Matroska uses "absolute timestamps" (Matroska-speak for a time/duration
> >> that is always in 1/1000000000 like default duration)) are skewed as-if
> >> multiplied by 3/4.
> >>
> >>> Unfortunately even if you tell me this, this is not something I can
> >>> change: Matroska uses nanosecond precision, and nobody has agreed on
> >>> what the way forward for timestamps is. You will have to bring up such
> >>> nanosecond-precision problems with the Matroska specification
> >>> maintainers itself, which are currently seeking IETF standardization:
> >>> https://github.com/ietf-wg-cellar/matroska-specification/issues/422
> >>>>
> >>
> >> Already did so (I am mkver).
> >>
> >>>>> +
> >>>>>       if ((tag = av_dict_get(s->metadata, "title", NULL, 0)))
> >>>>>           put_ebml_string(pb, MATROSKA_ID_TITLE, tag->value);
> >>>>>       if (!(s->flags & AVFMT_FLAG_BITEXACT)) {
> >>>>> @@ -1883,11 +1889,11 @@ static int mkv_write_header(AVFormatContext
> *s)
> >>>>>           int64_t metadata_duration = get_metadata_duration(s);
> >>>>>             if (s->duration > 0) {
> >>>>> -            int64_t scaledDuration = av_rescale(s->duration, 1000,
> >>>>> AV_TIME_BASE);
> >>>>> +            int64_t scaledDuration = av_rescale_q(s->duration,
> >>>>> AV_TIME_BASE_Q, mkv->time_base);
> >>>>>               put_ebml_float(pb, MATROSKA_ID_DURATION,
> scaledDuration);
> >>>>>               av_log(s, AV_LOG_DEBUG, "Write early duration from
> >>>>> recording time = %" PRIu64 "\n", scaledDuration);
> >>>>>           } else if (metadata_duration > 0) {
> >>>>> -            int64_t scaledDuration = av_rescale(metadata_duration,
> >>>>> 1000, AV_TIME_BASE);
> >>>>> +            int64_t scaledDuration = av_rescale_q(metadata_duration,
> >>>>> AV_TIME_BASE_Q, mkv->time_base);
> >>>>>               put_ebml_float(pb, MATROSKA_ID_DURATION,
> scaledDuration);
> >>>>>               av_log(s, AV_LOG_DEBUG, "Write early duration from
> >>>>> metadata = %" PRIu64 "\n", scaledDuration);
> >>>>>           } else if (s->pb->seekable & AVIO_SEEKABLE_NORMAL) {
> >>>>> @@ -1948,12 +1954,12 @@ static int mkv_write_header(AVFormatContext
> *s)
> >>>>>       // after 4k and on a keyframe
> >>>>>       if (IS_SEEKABLE(pb, mkv)) {
> >>>>>           if (mkv->cluster_time_limit < 0)
> >>>>> -            mkv->cluster_time_limit = 5000;
> >>>>> +            mkv->cluster_time_limit = av_rescale_q(5000,
> >>>>> (AVRational){1, 1000}, mkv->time_base);
> >>>>>           if (mkv->cluster_size_limit < 0)
> >>>>>               mkv->cluster_size_limit = 5 * 1024 * 1024;
> >>>>>       } else {
> >>>>>           if (mkv->cluster_time_limit < 0)
> >>>>> -            mkv->cluster_time_limit = 1000;
> >>>>> +            mkv->cluster_time_limit = av_rescale_q(1000,
> >>>>> (AVRational){1, 1000}, mkv->time_base);
> >>>> There are three places where you rescale this; it would be better if
> you
> >>>> did it unconditionally in one place (namely after this block here).
> >>>>
> >>> I did not want to cut into existing code too much, merely adjust it to
> >>> the change without changing the main parts.
> >>>>>           if (mkv->cluster_size_limit < 0)
> >>>>>               mkv->cluster_size_limit = 32 * 1024;
> >>>>>       }
> >>>>> @@ -2713,6 +2719,14 @@ static int mkv_init(struct AVFormatContext *s)
> >>>>>       } else
> >>>>>           mkv->mode = MODE_MATROSKAv2;
> >>>>>   +    // WebM requires a timestamp precision of 1ms.
> >>>>> +    if (mkv->mode == MODE_WEBM) {
> >>>>> +        if (av_cmp_q(mkv->time_base, (AVRational){1, 1000}) != 0) {
> >>>>> +            av_log(s, AV_LOG_ERROR, "WebM requires 1ms timestamp
> >>>>> precision\n");
> >>>>> +            return AVERROR(EINVAL);
> >>>>> +        }
> >>>> (This could be put into the same block as the one just above that sets
> >>>> the mode.)
> >>> Fair point.
> >>>>> +    }
> >>>>> +
> >>>>>       mkv->cur_audio_pkt = av_packet_alloc();
> >>>>>       if (!mkv->cur_audio_pkt)
> >>>>>           return AVERROR(ENOMEM);
> >>>>> @@ -2738,8 +2752,8 @@ static int mkv_init(struct AVFormatContext *s)
> >>>>>               track->uid = mkv_get_uid(mkv->tracks, i, &c);
> >>>>>           }
> >>>>>   -        // ms precision is the de-facto standard timescale for mkv
> >>>>> files
> >>>>> -        avpriv_set_pts_info(st, 64, 1, 1000);
> >>>>> +        // Use user-defined timescale.
> >>>>> +        avpriv_set_pts_info(st, 64, mkv->time_base.num,
> >>>>> mkv->time_base.den);
> >>>>>             if (st->codecpar->codec_type == AVMEDIA_TYPE_ATTACHMENT)
> {
> >>>>>               if (mkv->mode == MODE_WEBM) {
> >>>>> @@ -2759,6 +2773,10 @@ static int mkv_init(struct AVFormatContext *s)
> >>>>>           track->track_num_size = ebml_num_size(track->track_num);
> >>>>>       }
> >>>>>   +    // Scale the configured cluster_time_limit.
> >>>>> +    if (mkv->cluster_time_limit >= 0)
> >>>>> +        mkv->cluster_time_limit =
> >>>>> av_rescale_q(mkv->cluster_time_limit, (AVRational){1, 1000},
> >>>>> mkv->time_base);
> >>>>> +
> >>>>>       if (mkv->is_dash && nb_tracks != 1)
> >>>>>           return AVERROR(EINVAL);
> >>>>>   @@ -2826,6 +2844,7 @@ static const AVOption options[] = {
> >>>>>       { "infer", "For each track type, mark the first track of
> >>>>> disposition default as default; if none exists, mark the first track
> >>>>> as default.", 0, AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_INFER }, 0,
> >>>>> 0, FLAGS, "default_mode" },
> >>>>>       { "infer_no_subs", "For each track type, mark the first track
> >>>>> of disposition default as default; for audio and video: if none
> >>>>> exists, mark the first track as default.", 0, AV_OPT_TYPE_CONST, {
> >>>>> .i64 = DEFAULT_MODE_INFER_NO_SUBS }, 0, 0, FLAGS, "default_mode" },
> >>>>>       { "passthrough", "Use the disposition flag as-is", 0,
> >>>>> AV_OPT_TYPE_CONST, { .i64 = DEFAULT_MODE_PASSTHROUGH }, 0, 0, FLAGS,
> >>>>> "default_mode" },
> >>>>> +    { "timestamp_precision", "Timestamp precision to use for the
> >>>>> entire container", OFFSET(time_base), AV_OPT_TYPE_RATIONAL, { .dbl =
> >>>>> 0.001 }, 0.000000001, INT_MAX, FLAGS},
> >>>> Using a default value that is a valid value makes it impossible to
> >>>> distinguish whether the user set a value at all; but I'd like to
> >>>> preserve that possibility.
> >>> Sorry, could you explain why you want to do this further? I see no code
> >>> path that would even need to check if the user/caller has set
> precision,
> >>> if it's left at default we get the default of 1/1000 - just as previous
> >>> FFmpeg versions.
> >>>>
> >>
> >> The default value of 1/1000 is good for most use-cases, but there might
> >> be scenarios where it is not (e.g. TrueHD packets have a duration < 1ms,
> >> so different packets will end up with the same timestamp when using
> >> 1/1000). If the user has set a timestamp precision, then that should be
> >> honoured; but if not, then we might use a different value than 1/1000
> >> (in the future; I know that currently no code path that makes use of
> >> this exists).
> >>
> >>
> > Is there any problem to submit this flag while keeping the default value
> to
> > millisecond. So users of matroska muxer who want to increase the
> precision
> > can still do it.
> > And the discussion of which default value should be could be done in
> > another patch?
> >
> There is no such problem in general; yet there is a problem with this
> patch, namely that it allows to choose timebases which are not a
> multiple of the basic Matroska timebase of 1/1000000000. See above.
>

Thanks. So, if the option is taking a int64 to represent the timescale,
instead of a floating point (so no need to convert from it), just having
the option setting will be ok with the community?

>
> - Andreas
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>

-- 

Thierry Foucu