[FFmpeg-devel] [PATCH 03/23] avformat/matroskaenc: Use random TrackUID

Michael Niedermayer michael at niedermayer.cc
Wed Nov 27 17:50:41 EET 2019


On Wed, Nov 27, 2019 at 08:25:13AM +0100, Andreas Rheinhardt wrote:
> On Mon, Nov 25, 2019 at 2:57 PM Michael Niedermayer <michael at niedermayer.cc>
> wrote:
> 
> > On Sun, Nov 24, 2019 at 11:05:03PM +0100, Andreas Rheinhardt wrote:
> > > On Sun, Nov 24, 2019 at 8:28 PM Michael Niedermayer
> > <michael at niedermayer.cc>
> > > wrote:
> > >
> > > > On Sun, Nov 24, 2019 at 09:08:00AM +0000, Andreas Rheinhardt wrote:
> > > > > Michael Niedermayer:
> > > > > > On Wed, Nov 06, 2019 at 03:49:02AM +0100, Andreas Rheinhardt wrote:
> > > > > >> Up until now, the TrackUID of a Matroska track which is supposed
> > to be
> > > > > >> random was not random at all: It always coincided with the
> > TrackNumber
> > > > > >> which is usually the 1-based index of the corresponding stream in
> > the
> > > > > >> array of AVStreams. This has been changed: It is now set via an
> > AVLFG
> > > > > >> if AVFMT_FLAG_BITEXACT is not set. Otherwise it is set like it is
> > set
> > > > > >> now (the only change happens if an explicit track number has been
> > > > > >> choosen via dash_track_number, because the system used in the
> > normal
> > > > > >> situation is now used, too). In particular, no FATE tests need to
> > be
> > > > > >> updated.
> > > > > >>
> > > > > >> This also fixes a bug in case the dash_track_number option was
> > used:
> > > > > >> In this case the TrackUID was set to the track number, but the
> > tags
> > > > were
> > > > > >> written with a TagTrackUID simply based upon the index, so that
> > the
> > > > tags
> > > > > >> didn't apply to the track they ought to apply to.
> > > > > >>
> > > > > >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at gmail.com>
> > > > > >> ---
> > > > > >> mkv_get_uid() might be overkill, but I simply wanted to be sure
> > that
> > > > > >> there are no collisions.
> > > > > >>
> > > > > >>  libavformat/matroskaenc.c | 65
> > > > ++++++++++++++++++++++++++++++++++-----
> > > > > >>  1 file changed, 57 insertions(+), 8 deletions(-)
> > > > > >>
> > > > > >> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> > > > > >> index de57e474be..b87d15b7ff 100644
> > > > > >> --- a/libavformat/matroskaenc.c
> > > > > >> +++ b/libavformat/matroskaenc.c
> > > > > >> @@ -94,6 +94,7 @@ typedef struct mkv_cues {
> > > > > >>  typedef struct mkv_track {
> > > > > >>      int             write_dts;
> > > > > >>      int             has_cue;
> > > > > >> +    uint32_t        uid;
> > > > > >>      int             sample_rate;
> > > > > >>      int64_t         sample_rate_offset;
> > > > > >>      int64_t         last_timestamp;
> > > > > >> @@ -1199,8 +1200,7 @@ static int mkv_write_track(AVFormatContext
> > *s,
> > > > MatroskaMuxContext *mkv,
> > > > > >>      track = start_ebml_master(pb, MATROSKA_ID_TRACKENTRY, 0);
> > > > > >>      put_ebml_uint (pb, MATROSKA_ID_TRACKNUMBER,
> > > > > >>                     mkv->is_dash ? mkv->dash_track_number : i +
> > 1);
> > > > > >> -    put_ebml_uint (pb, MATROSKA_ID_TRACKUID,
> > > > > >> -                   mkv->is_dash ? mkv->dash_track_number : i +
> > 1);
> > > > > >> +    put_ebml_uint (pb, MATROSKA_ID_TRACKUID, mkv->tracks[i].uid);
> > > > > >>      put_ebml_uint (pb, MATROSKA_ID_TRACKFLAGLACING , 0);    // no
> > > > lacing (yet)
> > > > > >>
> > > > > >>      if ((tag = av_dict_get(st->metadata, "title", NULL, 0)))
> > > > > >> @@ -1650,7 +1650,8 @@ static int mkv_write_tags(AVFormatContext
> > *s)
> > > > > >>          if (!mkv_check_tag(st->metadata,
> > > > MATROSKA_ID_TAGTARGETS_TRACKUID))
> > > > > >>              continue;
> > > > > >>
> > > > > >> -        ret = mkv_write_tag(s, st->metadata,
> > > > MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1);
> > > > > >> +        ret = mkv_write_tag(s, st->metadata,
> > > > MATROSKA_ID_TAGTARGETS_TRACKUID,
> > > > > >> +                            mkv->tracks[i].uid);
> > > > > >>          if (ret < 0) return ret;
> > > > > >>      }
> > > > > >>
> > > > > >> @@ -1658,13 +1659,15 @@ static int mkv_write_tags(AVFormatContext
> > *s)
> > > > > >>          for (i = 0; i < s->nb_streams; i++) {
> > > > > >>              AVIOContext *pb;
> > > > > >>              AVStream *st = s->streams[i];
> > > > > >> +            mkv_track *track = &mkv->tracks[i];
> > > > > >>              ebml_master tag_target;
> > > > > >>              ebml_master tag;
> > > > > >>
> > > > > >>              if (st->codecpar->codec_type ==
> > AVMEDIA_TYPE_ATTACHMENT)
> > > > > >>                  continue;
> > > > > >>
> > > > > >> -            mkv_write_tag_targets(s,
> > > > MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1, &tag_target);
> > > > > >> +            mkv_write_tag_targets(s,
> > MATROSKA_ID_TAGTARGETS_TRACKUID,
> > > > > >> +                                  track->uid, &tag_target);
> > > > > >>              pb = mkv->tags_bc;
> > > > > >>
> > > > > >>              tag = start_ebml_master(pb, MATROSKA_ID_SIMPLETAG,
> > 0);
> > > > > >> @@ -1863,10 +1866,6 @@ static int
> > mkv_write_header(AVFormatContext *s)
> > > > > >>              version = 4;
> > > > > >>      }
> > > > > >>
> > > > > >> -    mkv->tracks = av_mallocz_array(s->nb_streams,
> > > > sizeof(*mkv->tracks));
> > > > > >> -    if (!mkv->tracks) {
> > > > > >> -        return AVERROR(ENOMEM);
> > > > > >> -    }
> > > > > >>      ebml_header = start_ebml_master(pb, EBML_ID_HEADER,
> > > > MAX_EBML_HEADER_SIZE);
> > > > > >>      put_ebml_uint  (pb, EBML_ID_EBMLVERSION       ,           1);
> > > > > >>      put_ebml_uint  (pb, EBML_ID_EBMLREADVERSION   ,           1);
> > > > > >> @@ -2667,8 +2666,42 @@ static int webm_query_codec(enum AVCodecID
> > > > codec_id, int std_compliance)
> > > > > >>      return 0;
> > > > > >>  }
> > > > > >>
> > > > > >> +static uint32_t mkv_get_uid(const mkv_track *tracks, int i,
> > AVLFG *c)
> > > > > >> +{
> > > > > >> +    uint32_t uid;
> > > > > >> +
> > > > > >> +    for (int j = 0, k; j < 5; j++) {
> > > > > >> +        uid = av_lfg_get(c);
> > > > > >> +        if (!uid)
> > > > > >> +            continue;
> > > > > >> +        for (k = 0; k < i; k++) {
> > > > > >> +            if (tracks[k].uid == uid) {
> > > > > >> +                /* Was something wrong with our random seed? */
> > > > > >> +                av_lfg_init(c, av_get_random_seed());
> > > > > >> +                break;
> > > > > >> +            }
> > > > > >> +        }
> > > > > >> +        if (k == i)
> > > > > >> +            return uid;
> > > > > >> +    }
> > > > > >> +
> > > > > >> +    /* Test the numbers from 1 to i. */
> > > > > >> +    for (int j = 1, k; j < i + 1; j++) {
> > > > > >> +        for (k = 0; k < i; k++) {
> > > > > >> +            if (tracks[k].uid == j)
> > > > > >> +                break;
> > > > > >> +        }
> > > > > >> +        if (k == i)
> > > > > >> +            return j;
> > > > > >> +    }
> > > > > >> +    /* Return i + 1. It can't coincide with another uid. */
> > > > > >> +    return i + 1;
> > > > > >> +}
> > > > > >> +
> > > > > >>  static int mkv_init(struct AVFormatContext *s)
> > > > > >>  {
> > > > > >> +    MatroskaMuxContext *mkv = s->priv_data;
> > > > > >> +    AVLFG c;
> > > > > >>      int i;
> > > > > >>
> > > > > >>      if (s->nb_streams > MAX_TRACKS) {
> > > > > >> @@ -2697,7 +2730,23 @@ static int mkv_init(struct AVFormatContext
> > *s)
> > > > > >>          s->internal->avoid_negative_ts_use_pts = 1;
> > > > > >>      }
> > > > > >>
> > > > > >> +    mkv->tracks = av_mallocz_array(s->nb_streams,
> > > > sizeof(*mkv->tracks));
> > > > > >> +    if (!mkv->tracks) {
> > > > > >> +        return AVERROR(ENOMEM);
> > > > > >> +    }
> > > > > >> +
> > > > > >
> > > > > >> +    if (!(s->flags & AVFMT_FLAG_BITEXACT))
> > > > > >> +        av_lfg_init(&c, av_get_random_seed());
> > > > > >
> > > > > > would there be a disadvantage if the configuration of a stream /
> > its
> > > > content
> > > > > > is used a seed ?
> > > > > > That way it would be more deterministic
> > > > > >
> > > > > I see several disadvantages:
> > > > > 1. If one duplicates a stream (i.e. muxes it twice in the same file),
> > > > > they would get the same uid (unless one takes additional measures).
> > > >
> > > > If you took the configuration of all streams as a single seed that
> > would
> > > > not
> > > > happen
> > > >
> > > >
> > > > > 2. If the packet content is used, one can determine the uid only
> > after
> > > > > having received a packet. This would mean that one has to postpone
> > > > > writing the Tracks element until one has a packet and the same goes
> > > > > for the tags given that the uid is used to convey what they apply to.
> > > > > One could solve this problem in the seekable, non-livestreaming case
> > > > > by reserving size to be overwritten later, but this is nontrivial and
> > > > > I'd like to avoid that.
> > > > > 3. If the configuration record is used, then you won't get anything
> > > > > remotely unique: Using the same encoder settings will likely produce
> > > > > files with identical uids.
> > > > >
> > > > > Furthermore, this approach means that I can reuse the code for both
> > > > > attachments as well as tracks. (See the next patch in this series.)
> > > >
> > > > fair enough, if its considered that avoiding some complexity is more
> > > > important than deterministic output
> > > >
> > >
> > > Just to be sure: The output will be deterministic if the bitexact flag is
> > > set.
> > > I don't see a reason for deterministic output if it is not set.
> >
> > the bitexact flag is used for multiple things
> > 1. produce the same output between platforms and minor code changes
> > 2. produce the same output when running the exact same binary with the
> > exact
> >    same input
> >
> > This has a few problems
> > code pathes disabled because of 1, cannot be tested easily because
> > if bitexact is enabled the code to be tested is disabled
> > if bitexact is disabled random values are put in the file making it not
> > as easy to see if the code under test produces the same results each time
> >
> > Another problem is, obscuring problems, because
> > if with a deterministic binary you get different output on 2 runs then you
> > know you have a bug, a race condition or maybe a uninitialized variable or
> > such.
> > But with random values stored all over this is no longer vissible unless
> > one uses bitexact which most people dont do by default
> > so this could cause a decrease in chance detections of some bugs
> >
> > The Matroska muxer only behaves differently when in bitexact mode in two
> ways: The UIDs are chosen deterministically (or in case of the SegmentUID:
> not written at all) and certain strings that usually contain the
> libavformat version no longer do so. All these things are in the file
> header. There are no (and won't be) random values stored all over the
> place. The code paths are very small and so I don't think that they will
> obscure any bug.

If a user encodes the same input twice with the same options and
as its popular uses mkv as output and then notices that the files
are different then with deterministic components she can conclude
there was a bug. With randomly seeded UIDs that conclusion is not
possible anymore.
Retesting with bitexact in this case also doesnt help, the bug
might not occur during 2 bitexact retests

Iam not saying thats enough of a problem to change something but what
i mean is that its a real thing not have absolutely 0 disadvantage.


[...]
> >
> > [...]
> > >
> > >
> > > > and, this
> > > > +    /* Test the numbers from 1 to i. */
> > > > +    for (int j = 1, k; j < i + 1; j++) {
> > > > +        for (k = 0; k < i; k++) {
> > > > +            if (tracks[k].uid == j)
> > > > +                break;
> > > > +        }
> > > > +        if (k == i)
> > > > +            return j;
> > > > +    }
> > > >
> > > > just smells "wrong"
> > > >
> > > > This code would only be executed if using AVLFG would repeatedly lead
> > to
> > > collisions;
> >
> > yes but this doesnt smell right
> > if your random number generator produces no random numbers i dont think
> > adding +1 is the solution.
> > I mean if that really happens, theres some bug either in LFG the seed
> > generation
> > or in how it is used.
> >
> > Yes, this code was only intended to be run if the random number generator
> is completely buggy. But it seems that this can unfortunately not ruled out
> completely (

> https://www.phoronix.com/scan.php?page=news_item&px=AMD-RdRand-Disable-15h-16h
> ).

The fix to this does not belong into a caller of av_get_random_seed()
if thats fixed / worked around in CPU, Bios, OS, or in av_get_random_seed()
i dont know but definitly not the user of av_get_random_seed().
If av_get_random_seed() is bad thats bad and needs to be fixed

Also lets assume av_get_random_seed() is bad and returns -1 always
or another constant.
The code still must not fail like this. the seeded PRNG must still 
function properly and produce a few differentg values.
If its statistics are 100% duplicates thats a failure of the PRNG.

[...]

Thanks

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Into a blind darkness they enter who follow after the Ignorance,
they as if into a greater darkness enter who devote themselves
to the Knowledge alone. -- Isha Upanishad
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20191127/9a7a3346/attachment.sig>


More information about the ffmpeg-devel mailing list