[FFmpeg-devel] [PATCH 03/23] avformat/matroskaenc: Use random TrackUID

Andreas Rheinhardt andreas.rheinhardt at gmail.com
Mon Nov 25 00:05:03 EET 2019


On Sun, Nov 24, 2019 at 8:28 PM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Sun, Nov 24, 2019 at 09:08:00AM +0000, Andreas Rheinhardt wrote:
> > Michael Niedermayer:
> > > On Wed, Nov 06, 2019 at 03:49:02AM +0100, Andreas Rheinhardt wrote:
> > >> Up until now, the TrackUID of a Matroska track which is supposed to be
> > >> random was not random at all: It always coincided with the TrackNumber
> > >> which is usually the 1-based index of the corresponding stream in the
> > >> array of AVStreams. This has been changed: It is now set via an AVLFG
> > >> if AVFMT_FLAG_BITEXACT is not set. Otherwise it is set like it is set
> > >> now (the only change happens if an explicit track number has been
> > >> choosen via dash_track_number, because the system used in the normal
> > >> situation is now used, too). In particular, no FATE tests need to be
> > >> updated.
> > >>
> > >> This also fixes a bug in case the dash_track_number option was used:
> > >> In this case the TrackUID was set to the track number, but the tags
> were
> > >> written with a TagTrackUID simply based upon the index, so that the
> tags
> > >> didn't apply to the track they ought to apply to.
> > >>
> > >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at gmail.com>
> > >> ---
> > >> mkv_get_uid() might be overkill, but I simply wanted to be sure that
> > >> there are no collisions.
> > >>
> > >>  libavformat/matroskaenc.c | 65
> ++++++++++++++++++++++++++++++++++-----
> > >>  1 file changed, 57 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> > >> index de57e474be..b87d15b7ff 100644
> > >> --- a/libavformat/matroskaenc.c
> > >> +++ b/libavformat/matroskaenc.c
> > >> @@ -94,6 +94,7 @@ typedef struct mkv_cues {
> > >>  typedef struct mkv_track {
> > >>      int             write_dts;
> > >>      int             has_cue;
> > >> +    uint32_t        uid;
> > >>      int             sample_rate;
> > >>      int64_t         sample_rate_offset;
> > >>      int64_t         last_timestamp;
> > >> @@ -1199,8 +1200,7 @@ static int mkv_write_track(AVFormatContext *s,
> MatroskaMuxContext *mkv,
> > >>      track = start_ebml_master(pb, MATROSKA_ID_TRACKENTRY, 0);
> > >>      put_ebml_uint (pb, MATROSKA_ID_TRACKNUMBER,
> > >>                     mkv->is_dash ? mkv->dash_track_number : i + 1);
> > >> -    put_ebml_uint (pb, MATROSKA_ID_TRACKUID,
> > >> -                   mkv->is_dash ? mkv->dash_track_number : i + 1);
> > >> +    put_ebml_uint (pb, MATROSKA_ID_TRACKUID, mkv->tracks[i].uid);
> > >>      put_ebml_uint (pb, MATROSKA_ID_TRACKFLAGLACING , 0);    // no
> lacing (yet)
> > >>
> > >>      if ((tag = av_dict_get(st->metadata, "title", NULL, 0)))
> > >> @@ -1650,7 +1650,8 @@ static int mkv_write_tags(AVFormatContext *s)
> > >>          if (!mkv_check_tag(st->metadata,
> MATROSKA_ID_TAGTARGETS_TRACKUID))
> > >>              continue;
> > >>
> > >> -        ret = mkv_write_tag(s, st->metadata,
> MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1);
> > >> +        ret = mkv_write_tag(s, st->metadata,
> MATROSKA_ID_TAGTARGETS_TRACKUID,
> > >> +                            mkv->tracks[i].uid);
> > >>          if (ret < 0) return ret;
> > >>      }
> > >>
> > >> @@ -1658,13 +1659,15 @@ static int mkv_write_tags(AVFormatContext *s)
> > >>          for (i = 0; i < s->nb_streams; i++) {
> > >>              AVIOContext *pb;
> > >>              AVStream *st = s->streams[i];
> > >> +            mkv_track *track = &mkv->tracks[i];
> > >>              ebml_master tag_target;
> > >>              ebml_master tag;
> > >>
> > >>              if (st->codecpar->codec_type == AVMEDIA_TYPE_ATTACHMENT)
> > >>                  continue;
> > >>
> > >> -            mkv_write_tag_targets(s,
> MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1, &tag_target);
> > >> +            mkv_write_tag_targets(s, MATROSKA_ID_TAGTARGETS_TRACKUID,
> > >> +                                  track->uid, &tag_target);
> > >>              pb = mkv->tags_bc;
> > >>
> > >>              tag = start_ebml_master(pb, MATROSKA_ID_SIMPLETAG, 0);
> > >> @@ -1863,10 +1866,6 @@ static int mkv_write_header(AVFormatContext *s)
> > >>              version = 4;
> > >>      }
> > >>
> > >> -    mkv->tracks = av_mallocz_array(s->nb_streams,
> sizeof(*mkv->tracks));
> > >> -    if (!mkv->tracks) {
> > >> -        return AVERROR(ENOMEM);
> > >> -    }
> > >>      ebml_header = start_ebml_master(pb, EBML_ID_HEADER,
> MAX_EBML_HEADER_SIZE);
> > >>      put_ebml_uint  (pb, EBML_ID_EBMLVERSION       ,           1);
> > >>      put_ebml_uint  (pb, EBML_ID_EBMLREADVERSION   ,           1);
> > >> @@ -2667,8 +2666,42 @@ static int webm_query_codec(enum AVCodecID
> codec_id, int std_compliance)
> > >>      return 0;
> > >>  }
> > >>
> > >> +static uint32_t mkv_get_uid(const mkv_track *tracks, int i, AVLFG *c)
> > >> +{
> > >> +    uint32_t uid;
> > >> +
> > >> +    for (int j = 0, k; j < 5; j++) {
> > >> +        uid = av_lfg_get(c);
> > >> +        if (!uid)
> > >> +            continue;
> > >> +        for (k = 0; k < i; k++) {
> > >> +            if (tracks[k].uid == uid) {
> > >> +                /* Was something wrong with our random seed? */
> > >> +                av_lfg_init(c, av_get_random_seed());
> > >> +                break;
> > >> +            }
> > >> +        }
> > >> +        if (k == i)
> > >> +            return uid;
> > >> +    }
> > >> +
> > >> +    /* Test the numbers from 1 to i. */
> > >> +    for (int j = 1, k; j < i + 1; j++) {
> > >> +        for (k = 0; k < i; k++) {
> > >> +            if (tracks[k].uid == j)
> > >> +                break;
> > >> +        }
> > >> +        if (k == i)
> > >> +            return j;
> > >> +    }
> > >> +    /* Return i + 1. It can't coincide with another uid. */
> > >> +    return i + 1;
> > >> +}
> > >> +
> > >>  static int mkv_init(struct AVFormatContext *s)
> > >>  {
> > >> +    MatroskaMuxContext *mkv = s->priv_data;
> > >> +    AVLFG c;
> > >>      int i;
> > >>
> > >>      if (s->nb_streams > MAX_TRACKS) {
> > >> @@ -2697,7 +2730,23 @@ static int mkv_init(struct AVFormatContext *s)
> > >>          s->internal->avoid_negative_ts_use_pts = 1;
> > >>      }
> > >>
> > >> +    mkv->tracks = av_mallocz_array(s->nb_streams,
> sizeof(*mkv->tracks));
> > >> +    if (!mkv->tracks) {
> > >> +        return AVERROR(ENOMEM);
> > >> +    }
> > >> +
> > >
> > >> +    if (!(s->flags & AVFMT_FLAG_BITEXACT))
> > >> +        av_lfg_init(&c, av_get_random_seed());
> > >
> > > would there be a disadvantage if the configuration of a stream / its
> content
> > > is used a seed ?
> > > That way it would be more deterministic
> > >
> > I see several disadvantages:
> > 1. If one duplicates a stream (i.e. muxes it twice in the same file),
> > they would get the same uid (unless one takes additional measures).
>
> If you took the configuration of all streams as a single seed that would
> not
> happen
>
>
> > 2. If the packet content is used, one can determine the uid only after
> > having received a packet. This would mean that one has to postpone
> > writing the Tracks element until one has a packet and the same goes
> > for the tags given that the uid is used to convey what they apply to.
> > One could solve this problem in the seekable, non-livestreaming case
> > by reserving size to be overwritten later, but this is nontrivial and
> > I'd like to avoid that.
> > 3. If the configuration record is used, then you won't get anything
> > remotely unique: Using the same encoder settings will likely produce
> > files with identical uids.
> >
> > Furthermore, this approach means that I can reuse the code for both
> > attachments as well as tracks. (See the next patch in this series.)
>
> fair enough, if its considered that avoiding some complexity is more
> important than deterministic output
>

Just to be sure: The output will be deterministic if the bitexact flag is
set.
I don't see a reason for deterministic output if it is not set.

>
> Either way if you need X bits of entropy in the muxer there should be
> X/32 calls to av_get_random_seed(). as it is there are 4 independant
> places calling av_get_random_seed() after the patch each using 32bit
> to initialize an independant LFG.
>

After the next patch, there will only be three; and usually only two of them
are used (the one in mkv_get_uid is usually not used).


> seed collisions are not at all impossible at 32bits
> also av_get_random_seed() can be expensive on some platforms so
> i think there should be a single place calling it as needed and not
> throwing the obtained entropy away.
>

I didn't deem this important performance-wise, as this is only done a few
times during init.
But if you want to, I can reuse the AVLFG from mkv_init for the segment UID.


>
> Also, why 32bit ? arent these "uint" UIDs 64bit ?
>
> Most UIDs (except the Segment UIDs) are 64bit. The reason for 32bit is
that this patchset was already pretty long, so I didn't include a patch to
make these 64bit. (There are some issues with the chapter UIDs: The
Matroska demuxer implicitly only uses the lower 32bits without making sure
that these numbers are distinct; and in an edge case the written chapter ID
can be zero despite this being against the spec: Namely if one of the
chapter IDs is INT_MIN and another one is INT_MAX.)


> and, this
> +    /* Test the numbers from 1 to i. */
> +    for (int j = 1, k; j < i + 1; j++) {
> +        for (k = 0; k < i; k++) {
> +            if (tracks[k].uid == j)
> +                break;
> +        }
> +        if (k == i)
> +            return j;
> +    }
>
> just smells "wrong"
>
> This code would only be executed if using AVLFG would repeatedly lead to
collisions;
it is just there to guarantee that the UIDs are different, but actually it
is not intended to be run.


> if you want a simple non repeating UID, just count 0,1,2,3 for each
> and put that in the unused upper 32bit for example
>
> or you can even ditch LFG and use something that findamentgally generates
> non repeating output.
> UID[i] = seed + i
> comes to mind
> or if theres a reason why that is not good then you can also do something
> like
> UID[i] = (seed + i*(seed2|1)) & 0xFFFFFFFFFFFFFFFF;
>
> but maybe theres a reason behind the more complex UID generation that
> iam missing ?
>
>  The reason I've chosen this is that this is what has already been used
for the UIDs of attachments, so I simply wanted to reuse this, while also
fixing its defects (namely that it does not ensure that the resulting UIDs
are actually distinct).

- Andreas


More information about the ffmpeg-devel mailing list