[FFmpeg-devel] [PATCH 03/23] avformat/matroskaenc: Use random TrackUID
Andreas Rheinhardt
andreas.rheinhardt at gmail.com
Mon Nov 25 00:05:03 EET 2019
On Sun, Nov 24, 2019 at 8:28 PM Michael Niedermayer <michael at niedermayer.cc>
wrote:
> On Sun, Nov 24, 2019 at 09:08:00AM +0000, Andreas Rheinhardt wrote:
> > Michael Niedermayer:
> > > On Wed, Nov 06, 2019 at 03:49:02AM +0100, Andreas Rheinhardt wrote:
> > >> Up until now, the TrackUID of a Matroska track which is supposed to be
> > >> random was not random at all: It always coincided with the TrackNumber
> > >> which is usually the 1-based index of the corresponding stream in the
> > >> array of AVStreams. This has been changed: It is now set via an AVLFG
> > >> if AVFMT_FLAG_BITEXACT is not set. Otherwise it is set like it is set
> > >> now (the only change happens if an explicit track number has been
> > >> choosen via dash_track_number, because the system used in the normal
> > >> situation is now used, too). In particular, no FATE tests need to be
> > >> updated.
> > >>
> > >> This also fixes a bug in case the dash_track_number option was used:
> > >> In this case the TrackUID was set to the track number, but the tags
> were
> > >> written with a TagTrackUID simply based upon the index, so that the
> tags
> > >> didn't apply to the track they ought to apply to.
> > >>
> > >> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt at gmail.com>
> > >> ---
> > >> mkv_get_uid() might be overkill, but I simply wanted to be sure that
> > >> there are no collisions.
> > >>
> > >> libavformat/matroskaenc.c | 65
> ++++++++++++++++++++++++++++++++++-----
> > >> 1 file changed, 57 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
> > >> index de57e474be..b87d15b7ff 100644
> > >> --- a/libavformat/matroskaenc.c
> > >> +++ b/libavformat/matroskaenc.c
> > >> @@ -94,6 +94,7 @@ typedef struct mkv_cues {
> > >> typedef struct mkv_track {
> > >> int write_dts;
> > >> int has_cue;
> > >> + uint32_t uid;
> > >> int sample_rate;
> > >> int64_t sample_rate_offset;
> > >> int64_t last_timestamp;
> > >> @@ -1199,8 +1200,7 @@ static int mkv_write_track(AVFormatContext *s,
> MatroskaMuxContext *mkv,
> > >> track = start_ebml_master(pb, MATROSKA_ID_TRACKENTRY, 0);
> > >> put_ebml_uint (pb, MATROSKA_ID_TRACKNUMBER,
> > >> mkv->is_dash ? mkv->dash_track_number : i + 1);
> > >> - put_ebml_uint (pb, MATROSKA_ID_TRACKUID,
> > >> - mkv->is_dash ? mkv->dash_track_number : i + 1);
> > >> + put_ebml_uint (pb, MATROSKA_ID_TRACKUID, mkv->tracks[i].uid);
> > >> put_ebml_uint (pb, MATROSKA_ID_TRACKFLAGLACING , 0); // no
> lacing (yet)
> > >>
> > >> if ((tag = av_dict_get(st->metadata, "title", NULL, 0)))
> > >> @@ -1650,7 +1650,8 @@ static int mkv_write_tags(AVFormatContext *s)
> > >> if (!mkv_check_tag(st->metadata,
> MATROSKA_ID_TAGTARGETS_TRACKUID))
> > >> continue;
> > >>
> > >> - ret = mkv_write_tag(s, st->metadata,
> MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1);
> > >> + ret = mkv_write_tag(s, st->metadata,
> MATROSKA_ID_TAGTARGETS_TRACKUID,
> > >> + mkv->tracks[i].uid);
> > >> if (ret < 0) return ret;
> > >> }
> > >>
> > >> @@ -1658,13 +1659,15 @@ static int mkv_write_tags(AVFormatContext *s)
> > >> for (i = 0; i < s->nb_streams; i++) {
> > >> AVIOContext *pb;
> > >> AVStream *st = s->streams[i];
> > >> + mkv_track *track = &mkv->tracks[i];
> > >> ebml_master tag_target;
> > >> ebml_master tag;
> > >>
> > >> if (st->codecpar->codec_type == AVMEDIA_TYPE_ATTACHMENT)
> > >> continue;
> > >>
> > >> - mkv_write_tag_targets(s,
> MATROSKA_ID_TAGTARGETS_TRACKUID, i + 1, &tag_target);
> > >> + mkv_write_tag_targets(s, MATROSKA_ID_TAGTARGETS_TRACKUID,
> > >> + track->uid, &tag_target);
> > >> pb = mkv->tags_bc;
> > >>
> > >> tag = start_ebml_master(pb, MATROSKA_ID_SIMPLETAG, 0);
> > >> @@ -1863,10 +1866,6 @@ static int mkv_write_header(AVFormatContext *s)
> > >> version = 4;
> > >> }
> > >>
> > >> - mkv->tracks = av_mallocz_array(s->nb_streams,
> sizeof(*mkv->tracks));
> > >> - if (!mkv->tracks) {
> > >> - return AVERROR(ENOMEM);
> > >> - }
> > >> ebml_header = start_ebml_master(pb, EBML_ID_HEADER,
> MAX_EBML_HEADER_SIZE);
> > >> put_ebml_uint (pb, EBML_ID_EBMLVERSION , 1);
> > >> put_ebml_uint (pb, EBML_ID_EBMLREADVERSION , 1);
> > >> @@ -2667,8 +2666,42 @@ static int webm_query_codec(enum AVCodecID
> codec_id, int std_compliance)
> > >> return 0;
> > >> }
> > >>
> > >> +static uint32_t mkv_get_uid(const mkv_track *tracks, int i, AVLFG *c)
> > >> +{
> > >> + uint32_t uid;
> > >> +
> > >> + for (int j = 0, k; j < 5; j++) {
> > >> + uid = av_lfg_get(c);
> > >> + if (!uid)
> > >> + continue;
> > >> + for (k = 0; k < i; k++) {
> > >> + if (tracks[k].uid == uid) {
> > >> + /* Was something wrong with our random seed? */
> > >> + av_lfg_init(c, av_get_random_seed());
> > >> + break;
> > >> + }
> > >> + }
> > >> + if (k == i)
> > >> + return uid;
> > >> + }
> > >> +
> > >> + /* Test the numbers from 1 to i. */
> > >> + for (int j = 1, k; j < i + 1; j++) {
> > >> + for (k = 0; k < i; k++) {
> > >> + if (tracks[k].uid == j)
> > >> + break;
> > >> + }
> > >> + if (k == i)
> > >> + return j;
> > >> + }
> > >> + /* Return i + 1. It can't coincide with another uid. */
> > >> + return i + 1;
> > >> +}
> > >> +
> > >> static int mkv_init(struct AVFormatContext *s)
> > >> {
> > >> + MatroskaMuxContext *mkv = s->priv_data;
> > >> + AVLFG c;
> > >> int i;
> > >>
> > >> if (s->nb_streams > MAX_TRACKS) {
> > >> @@ -2697,7 +2730,23 @@ static int mkv_init(struct AVFormatContext *s)
> > >> s->internal->avoid_negative_ts_use_pts = 1;
> > >> }
> > >>
> > >> + mkv->tracks = av_mallocz_array(s->nb_streams,
> sizeof(*mkv->tracks));
> > >> + if (!mkv->tracks) {
> > >> + return AVERROR(ENOMEM);
> > >> + }
> > >> +
> > >
> > >> + if (!(s->flags & AVFMT_FLAG_BITEXACT))
> > >> + av_lfg_init(&c, av_get_random_seed());
> > >
> > > would there be a disadvantage if the configuration of a stream / its
> content
> > > is used a seed ?
> > > That way it would be more deterministic
> > >
> > I see several disadvantages:
> > 1. If one duplicates a stream (i.e. muxes it twice in the same file),
> > they would get the same uid (unless one takes additional measures).
>
> If you took the configuration of all streams as a single seed that would
> not
> happen
>
>
> > 2. If the packet content is used, one can determine the uid only after
> > having received a packet. This would mean that one has to postpone
> > writing the Tracks element until one has a packet and the same goes
> > for the tags given that the uid is used to convey what they apply to.
> > One could solve this problem in the seekable, non-livestreaming case
> > by reserving size to be overwritten later, but this is nontrivial and
> > I'd like to avoid that.
> > 3. If the configuration record is used, then you won't get anything
> > remotely unique: Using the same encoder settings will likely produce
> > files with identical uids.
> >
> > Furthermore, this approach means that I can reuse the code for both
> > attachments as well as tracks. (See the next patch in this series.)
>
> fair enough, if its considered that avoiding some complexity is more
> important than deterministic output
>
Just to be sure: The output will be deterministic if the bitexact flag is
set.
I don't see a reason for deterministic output if it is not set.
>
> Either way if you need X bits of entropy in the muxer there should be
> X/32 calls to av_get_random_seed(). as it is there are 4 independant
> places calling av_get_random_seed() after the patch each using 32bit
> to initialize an independant LFG.
>
After the next patch, there will only be three; and usually only two of them
are used (the one in mkv_get_uid is usually not used).
> seed collisions are not at all impossible at 32bits
> also av_get_random_seed() can be expensive on some platforms so
> i think there should be a single place calling it as needed and not
> throwing the obtained entropy away.
>
I didn't deem this important performance-wise, as this is only done a few
times during init.
But if you want to, I can reuse the AVLFG from mkv_init for the segment UID.
>
> Also, why 32bit ? arent these "uint" UIDs 64bit ?
>
> Most UIDs (except the Segment UIDs) are 64bit. The reason for 32bit is
that this patchset was already pretty long, so I didn't include a patch to
make these 64bit. (There are some issues with the chapter UIDs: The
Matroska demuxer implicitly only uses the lower 32bits without making sure
that these numbers are distinct; and in an edge case the written chapter ID
can be zero despite this being against the spec: Namely if one of the
chapter IDs is INT_MIN and another one is INT_MAX.)
> and, this
> + /* Test the numbers from 1 to i. */
> + for (int j = 1, k; j < i + 1; j++) {
> + for (k = 0; k < i; k++) {
> + if (tracks[k].uid == j)
> + break;
> + }
> + if (k == i)
> + return j;
> + }
>
> just smells "wrong"
>
> This code would only be executed if using AVLFG would repeatedly lead to
collisions;
it is just there to guarantee that the UIDs are different, but actually it
is not intended to be run.
> if you want a simple non repeating UID, just count 0,1,2,3 for each
> and put that in the unused upper 32bit for example
>
> or you can even ditch LFG and use something that findamentgally generates
> non repeating output.
> UID[i] = seed + i
> comes to mind
> or if theres a reason why that is not good then you can also do something
> like
> UID[i] = (seed + i*(seed2|1)) & 0xFFFFFFFFFFFFFFFF;
>
> but maybe theres a reason behind the more complex UID generation that
> iam missing ?
>
> The reason I've chosen this is that this is what has already been used
for the UIDs of attachments, so I simply wanted to reuse this, while also
fixing its defects (namely that it does not ensure that the resulting UIDs
are actually distinct).
- Andreas
More information about the ffmpeg-devel
mailing list