[FFmpeg-devel] [PATCH v2] avformat/mov: add option max_stts_delta

Thu Dec 16 00:23:40 EET 2021

On Thu, Nov 25, 2021 at 06:03:18PM +0100, Michael Niedermayer wrote:
> On Thu, Nov 25, 2021 at 10:51:15AM +0530, Gyan Doshi wrote:
> > 
> > 
> > On 2021-11-25 12:56 am, Michael Niedermayer wrote:
> > > On Wed, Nov 24, 2021 at 10:58:00AM +0530, Gyan Doshi wrote:
> > > > 
> > > > On 2021-11-24 01:16 am, Michael Niedermayer wrote:
> > > > > On Tue, Nov 23, 2021 at 06:41:06PM +0530, Gyan Doshi wrote:
> > > > > > Very high stts sample deltas may occasionally be intended but usually
> > > > > > they are written in error or used to store a negative value for dts correction
> > > > > > when treated as signed 32-bit integers.
> > > > > > 
> > > > > > This option lets the user set an upper limit, beyond which the delta
> > > > > > is clamped to 1. Negative values of under 1 second are used to adjust
> > > > > > dts.
> > > > > > 
> > > > > > Unit is the track time scale. Default is INT_MAX which maintains current handling.
> > > > > > ---
> > > > > >    doc/demuxers.texi  |  6 ++++++
> > > > > >    libavformat/isom.h |  1 +
> > > > > >    libavformat/mov.c  | 14 +++++++++-----
> > > > > >    3 files changed, 16 insertions(+), 5 deletions(-)
> > > > > > 
> > > > > > diff --git a/doc/demuxers.texi b/doc/demuxers.texi
> > > > > > index cab8a7072c..15078b9b1b 100644
> > > > > > --- a/doc/demuxers.texi
> > > > > > +++ b/doc/demuxers.texi
> > > > > > @@ -713,6 +713,12 @@ specify.
> > > > > >    @item decryption_key
> > > > > >    16-byte key, in hex, to decrypt files encrypted using ISO Common Encryption (CENC/AES-128 CTR; ISO/IEC 23001-7).
> > > > > > +
> > > > > > + at item max_stts_delta
> > > > > > +The sample offsets stored in a track's stts box are 32-bit unsigned integers. However, very large values usually indicate
> > > > > > +a value written by error or a storage of a small negative value as a way to correct accumulated DTS delay.
> > > > > > +Range is 0 to UINT_MAX. Default is INT_MAX.
> > > > > > +
> > > > > >    @end table
> > > > > >    @subsection Audible AAX
> > > > > > diff --git a/libavformat/isom.h b/libavformat/isom.h
> > > > > > index ef8f19b18c..625dea8421 100644
> > > > > > --- a/libavformat/isom.h
> > > > > > +++ b/libavformat/isom.h
> > > > > > @@ -305,6 +305,7 @@ typedef struct MOVContext {
> > > > > >        int32_t movie_display_matrix[3][3]; ///< display matrix from mvhd
> > > > > >        int have_read_mfra_size;
> > > > > >        uint32_t mfra_size;
> > > > > > +    uint32_t max_stts_delta;
> > > > > >    } MOVContext;
> > > > > >    int ff_mp4_read_descr_len(AVIOContext *pb);
> > > > > > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > > > > > index 451cb78bbf..bbda07ac42 100644
> > > > > > --- a/libavformat/mov.c
> > > > > > +++ b/libavformat/mov.c
> > > > > > @@ -3965,14 +3965,17 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
> > > > > >                    current_offset += sample_size;
> > > > > >                    stream_size += sample_size;
> > > > > > -                /* A negative sample duration is invalid based on the spec,
> > > > > > -                 * but some samples need it to correct the DTS. */
> > > > > > -                if (sc->stts_data[stts_index].duration < 0) {
> > > > > > +                /* STTS sample offsets are uint32 but some files store it as int32
> > > > > > +                 * with negative values used to correct DTS delays.
> > > > > > +                   There may be abnormally large values as well. */
> > > > > > +                if (sc->stts_data[stts_index].duration > mov->max_stts_delta) {
> > > > > > +                    // assume high delta is a negative correction if less than 1 second
> > > > > > +                    int32_t delta_magnitude = *((int32_t *)&sc->stts_data[stts_index].duration);
> > > > > >                        av_log(mov->fc, AV_LOG_WARNING,
> > > > > > -                           "Invalid SampleDelta %d in STTS, at %d st:%d\n",
> > > > > > +                           "Correcting too large SampleDelta %u in STTS, at %d st:%d.\n",
> > > > > >                               sc->stts_data[stts_index].duration, stts_index,
> > > > > >                               st->index);
> > > > > > -                    dts_correction += sc->stts_data[stts_index].duration - 1;
> > > > > > +                    dts_correction += (delta_magnitude < 0 && FFABS(delta_magnitude) < sc->time_scale ? delta_magnitude - 1 : 0);
> > > > > >                        sc->stts_data[stts_index].duration = 1;
> > > > > >                    }
> > > > > >                    current_dts += sc->stts_data[stts_index].duration;
> > > > > > @@ -8566,6 +8569,7 @@ static const AVOption mov_options[] = {
> > > > > >        { "decryption_key", "The media decryption key (hex)", OFFSET(decryption_key), AV_OPT_TYPE_BINARY, .flags = AV_OPT_FLAG_DECODING_PARAM },
> > > > > >        { "enable_drefs", "Enable external track support.", OFFSET(enable_drefs), AV_OPT_TYPE_BOOL,
> > > > > >            {.i64 = 0}, 0, 1, FLAGS },
> > > > > > +    { "max_stts_delta", "treat offsets above this value as invalid", OFFSET(max_stts_delta), AV_OPT_TYPE_INT, {.i64 = INT_MAX}, 0, UINT_MAX, .flags = AV_OPT_FLAG_DECODING_PARAM },
> > > > > >        { NULL },
> > > > > >    };
> > > > > The stts is used in other places, the value parsed as unsigned will cause
> > > > > problems there too
> > > > I see stts duration used in 3 places in mov.c
> > > > 
> > > > mov_read_stts(), which my earlier patch changed.
> > > > 
> > > > mov_build_index(), which this patch changes,
> > > > 
> > > > mov_read_trak() where frame rate is populated for CFR streams and is called
> > > > very shortly after mov_build_index().
> > > > I don't think that can break for non-malformed files as the only duration
> > > > entry in the stream is not likely to be > INT_MAX
> > > > In any case, after this patch, with default option value, it will see the
> > > > same values as earlier.
> > > > 
> > > If iam not mistaken there is code that adds the values up to initilize some
> > > duration. this works with 64bit so the small negative values interpreted as
> > > unsigned will result in a wrong result i would expect
> > > that may very well not be the only issue
> > 
> > Yeah, I see the stream duration being set in mov_read_stts().
> > Interestingly,  after 153639cb9cf, the unadjusted deltas were being added,
> > which would have included all negative deltas, small or large.
> 
> did you check which variant leads to the correct duration ?
> i naively thought that as it was before was correct but i didnt check
> that so i can very well be wrong
> 
> 
> > 
> > 153639cb9cf  shifted the original negative delta adjustment code from
> > mov_read_stts to mov_build_index. Was that necessary?
> 
> thats andreas commit, 5 years ago, i cannot awnser this 
> 
> 
> > 
> > 
> > > > > the cast is also fragile as it will break when someone tries to change it
> > > > > to int64
> > > > In what circumstances would that happen?
> > > when someone tries to change the code.
> > > having both signed and unsigned 32bit in a 32bit element screams
> > 
> > Originally, negative values never left mov_read_stts. Maybe that's best.
> > 
> > Anyway, FATE passes, so it may only affect edge cases if it does. But I will
> > walk through the code to check.
> > 
> > 
> > > [...]
> > > > > also please select the default so that it works with all real world files
> > > > No single value can all work for all files. Hence the option.
> > > but what i wrote was about all real world files.
> > > Sure you can make a file that fails but so far the files i have seen and
> > > the ones you told me about all would work with 49% of all possibly defaults
> > > why do you pick a default that doesnt work with them and then argue about
> > > that ?
> > > Lets try to fix the code not win the argument please
> > 
> > I picked the default to keep the old behaviour. 99% of files I see have all
> > valid offsets.
> > Among the rest, large values are mostly errors. Very few are intended.
> > What default do you suggest?
> 
> I would suggest something that treats small negative ones as negative
> signed but leaves all the rest as unsigned. I dont really know what
> "small" should be here but what the file here used was around -10ms
> so if we take a common timebase and 10seconds that should hopefully
> catch all these negative cases and leave >99% of the space for unsigned

Whats the status of this ?
Do you have time and interrest to work on a fix ?
If not this should be reverted until someone has time
I think we should neither break one file to fix one nor should
a per file user action like tuning an option be needed for normal
non crafted files.

Thanks

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20211215/cd981f00/attachment.sig>