[FFmpeg-devel] [libavutil/timestamp.h: Fix loss of precision in timestamps for silencedetect on long files]
Allan Cady
allancady at yahoo.com
Tue Mar 12 01:46:00 EET 2024
> On Monday, March 11, 2024 at 12:50:11 PM PDT, <epirat07 at gmail.com> wrote:
> On 11 Mar 2024, at 15:26, Andreas Rheinhardt wrote:
>> Andreas Rheinhardt:
>>> Allan Cady via ffmpeg-devel:
>>>> From: "Allan Cady" <allancady at yahoo.com>
>>>>
>>>> I propose changing the format to "%.6f", which will
>>>> give microsecond precision for all timestamps, regardless of
>>>> offset. Trailing zeros can be trimmed from the fraction, without
>>>> losing precision. If the length of the fixed-precision formatted
>>>> timestamp exceeds the length of the allocated buffer
>>>> (AV_TS_MAX_STRING_SIZE, currently 32, less one for the
>>>> terminating null), then we can fall back to scientific notation, though
>>>> this seems almost certain to never occur, because 32 characters would
>>>> allow a maximum timestamp value of (32 - 1 - 6 - 1) = 24 characters.
>>>> By my calculation, 10^24 seconds is about six orders of magnitude
>>>> greater than the age of the universe.
>>>>
>>>> The fix proposed here follows the following logic:
>>>>
>>>> 1) Try formatting the number of seconds using "%.6f". This will
>>>> always result in a string with six decimal digits in the fraction,
>>>> possibly including trailing zeros. (e.g. "897234.73200").
>>>>
>>>> 2) Check if that string would overflow the buffer. If it would, then
>>>> format it using scientific notation ("%.8g").
>>>>
>>>> 3) If the original fixed-point format fits, then trim any trailing
>>>> zeros and decimal point, and return that result.
>>>>
>>>> Making this change broke two fate tests, filter-metadata-scdet,
>>>> and filter-metadata-silencedetect. To correct this, I've modified
>>>> tests/ref/fate/filter-metadata-scdet and
>>>> tests/ref/fate/filter-metadata-silencedetect to match the
>>>> new output.
>>>>
>>>> Signed-off-by: Allan Cady <allancady at yahoo.com>
>>>> ---
>>>> libavutil/timestamp.h | 53 +++++++++++++++++++-
>>>> tests/ref/fate/filter-metadata-scdet | 12 ++---
>>>> tests/ref/fate/filter-metadata-silencedetect | 2 +-
>>>> 3 files changed, 58 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/libavutil/timestamp.h b/libavutil/timestamp.h
>>>> index 2b37781eba..2f04f9bb2b 100644
>>>> --- a/libavutil/timestamp.h
>>>> +++ b/libavutil/timestamp.h
>>>> @@ -25,6 +25,7 @@
>>>> #define AVUTIL_TIMESTAMP_H
>>>>
>>>> #include "avutil.h"
>>>> +#include <stdbool.h>
>>>>
>>>> #if defined(__cplusplus) && !defined(__STDC_FORMAT_MACROS) && !defined(PRId64)
>>>> #error missing -D__STDC_FORMAT_MACROS / #define __STDC_FORMAT_MACROS
>>>> @@ -53,6 +54,32 @@ static inline char *av_ts_make_string(char *buf, int64_t ts)
>>>> */
>>>> #define av_ts2str(ts) av_ts_make_string((char[AV_TS_MAX_STRING_SIZE]){0}, ts)
>>>>
>>>> +/**
>>>> + * Strip trailing zeros and decimal point from a string. Performed
>>>> + * in-place on input buffer. For local use only by av_ts_make_time_string.
>>>> + *
>>>> + * e.g.:
>>>> + * "752.378000" -> "752.378"
>>>> + * "4.0" -> "4"
>>>> + * "97300" -> "97300"
>>>> + */
>>>> +static inline void av_ts_strip_trailing_zeros_and_decimal_point(char *str) {
>>>> + if (strchr(str, '.'))
>>>> + {
>>>> + int i;
>>>> + for (i = strlen(str) - 1; i >= 0 && str[i] == '0'; i--) {
>>>> + str[i] = '\0';
>>>> + }
>>>> +
>>>> + // Remove decimal point if it's the last character
>>>> + if (i >= 0 && str[i] == '.') {
>>>> + str[i] = '\0';
>>>> + }
>>>> +
>>>> + // String was modified in place; no need for return value.
>>>> + }
>>>> +}
>>>> +
>>>> /**
>>>> * Fill the provided buffer with a string containing a timestamp time
>>>> * representation.
>>>> @@ -65,8 +92,30 @@ static inline char *av_ts_make_string(char *buf, int64_t ts)
>>>> static inline char *av_ts_make_time_string(char *buf, int64_t ts,
>>>> const AVRational *tb)
>>>> {
>>>> - if (ts == AV_NOPTS_VALUE) snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS");
>>>> - else snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.6g", av_q2d(*tb) * ts);
>>>> + if (ts == AV_NOPTS_VALUE)
>>>> + {
>>>> + snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS");
>>>> + }
>>>> + else
>>>> + {
>>>> + const int max_fraction_digits = 6;
>>>> +
>>>> + // Convert 64-bit timestamp to double, using rational timebase
>>>> + double seconds = av_q2d(*tb) * ts;
>>>> +
>>>> + int length = snprintf(NULL, 0, "%.*f", max_fraction_digits, seconds);
>>>> + if (length <= AV_TS_MAX_STRING_SIZE - 1)
>>>> + {
>>>> + snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.*f", max_fraction_digits, seconds);
>>>> + av_ts_strip_trailing_zeros_and_decimal_point(buf);
>>>> + }
>>>> + else
>>>> + {
>>>> + snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.8g", seconds);
>>>> + }
>>>> +
>>>> + }
>>>> +
>>>> return buf;
>>>> }
>>>>
>>>
>>> 1. What makes you believe that all users want the new format and that it
>>> does not cause undesired behaviour for some (maybe a lot) of them?
I definitely do not know what other users would want. I would think
maybe some would like the change, others wouldn't, and most would
never know.
>>> The
>>> number of characters written by the earlier code stayed pretty constant
>>> even when the times became big (in this case, it just printed 8 chars if
>>> ts>=0), yet your code will really make use of the whole buffer.
It's true that my change will increase the potential length of
the output beyond 8 significant digits.
The issue I was having that brought this up was, I have some very long
audio files (up to 50 hours long), which I was wanting to split
into smaller pieces. I wrote some scripts that use silencedetect to get
the locations of breaks and then split the files at the breaks, but I
discovered that for segments near the end of the file, silencedetect was
returning whole-number timestamps, which was causing undesirable
results for me. Thinking functionally, it seems like timestamps further
out in a file ought to have the same precision as those near the
beginning. So this seems to me like a minor oversight in the original
design, that might warrant fixing.
>>> Granted, we could tell our users that they have no right to complain
>>> about this, given that we always had a "right" to use the full buffer,
>>> but I consider this a violation of the principle of least surprise.
I definitely agree with you there.
>>> Why don't you just change silencedetect or add another function?
I actually started out taking that approach in my submission a few weeks
ago. Marton Balint suggested (in a message on 20 Feb) that we make the
change in av_ts_make_time_string, so I did that for this submission.
I'm open to whatever approach you all consider is best.
>>> 2. For very small timestamps (< 10^-4), the new code will print a lot of
>>> useless leading zeros (after the decimal point). In fact, it can be so
>>> many that the new code has less precision than the old code, despite
>>> using the fill buffer.
I don't understand. Leading zeros after the decimal point are far from
useless -- they are part of the value. Maybe what you're saying is that
six digits is more precision than necessary? That may be so. I could
personally do fine with just two digits (hundredths), as long as it's
consistent through the length of the file.
>>> 2. This is way too much code for an inline function.
No disagreement there.
>>> 3. Anyway, your placement of {} on their own lines does not match the
>>> project coding style.
I'm happy to conform with project coding style.
>> In addition to this, there is another issue here: Your
>> av_ts_strip_trailing_zeros_and_decimal_point() presumes that the
>> "decimal-point character" is always '.', but this can be changed via
>> setlocale().
Excellent point, which I hadn't considered. I have no experience with
how locale is handled in C. I would welcome advice on the best way to
handle this.
> True, though I would consider this a more general bug. We should be
> consistent and not generate files that are locale-dependent and then
> not parseable anymore with a different one… That’s just a huge mess.
>
> Also in general FFmpeg is completely broken if you use any locale that
> does not use . as decimal separator. (This never shows for most users
> currently as most people use FFmpeg CLI which does not respect the
> users locale)
I'll leave that conversation to the experts here.
Thanks for giving my code a look.
More information about the ffmpeg-devel
mailing list