[FFmpeg-devel] [PATCH v2] avformat/dhav: fix backward scanning for get_duration and optimize seeking

Andreas Rheinhardt andreas.rheinhardt at outlook.com
Tue May 20 22:18:22 EEST 2025


Derek Buitenhuis:
> From: Justin Ruggles <justinr at vimeo.com>
> 
> The backwards scanning done for incomplete final packets should not
> assume a specific alignment at the end of the file. Truncated files
> result in hundreds of thousands of seeks if the final packet does not
> fall on a specific byte boundary, which can be extremely slow.
> For example, with HTTP, each backwards seek results in a separate
> HTTP request.
> 
> This changes the scanning to check for the end tag 1 byte at a time
> and buffers the last 1 MiB using ffio_ensure_seekback to avoid additional
> seek operations.
> 
> Signed-off-by: Justin Ruggles <justinr at vimeo.com>
> Signed-off-by: Derek Buitenhuis <derek.buitenhuis at gmail.com>
> ---
> Addresses Andreas' comments from the previous round.
> ---
>  libavformat/dhav.c | 48 +++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/libavformat/dhav.c b/libavformat/dhav.c
> index b2ead99609..27e2f4db23 100644
> --- a/libavformat/dhav.c
> +++ b/libavformat/dhav.c
> @@ -22,6 +22,7 @@
>  
>  #include <time.h>
>  
> +#include "libavutil/intreadwrite.h"
>  #include "libavutil/mem.h"
>  #include "libavutil/parseutils.h"
>  #include "avio_internal.h"
> @@ -232,34 +233,59 @@ static void get_timeinfo(unsigned date, struct tm *timeinfo)
>      timeinfo->tm_sec  = sec;
>  }
>  
> +#define MAX_DURATION_BUFFER_SIZE (1024*1024)
> +
>  static int64_t get_duration(AVFormatContext *s)
>  {
>      DHAVContext *dhav = s->priv_data;
>      int64_t start_pos = avio_tell(s->pb);
> +    int64_t end_pos = -1;
>      int64_t start = 0, end = 0;
>      struct tm timeinfo;
> -    int max_interations = 100000;
> +    uint8_t *end_buffer;
> +    int64_t end_buffer_size;
> +    int64_t end_buffer_pos;
> +    int64_t offset;
> +    int ret;
>  
>      if (!s->pb->seekable)
>          return 0;
>  
> -    avio_seek(s->pb, avio_size(s->pb) - 8, SEEK_SET);
> -    while (avio_tell(s->pb) > 12 && max_interations--) {
> -        if (avio_rl32(s->pb) == MKTAG('d','h','a','v')) {
> -            int64_t seek_back = avio_rl32(s->pb);
> +    end_buffer_size = FFMIN(MAX_DURATION_BUFFER_SIZE, avio_size(s->pb));
> +    end_buffer = av_malloc(end_buffer_size);
> +    if (!end_buffer)
> +        return 0;
> +    end_buffer_pos = avio_size(s->pb) - end_buffer_size;
> +    avio_seek(s->pb, end_buffer_pos, SEEK_SET);
> +    ret = ffio_ensure_seekback(s->pb, end_buffer_size);
> +    if (ret < 0)
> +        return 0;
> +    avio_read(s->pb, end_buffer, end_buffer_size);
>  
> -            avio_seek(s->pb, -seek_back, SEEK_CUR);
> -            read_chunk(s);
> -            get_timeinfo(dhav->date, &timeinfo);
> -            end = av_timegm(&timeinfo) * 1000LL;
> +    offset = end_buffer_size - 8;
> +    while (offset > 0) {
> +        if (AV_RL32(end_buffer + offset) == MKTAG('d','h','a','v')) {
> +            int64_t seek_back = AV_RL32(end_buffer + offset + 4);
> +            end_pos = end_buffer_pos + offset - seek_back + 8;
>              break;
>          } else {
> -            avio_seek(s->pb, -12, SEEK_CUR);
> +            offset -= 9;
>          }
>      }
>  
> -    avio_seek(s->pb, start_pos, SEEK_SET);
> +    av_freep(&end_buffer);
> +
> +    if (end_pos < 0) {
> +        avio_seek(s->pb, start_pos, SEEK_SET);
> +        return 0;
> +    }
> +
> +    avio_seek(s->pb, end_pos, SEEK_SET);
> +    read_chunk(s);
> +    get_timeinfo(dhav->date, &timeinfo);
> +    end = av_timegm(&timeinfo) * 1000LL;

The only thing that you want from parsing the end is the end timestamp
and this is given by data alone. date is a 32bit le number at offset 16
from the start of the dhav tag; it can be read directly from the buffer,
so you do not need to seek again and use read_chunk to get the end.
It also means that your ffio_ensure_seekback() is unnecessary (unless
you wanted to ensure that the seek back to start_pos works (which is
currently not guaranteed at all and not ensured by your code and would
probably also be bad in general given that this would cache the whole
file in memory).
(The above is based on the presumption that we are not really interested
in what parse_ext() may parse in the chunk at the end; I don't know
whether this is true or not.)

- Andreas



More information about the ffmpeg-devel mailing list