[FFmpeg-devel] [PATCH] First shot at AVCHD seeking via new seeking API

Baptiste Coudurier baptiste.coudurier
Sun Jul 26 23:07:22 CEST 2009


On 07/26/2009 06:00 AM, Ivan Schreter wrote:
> Hi Baptiste,
>
> first of all, thanks for the review.
>
> I am also trying to put this searching routine to work for MPEG-PS, so
> I'll have to change the code a bit (hopefully only two things). I'll
> explain in the comments below.
>
> Baptiste Coudurier wrote:
>> Hi Ivan,
>>
>> On 07/25/2009 04:14 AM, Ivan Schreter wrote:
>>> Hi,
>>>
>>> [...]
>>> If no one objects, I'll commit them during the next week.
>>>
>>
>> Since it is only used in mpegts for now, please put all the code in
>> mpegts.c, these symbols do not need to be exported.
> Yes and no. As I wrote above, I'm trying to put this to work for MPEG-PS
> as well. I've seen that MPEG-PS uses index API, but it seems not to work
> correctly for seeking anyway. The same method used for seeking in
> MPEG-TS can be also used in MPEG-PS (without an index). Other formats
> will probably profit from it as well.

Maybe other formats will profit, yes; we will see in the future.
In the meantime, please focus on TS and PS.

> So either it lands in utils.c (which is IMHO the proper place) or I'll
> create a new seek_utils.c, which will contain the routine (and
> optionally move other seek-relevant code from utils.c into the new
> file). Or eventually in some common MPEG source file (which one?). What
> do you suggest?

Then seek.c might be a better place.

>>
>> Also, I think state must not be trashed if seek failed, that is
>> av_read_frame_flush must not be called before seeking.
> This is hard to achieve. In order to find keyframes, I have to start
> reading the frames via the parser. This trashes the state. Only then I
> know whether the seek succeeds or not. Maybe it would be possible to
> save the state and restore it on error. I'll look into it.

I'd prefer to fix it now.

>> [...]
>>>
>>> +/**
>>> + * Helper structure describing keyframe search state of one stream.
>>> + */
>>> +typedef struct {
>>> + int active; ///< Set to 1 for active streams, 0 for inactive.
>>
>> I don't think inactive stream need a structure, so this field is
>> unneeded.
> Either I need an "active" flag, or I need to know index of the stream to
> which this struct refers. But when searching for associated structure
> from read packet, it's much easier to use stream_index of the packet to
> find the appropriate structure in O(1) than looking for it via O(n) loop.
>
> Or did you mean something different?

You can check for st->discard.
You can also save the memory of AVSyncPoint struct for discarded streams 
and keep O(1) lookup.

And please don't use MAX_STREAMS; in the future we need to support an 
arbitrary number of streams.

>>> + int64_t epsilon; ///< Rounding error based on time base.
>>
>> Can you explain why it is needed ?
> Timestamps are scaled back and forth between stream time base (of one or
> more streams with possibly different time bases) and AV_TIME_BASE, also
> depending on how the API is called. This is to account for rounding
> errors between time base of the stream and AV_TIME_BASE.

I think it is unneeded.

>>
>> [...]
>>> +
>>> +/**
>>> + * Partial search for keyframe in multiple streams.
>>> + *
>>> + * This routine searches for the next lower and next higher timestamp to
>>> + * given target timestamp in each stream, starting at current file position
>>> + * and ending at position, where all streams have already been examined
>>> + * (or when all higher key frames found in first iteration).
>>> + *
>>> + * This routine is called iteratively with exponential backoff to find lower
>>> + * timestamp.
>>> + *
>>> + * @param s format context.
>>> + * @param timestamp target timestamp (or position, if AVSEEK_FLAG_BYTE).
>>> + * @param flags seeking flags.
>>> + * @param sync array with information per stream.
>>> + * @param keyframes_to_find count of keyframes to find in total.
>>> + * @param found_lo pointer to count of already found low timestamp keyframes.
>>> + * @param found_hi pointer to count of already found high timestamp keyframes.
>>> + */
>>> +static void search_keyframe_multi_partial(AVFormatContext *s,
>>> + int64_t timestamp,
>>> + int flags,
>>> + AVSyncPoint *sync,
>>> + int keyframes_to_find,
>>> + int *found_lo,
>>> + int *found_hi)
>>
>> I don't think the name is adequate. I'm thinking of a better one.
> Yeah, the name is not particularly good, so if you have a better idea,
> it's welcome.

At least remove partial and multi. Something like search_hi_lo_keyframes.

>>
>> [...]
>>> + // Use file position as fallback, if no position returned. This is
>>> + // inexact in most formats, so format parser should always fill
>>> + // proper position to make seeking routine return reliable results.
>>> + pos = fpos;
>>> + }
>>
>> Actually I think fpos is more reliable in our situation :)
> Actually, it is not. As you know, in MPEG, a frame is distributed over
> several packets interleaved with packets of frames of other streams. It
> is returned from the parser, after the last packet of the frame is read.
> Thus, fpos is unreliable, since it's the position of the last packet of
> the frame, but we need the first one. It's just a fallback, which may or
> may not work for a particular format. It wouldn't for MPEG.

Hmm, you can also have multiple frames per PES packet, and these frames 
have the same pos, since seeking requires a TS packet boundary and/or a 
PS/PES packet boundary.

>>
>>> + if (sp->first_pos == AV_NOPTS_VALUE) {
>>> + // Note down termination position for the next iteration - when
>>> + // we encounter a packet with the same position, we will ignore
>>> + // any further packets for this stream in next iteration (as they
>>> + // are already evaluated).
>>> + sp->first_pos = pos;
>>> + }
>>> +
>>> + if (sp->term_pos != AV_NOPTS_VALUE && pos > sp->term_pos) {
>>> + // We are past the end position from last iteration, ignore packet.
>>> + if (!sp->terminated) {
>>> + sp->terminated = 1;
>>> + ++terminated_count;
>>> + if (terminated_count == keyframes_to_find)
>>> + break; // all terminated, iteration done
>>> + }
>>> + continue;
>>> + }
>>
>> Do term_pos and first_pos need to be per stream ?
> Yes. Reason as above - frames for various streams interleaved in
> packets, so the starting position of frames in various streams don't
> necessarily correlate with their arrival via av_read_frame().

The scenario for the first iteration is currently:
seek to pos
read frames until you find hi keyframes for all streams.
The lowest pos is the first keyframe found. It may not be in order, but 
that doesn't matter in the first iteration.

On the next iterations you have no interest in going past the first pos, 
since av_read_frame will (and if not, must) return the same frames in 
the same order at the same positions, whatever the interleaving. If it 
doesn't, the problem is somewhere else and must be fixed.

So you only need one first_pos for all the streams.
And the next iterations will find keyframes before pos. That's why I 
don't think you need term_pos or terminated either.

>>> + // Evaluate key frames with known TS (or any frames, if AVSEEK_FLAG_ANY set).
>>> + if (pts != AV_NOPTS_VALUE && ((flg & PKT_FLAG_KEY) || (flags & AVSEEK_FLAG_ANY))) {
>>> + // Rescale stream timestamp to AV_TIME_BASE.
>>> + st = s->streams[idx];
>>> + pts_tb = av_rescale(pts, AV_TIME_BASE * (int64_t)st->time_base.num, st->time_base.den);
>>> + dts_tb = av_rescale(dts, AV_TIME_BASE * (int64_t)st->time_base.num, st->time_base.den);
>>> +
>>> + if (flags & AVSEEK_FLAG_BYTE) {
>>> + // Seek by byte position.
>>> + if (pos <= timestamp) {
>>> + // Keyframe found before target position.
>>> + if (!sp->found_lo) {
>>> + // Found first keyframe lower than target position.
>>> + sp->found_lo = 1;
>>> + (*found_lo)++;
>>> + sp->pts_lo = pts_tb;
>>> + sp->dts_lo = dts_tb;
>>> + sp->pos_lo = pos;
>>> + } else if (sp->pos_lo < pos) {
>>> + // Found a better match (closer to target position).
>>> + sp->pts_lo = pts_tb;
>>> + sp->dts_lo = dts_tb;
>>> + sp->pos_lo = pos;
>>> + }
>>> + }
>>> + if (pos >= timestamp) {
>>> + // Keyframe found after target position.
>>> + if (!sp->found_hi) {
>>> + // Found first keyframe higher than target position.
>>> + sp->found_hi = 1;
>>> + (*found_hi)++;
>>> + sp->pts_hi = pts_tb;
>>> + sp->dts_hi = dts_tb;
>>> + sp->pos_hi = pos;
>>> + if (*found_hi >= keyframes_to_find && sp->term_pos == INT64_MAX) {
>>> + // We found high frame for all. They may get updated
>>> + // to TS closer to target TS in later iterations (which
>>> + // will stop at start position of previous iteration).
>>> + break;
>>> + }
>>> + } else if (sp->pos_hi > pos) {
>>> + // Found a better match (actually, shouldn't happen).
>>> + sp->pts_hi = pts_tb;
>>> + sp->dts_hi = dts_tb;
>>> + sp->pos_hi = pos;
>>> + }
>>> + }
>>> + } else {
>>> + // Seek by timestamp.
>>> + if (pts_tb <= timestamp + sp->epsilon) {
>>> + // Keyframe found before target timestamp.
>>> + if (!sp->found_lo) {
>>> + // Found first keyframe lower than target timestamp.
>>> + sp->found_lo = 1;
>>> + (*found_lo)++;
>>> + sp->pts_lo = pts_tb;
>>> + sp->dts_lo = dts_tb;
>>> + sp->pos_lo = pos;
>>> + } else if (sp->pts_lo < pts_tb) {
>>> + // Found a better match (closer to target timestamp).
>>> + sp->pts_lo = pts_tb;
>>> + sp->dts_lo = dts_tb;
>>> + sp->pos_lo = pos;
>>> + }
>>> + }
>>> + if (pts_tb + sp->epsilon >= timestamp) {
>>> + // Keyframe found after target timestamp.
>>> + if (!sp->found_hi) {
>>> + // Found first keyframe higher than target timestamp.
>>> + sp->found_hi = 1;
>>> + (*found_hi)++;
>>> + sp->pts_hi = pts_tb;
>>> + sp->dts_hi = dts_tb;
>>> + sp->pos_hi = pos;
>>> + if (*found_hi >= keyframes_to_find && sp->term_pos == INT64_MAX) {
>>> + // We found high frame for all. They may get updated
>>> + // to TS closer to target TS in later iterations (which
>>> + // will stop at start position of previous iteration).
>>> + break;
>>> + }
>>
>> If I understood correctly, at first iteration you should find high
>> keyframes for all streams.
> Not necessarily. Normally, yes, but there is a special case, with key
> frame in a stream beginning before rough position and ending after it.
> The frame won't be returned in the first iteration, only possibly a
> later key frame. It will only be found in later iteration (or not found
> at all, if seeking against end of the file).

You are describing the case of the first keyframe here, right?
It will be returned at the next iteration, so it's fine, and lo will be 
updated.

>>
>> If SEEK_BYTE is requested and the pos you are at is > max_ts (max
>> position) or term_pos was earlier at the previous iteration, then you
>> can break earlier after read_frame anyway.
>>
>> Do I miss something ?
> Again, interleaved frames in MPEG. This makes the stuff somewhat complex.

Then break when all streams' pos > max_ts.

>>
>>> + } else if (sp->pts_hi > pts_tb) {
>>> + // Found a better match (actually, shouldn't happen).
>>> + sp->pts_hi = pts_tb;
>>> + sp->dts_hi = dts_tb;
>>> + sp->pos_hi = pos;
>>> + }
>>
>> Duplicate code between SEEK_BYTE and timestamps. I believe you only
>> need one hi and one lo per stream (will be set to either pos or
>> timestamp), you don't need both pts and dts. Please explain why if
>> that's not possible.
> In case of SEEK_BYTE, I only need pos. In normal case, I need pos and
> pts. As for dts, I'm going to solve this differently (see later), since
> current DTS update code is inexact, which makes problems with MPEG-PS
> streams.

Why do you need pos _and_ pts per stream ? What is pos for ?

>>
>>> + }
>>> + }
>>> + }
>>> +
>>> + // Clean up the parser.
>>> + av_read_frame_flush(s);
>>> +}
>>> +
>>> +int64_t ff_gen_syncpoint_search(AVFormatContext *s,
>>> + int stream_index,
>>> + int64_t pos,
>>> + int64_t ts_min,
>>> + int64_t ts,
>>> + int64_t ts_max,
>>> + int flags)
>>> +{
>>> + AVSyncPoint sync[MAX_STREAMS], *sp;
>>> + AVStream *st;
>>> + int i;
>>> + int keyframes_to_find = 0;
>>> + int64_t curpos;
>>> + int64_t step;
>>> + int found_lo = 0, found_hi = 0;
>>> + int64_t min_distance;
>>> + int64_t min_pos = 0;
>>> +
>>> + if (stream_index >= 0 && !(flags & AVSEEK_FLAG_BYTE)) {
>>> + // Rescale timestamps to AV_TIME_BASE, if timestamp of a reference stream given.
>>> + st = s->streams[stream_index];
>>> + if (ts != AV_NOPTS_VALUE)
>>> + ts = av_rescale(ts, AV_TIME_BASE * (int64_t)st->time_base.num, st->time_base.den);
>>> + if (ts_min != INT64_MIN)
>>> + ts_min = av_rescale(ts_min, AV_TIME_BASE * (int64_t)st->time_base.num, st->time_base.den);
>>> + if (ts_max != INT64_MAX)
>>> + ts_max = av_rescale(ts_max, AV_TIME_BASE * (int64_t)st->time_base.num, st->time_base.den);
>>> + }
>>> + }
>>
>> I don't think rescaling is needed for now since in mpegts all streams
>> have same timebase.
> It is, since stream_index can be -1, which means, it's in AV_TIME_BASE.

And that is done in mpegts read_seek2; now you rescale back.
You specifically mentioned that this function must be called by demuxers.

Besides, this may be done in avformat_seek_file.

> So I'd have to rescale in stream_index == -1 case also if the code were
> MPEG-specific. The rest of the code works in AV_TIME_BASE anyway and
> it's generic enough for other formats as well.

When you don't need to rescale, don't, for now you don't.

>>> + // Initialize syncpoint structures for each stream.
>>> + for (i = 0; i < s->nb_streams; ++i) {
>>> + sp = &sync[i];
>>> + st = s->streams[i];
>>> + if (s->streams[i]->discard < AVDISCARD_ALL) {
>>> + ++keyframes_to_find;
>>> + sp->pos_lo = INT64_MAX;
>>> + sp->pts_lo = INT64_MAX;
>>> + sp->dts_lo = INT64_MAX;
>>> + sp->found_lo = 0;
>>> + sp->pos_hi = INT64_MAX;
>>> + sp->pts_hi = INT64_MAX;
>>> + sp->dts_hi = INT64_MAX;
>>> + sp->found_hi = 0;
>>> + sp->epsilon = AV_TIME_BASE * (int64_t)st->time_base.num / st->time_base.den;
>>> + sp->term_pos = INT64_MAX;
>>> + sp->first_pos = AV_NOPTS_VALUE;
>>> + sp->terminated = 0;
>>> + sp->active = 1;
>>> + } else {
>>> + sp->active = 0;
>>> + }
>>> + }
>>> +
>>> + if (keyframes_to_find == 0) {
>>> + // No stream active, error.
>>> + return -1;
>>> + }
>>> +
>>> + // Find keyframes in all active streams with timestamp/position just before
>>> + // and just after requested timestamp/position.
>>> + step = 1024;
>>> + for (;;) {
>>> + curpos = pos - step;
>>> + if (curpos < 0)
>>> + curpos = 0;
>>> + url_fseek(s->pb, curpos, SEEK_SET);
>>> + search_keyframe_multi_partial(s, ts, flags, sync, keyframes_to_find,
>>> + &found_lo, &found_hi);
>>> + if (found_lo == keyframes_to_find && found_hi == keyframes_to_find)
>>> + break; // have all keyframes we wanted
>>
>> You can shortcut by checking if user specified a max_pos > ts or
>> min_pos < ts. In this case only found_lo or found_hi matters.
> I'm not sure.
>
> The max_pos/min_pos/max_ts/min_ts don't say "don't give me anything
> outside this range", rather "make sure my decoder output synchronizes in
> this range". I.e., it's perfectly OK to return a position before
> min_pos/min_ts.

I think, given the parameter names, it says exactly that: don't give me 
anything outside this range. That's why the API has been extended.
So it's not OK to return anything outside this range.

> Using only found_hi in forward seek is not OK, since there might be one
> stream with key frame past requested timestamp, but one or more other
> streams can have key frame before requested timestamp. So though the
> decoder synchronizes at timestamp > requested timestamp, we need to
> start reading before requested timestamp in order to synchronize closest
> to specified timestamp.

We need to start reading at the pos of the first keyframe.
You need to find the first key frame at which to seek whose pts is < max_ts.
If the first high keyframe for the video stream is < max_ts, it's good.
If there is no video stream or video is disabled, the audio stream must 
be checked.
If the low video keyframe is nearer the requested timestamp, or if 
max_ts == ts, then it's the low one.

> Using only found_lo for backward seek might work, but why introduce a
> special case?

Speed of course.

> It's anyway necessary to scan each stream until high key
> frame, since our rough seeking position might be too early, so there are
> possibly several lo key frames after this position.

I don't think so.

>>> + if (curpos == 0)
>>> + break; // cannot go back anymore
>>> + step *= 2;
>>> + // switch termination positions
>>> + for (i = 0; i < s->nb_streams; ++i) {
>>> + sp = &sync[i];
>>> + if (sp->active) {
>>> + sp->term_pos = sp->first_pos;
>>> + sp->first_pos = AV_NOPTS_VALUE;
>>> + sp->terminated = 0;
>>> + }
>>
>> Are both first_pos and term_pos needed ?
> Yes. first_pos is the position of first frame found in current
> iteration. This will be copied to term_pos, so next iteration knows,
> this stream has been read up to this position and any frames coming
> afterwards are ignored. But other streams can still have some frames
> coming in. Only after all streams reach term_pos, we know that we really
> read all frames needed.

See my idea above regarding the usage for first_pos and term_pos.

> BTW, this will have to be adjusted, by keeping first_pos only if the
> frame has a known timestamp. Otherwise, we possibly miss a key frame. I
> think this is one of the problems with using the routine for MPEG-PS.
>
>>
>>> + }
>>> +
>>> + // Find actual position to start decoding so that decoder synchronizes
>>> + // closest to ts and update timestamps in streams.
>>> + pos = INT64_MAX;
>>> +
>>> + for (i = 0; i < s->nb_streams; ++i) {
>>> + sp = &sync[i];
>>> + if (sp->active) {
>>> + min_distance = INT64_MAX;
>>> + if (flags & AVSEEK_FLAG_BYTE) {
>>> + // Find closest byte position to requested position, within min/max limits.
>>> + if (sp->pos_lo != INT64_MAX && ts_min <= sp->pos_lo && sp->pos_lo <= ts_max) {
>>> + // low position is in range
>>> + min_distance = ts - sp->pos_lo;
>>> + min_pos = sp->pos_lo;
>>> + }
>>> + if (sp->pos_hi != INT64_MAX && ts_min <= sp->pos_hi && sp->pos_hi <= ts_max) {
>>> + // high position is in range, check distance
>>> + if (sp->pos_hi - ts < min_distance) {
>>> + min_distance = sp->pos_hi - ts;
>>> + min_pos = sp->pos_hi;
>>> + }
>>> + }
>>> + } else {
>>> + // Find timestamp closest to requested timestamp within min/max limits.
>>> + if (sp->pts_lo != INT64_MAX && ts_min <= sp->pts_lo + sp->epsilon && sp->pts_lo - sp->epsilon <= ts_max) {
>>> + // low timestamp is in range
>>> + min_distance = ts - sp->pts_lo;
>>> + min_pos = sp->pos_lo;
>>> + }
>>> + if (sp->pts_hi != INT64_MAX && ts_min <= sp->pts_hi + sp->epsilon && sp->pts_hi - sp->epsilon <= ts_max) {
>>> + // high timestamp is in range, check distance
>>> + if (sp->pts_hi - ts < min_distance) {
>>> + min_distance = sp->pts_hi - ts;
>>> + min_pos = sp->pos_hi;
>>> + }
>>> + }
>>> + }
>>
>> Duplicate code between SEEK_BYTE and timestamps.
> No. In SEEK_BYTE case, positions are compared, in normal case,
> timestamps. So it's not duplicate.

I think it is.
Set hi and lo depending on SEEK_BYTE and do hi - lo.

>>
>>> + if (min_distance == INT64_MAX) {
>>> + // no timestamp is in range, cannot seek
>>> + return -1;
>>> + }
>>> + if (min_pos < pos)
>>> + pos = min_pos;
>>> + }
>>> + }
>>> +
>>> + // We now have a position, so update timestamps of all streams
>>> appropriately.
>>> + for (i = 0; i < s->nb_streams; ++i) {
>>> + sp = &sync[i];
>>> + if (sp->active) {
>>> + if (sp->found_lo && sp->pos_lo >= pos) {
>>> + av_update_cur_dts(s, s->streams[i], sp->dts_lo);
>>> + } else if (sp->found_hi && sp->pos_hi >= pos) {
>>> + av_update_cur_dts(s, s->streams[i], sp->dts_hi);
>>> + } else {
>>> + // error, no suitable position found (actually impossible)
>>> + return -1;
>>> + }
>>> + }
>>> + }
>>
>> Do you really need update cur_dts ?
> Yes. And actually, it has to be updated to correct DTS, otherwise
> timestamp generation diverges and seeking is incorrect (in MPEG-PS).

I think that's a timestamp generation problem.
cur_dts is needed for streams not having timestamps. PS and TS have 
timestamps, so cur_dts will be updated when a correct dts from a PES 
packet is encountered.

> I'm going to change DTS computation to have exact DTS. I'm thinking about
> simply putting all (stream_index, position, dts) triples encountered in
> an array and then picking out the closest higher DTS for seeking
> position. This array should be rather small, hopefully (up to 1.4 second
> worth of frames in MPEG, for instance, so maybe couple hundred entries),
> so I don't think any more advanced data structure is necessary.

Ouch, please avoid this if it's not needed, or justify why it is. I 
don't think it is.

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
FFmpeg maintainer                                  http://www.ffmpeg.org


