[FFmpeg-devel] [PATCH] Add Apple HTTP Live Streaming protocol handler

Thu Aug 5 08:03:51 CEST 2010

On Wed, 4 Aug 2010, Ronald S. Bultje wrote:

> On Tue, Jul 27, 2010 at 5:45 AM, Martin Storsj? <martin at martin.st> wrote:
> [..]
> > +struct segment {
> > +    int duration;
> > +    char *url;
> > +};
> > +
> > +struct variant {
> > +    int bandwidth;
> > +    char *url;
> > +    ByteIOContext *pb;
> > +    AVFormatContext *ctx;
> > +    AVPacket pkt;
> > +    int stream_offset;
> > +
> > +    int start_seq_no;
> > +    int n_segments;
> > +    struct segment **segments;
> > +    int needed;
> > +};
> 
> Each of these (and below) could use at least a few doxy words to
> explain what they are. What is a segment? What is a variant?

Ok, I'll add a bit of doxy. FYI, a segment is one individual media file 
(normally an mpegts file). A variant is a playlist containing segments - 
you can have many parallel variants containing matching segments (in 
different bitrate versions).

> Is it possible to make url in both segment and vriant fixed-length, so
> that we don't have to alloc so may individual elements? Not critical,
> but would be nice.

Sure, if we fix some max length.

> > +typedef struct AppleHTTPContext {
> > +    char playlisturl[2048];
> 
> Unused (and kinda big).

Thanks for catching this, removed.

> > +    int64_t last_load_time;
> 
> You're assigning this with av_gettime(), which seems wrong if
> clockrate server/client are off. I think it's better if you use PTS
> here instead of clocktime.

Uhm, no, this is the correct use of this field. This is only used for live 
streams. In live streams, the playlists have a field indicating the 
segment duration - which also says how often we should reload the 
playlist, and for that, we need a realtime clock, such as av_gettime().

> > +static int read_chomp_line(ByteIOContext *s, char *buf, int maxlen)
> > +{
> > +    int len = ff_get_line(s, buf, maxlen);
> > +    if (len > 0 && (buf[len-1] == '\r' || buf[len-1] == '\n'))
> > +        buf[--len] = '\0';
> > +    if (len > 0 && (buf[len-1] == '\r' || buf[len-1] == '\n'))
> > +        buf[--len] = '\0';
> > +    return len;
> > +}
> 
> This could be simplified (if ...buf[len-1]=='\n' { if
> (buf[len-2]=='n') .. else .. }), but that's not too critical.
> 
> You can presumably also chop off all space-like characters, assuming
> that's valid.

True, I'll see if I can simplify it a bit.

> > +static void reset_packet(AVPacket *pkt)
> > +{
> > +    av_init_packet(pkt);
> > +    pkt->data = NULL;
> > +}
> 
> This seems overused. When is the data=NULL necessary, and what does it
> mean? What happens if we don't?

No, this is fully required, in this form. I've got a statically allocated 
AVPacket in each variant that is used as intermediate buffer. When 
receiving more than one stream at a time, we read one packet from each 
variant, but only return one, the one with the lowest dts. For this, I 
need to be able to distinguish which AVPackets already have a valid packet 
in them, and which ones are "empty" and needs to be refilled with data. 
For this, I check the ->data pointer - this function resets it back to 
NULL and resets the rest of the packet to be a freshly initialized packet. 
(av_init_packet doesn't touch the data pointer at all.)

> > +static struct variant *new_variant(AppleHTTPContext *c, int bandwidth,
> > +                                   const char *url)
> > +{
> > +    struct variant *var = av_mallocz(sizeof(struct variant));
> > +    if (!var)
> > +        return NULL;
> > +    reset_packet(&var->pkt);
> 
> > +    var->bandwidth = bandwidth;
> > +    var->url = av_strdup(url);
> 
> Vertical align (also in other places). I'm wondering if we can just
> av_strlcpy() into a fixed size buffer here (as said above).

Aligned this one and a few more.

> [..]
> > +static int parse_playlist(AppleHTTPContext *c, const char *url,
> > +                          struct variant *var)
> > +{
> > +    ByteIOContext *in;
> [..]
> > +    if ((ret = url_fopen(&in, url, URL_RDONLY)) < 0)
> > +        return ret;
> 
> This should already be open in s->pb. Why do you want to reopen it?

No, it isn't open already, we don't want the caller to open it and thus 
have set AVFMT_NOFILE. For live streams, we need to reload the playlist 
once per target_duration. So this doesn't work with the normal "one data 
stream input" paradigm where the stream is in s->pb. And we may want to 
parse a playlist at another url than the one the user specified, since the 
main url may contain urls to other playlists (one for each bitrate 
variant).

> > +static int applehttp_read_header(AVFormatContext *s, AVFormatParameters *ap)
> [..]
> > +    if (c->n_variants > 1 || c->variants[0]->n_segments == 0) {
> > +        char urlbuf[2048];
> > +        for (i = 0; i < c->n_variants; i++) {
> > +            make_absolute_url(urlbuf, sizeof(urlbuf), s->filename,
> > +                              c->variants[i]->url);
> > +            av_freep(&c->variants[i]->url);
> > +            c->variants[i]->url = av_strdup(urlbuf);
> > +            if ((ret = parse_playlist(c, c->variants[i]->url,
> > +                                      c->variants[i])) < 0)
> > +                goto fail;
> > +        }
> > +    }
> 
> The idea is that this should be transparent to the user. If the file
> contains a playlist, it should be handled as just any other file, and
> that should _work_, just like that. If it doesn't, it's a bug and
> should be fixed.
> 
> Therefore, this code shouldn't be necessary.

No, there's two very distinct cases. Playlists containing other playlists, 
and playlists containing segments. Items in a playlist containing segments 
should be played sequentially. Items in a playlist containing other 
playlists should be played in parallel, since they're all different 
variants of the same thing. And we need to keep all the info for each of 
the variants here (we can't open each one as a separate chained demuxer) 
in order to synchronize them properly and to be able to change streams on 
the fly.

> > +    if (c->finished) {
> > +        int duration = 0;
> > +        for (i = 0; i < c->variants[0]->n_segments; i++)
> > +            duration += c->variants[0]->segments[i]->duration;
> > +        s->duration = duration * AV_TIME_BASE;
> > +    }
> 
> Why if (finished)? Isn't it an error if it's not? (And otherwise,
> what's duration?)

If this is a _live_ stream, then finished is 0, indicating that the 
playlist isn't finalized yet, and we should poll it with regular intervals 
to find new segments. Duration is the total duration of the video (for 
seeking etc). Since there are new segments appearing all the time in a 
live stream, we don't know the total duration of the video.

> > +static int applehttp_read_packet(AVFormatContext *s, AVPacket *pkt)
> 
> This function could use some documentation / comments.
> 
> > +    for (i = 0; i < c->n_variants; i++)
> > +        c->variants[i]->needed = 0;
> > +    for (i = 0; i < s->nb_streams; i++) {
> > +        AVStream *st = s->streams[i];
> > +        struct variant *var = c->variants[s->streams[i]->id];
> > +        if (st->discard < AVDISCARD_ALL) {
> > +            var->needed = 1;
> > +            needed++;
> > +        }
> > +        var->ctx->streams[i - var->stream_offset]->discard = st->discard;
> > +    }
> 
> Why are these context variables. Why not a array of size MAXSTREAMS or
> so? They don't need to be saved across runs.

I'm not sure I understand exactly which part you meant we don't need to 
save. I guess we don't need to set the discard flags of the chained 
demuxer, we could just read out all packets from them, even if we're just 
interested in one of the streams (if e.g. receiving audio from one bitrate 
variant and video from another one) and check the discard flags ourselves 
here instead, but this feels just as simple.

> > +    if (!c->finished) {
> > +        int64_t now = av_gettime();
> > +        if (now - c->last_load_time >= c->target_duration*1000000) {
> > +            c->max_start_seq = 0;
> > +            c->min_end_seq   = INT_MAX;
> > +            for (i = 0; i < c->n_variants; i++) {
> > +                struct variant *var = c->variants[i];
> > +                if (var->needed) {
> > +                    if ((ret = parse_playlist(c, var->url, var)) < 0)
> > +                        return ret;
> > +                    c->max_start_seq = FFMAX(c->max_start_seq,
> > +                                             var->start_seq_no);
> > +                    c->min_end_seq   = FFMIN(c->min_end_seq,
> > +                                             var->start_seq_no + var->n_segments);
> 
> 
> Eh... So this is a playlist-in-a-playlist? I'm again confused, this
> should be transparent. And shouldn't this be a media file at this
> point?

We have both the url to the playlist and the media files at this point. If 
the playlist is a live stream, we need to reload the playlist to find new 
segments, though. That's what this code does.

> again, some documentation / comments would help me understand
> what the code is doing.

I'll try to add some more.

> > +    if (c->cur_seq_no < c->max_start_seq) {
> > +        av_log(NULL, AV_LOG_WARNING,
> > +               "skipping %d segments ahead, expired from playlists\n",
> > +               c->max_start_seq - c->cur_seq_no);
> > +        c->cur_seq_no = c->max_start_seq;
> > +    }
> > +    if (c->cur_seq_no >= c->min_end_seq) {
> > +        if (c->finished)
> > +            return AVERROR_EOF;
> > +        while (av_gettime() - c->last_load_time < c->target_duration*1000000) {
> > +            if (url_interrupt_cb())
> > +                return AVERROR(EINTR);
> > +            usleep(100*1000);
> > +        }
> > +        goto retry;
> > +    }
> > +    goto start;
> > +}
> 
> I personally object to the use of av_gettime() here...

For live streams, the playlists have indicated target_duration that says 
how often the playlist is updated - that's how often we should repoll the 
playlist to find new segments. We need to use a realtime clock such as 
av_gettime() here, othewise we'd be hammering the server with reloads if 
we're e.g. writing everything out to a file and not just playing it back.

> (Skipped seeking code for this cycle.)
> 
> > +    .extensions = "m3u8",
> 
> ?? Really?
> 
> Is a probe function possible based on the expected EXT-attributes in
> it that are relatively Apple-specific?

Since the we've set the AVFMT_NOFILE flag, we'll never be called with a 
probe function with data (if I read the utils.c code correctly), but I'll 
see if I can make it work in some way.

I'll post a new patch later today.

// Martin