[FFmpeg-soc] [PATCH] xiph packetizer

Wed Aug 4 13:55:54 CEST 2010

On Tue, 3 Aug 2010, Martin Storsjö wrote:

> On Mon, 2 Aug 2010, Josh Allmann wrote:
> 
> > I don't know exactly what I did, but this round adds in support for
> > multiple Xiph frames per packet, and the tcp issue is gone. Pebkac
> > most likely.
> 
> Hmm, weird.
> 
> Now I'm getting some issues with vorbis audio with this patch. I'll debug 
> it and see what's going wrong in a while - it may just as well be 
> something in my setup, but I suspect something with multiple frames per 
> packet.

Yes, it was issues with multiple frames per packet - in the depacketizer 
actually. How did you test this - it didn't seem to work at all for me? 
After the two patches I sent to -devel, it seems to work just fine, 
though.

Except that, it seems to work in general. Relatively thorough review 
below:

> @@ -135,6 +137,14 @@ static int rtp_write_header(AVFormatContext *s1)
>              s->nal_length_size = (st->codec->extradata[4] & 0x03) + 1;
>          }
>          break;
> +    case CODEC_ID_VORBIS:
> +    case CODEC_ID_THEORA:
> +        if (!s->max_frames_per_packet) s->max_frames_per_packet = 15;
> +        s->max_frames_per_packet = av_clip(s->max_frames_per_packet, 1, 15);
> +        s->max_payload_size -= 6; // ident+frag+tdt/vdt+pkt_num+pkt_length
> +        s->num_frames = 0;
> +        if (st->codec->codec_id == CODEC_ID_VORBIS) goto defaultcase;
> +        break;

I think you could do this for both codecs - the default case sets buf_ptr,
which needs to be set for theora too.

> diff --git a/libavformat/rtpenc_xiph.c b/libavformat/rtpenc_xiph.c
> new file mode 100644
> index 0000000..989354f
> --- /dev/null
> +++ b/libavformat/rtpenc_xiph.c
> @@ -0,0 +1,117 @@
> +/*
> + * RTP packetization for Xiph audio and video
> + * Copyright (c) 2010 Josh Allmann
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "avformat.h"
> +#include "rtpenc.h"
> +
> +/**
> + * Packetize Xiph frames into RTP according to
> + * RFC 5215 (Vorbis) and the Theora RFC draft.
> + * (http://svn.xiph.org/trunk/theora/doc/draft-ietf-avt-rtp-theora-00.txt)
> + */
> +void ff_rtp_send_xiph(AVFormatContext *s1, const uint8_t *buff, int size)
> +{
> +    RTPMuxContext *s = s1->priv_data;
> +    int max_pkt_size, xdt, frag;
> +    uint8_t *q;
> +
> +    max_pkt_size = s->max_payload_size;
> +
> +    /* set xiph data type */
> +    switch (*buff) {
> +        case 0x01:   // vorbis id
> +        case 0x05:   // vorbis setup
> +        case 0x80:   // theora header
> +        case 0x82:   // theora tables
> +            xdt = 1; // packed config payload
> +            break;
> +        case 0x03:   // vorbis comments
> +        case 0x81:   // theora comments
> +            xdt = 2; // comment payload
> +            break;
> +        default:
> +            xdt = 0; // raw data payload
> +    }

The indentation of the case statements usually is at the same level
as the switch statement itself in ffmpeg. Also, a break at the end of
the default case wouldn't hurt I think.

> +
> +    /* Set ident. Must match the one in sdp.c
> +     * Probably need a non-fixed way of generating
> +     * this, but it has to be done in SDP and passed in from there. */
> +    q = s->buf;
> +    *q++ = 0xfe;
> +    *q++ = 0xcd;
> +    *q++ = 0xba;
> +
> +    /* set fragment
> +     * 0 - whole frame (possibly multiple frames)
> +     * 1 - first fragment
> +     * 2 - fragment continuation
> +     * 3 - last fragmement */
> +    frag = size <= max_pkt_size ? 0 : 1;
> +
> +    if (s->num_frames && (xdt || frag)) {
> +        /* immediately send any buffered frames
> +         * if buffer is not raw data, or if current frame is fragmented. */
> +        ff_rtp_send_data(s1, s->buf, s->buf_ptr - s->buf, 0);
> +    }

You could move this if clause to below the if (!frag && !xdt), then this
one would be only if (s->num_frames).

> +
> +    if (!frag && !xdt) { // do we have a whole frame of raw data?
> +        int remaining = max_pkt_size - ((int)(s->buf_ptr - s->buf) + size);

This calculation is slightly off - max_pkt_size already has the 6 bytes
header subtracted, but those 6 bytes still are included in s->buf_ptr - s->buf.
Also, it feels a bit convoluted.

If you happen to have a packet of e.g. 1453, this forces the lines below
to send the currently buffered data, even if there isn't any frame buffered
(s->buf_ptr - s->buf is 3 bytes, and only contains the ident header).

> +        if (remaining < 0 || s->num_frames >= s->max_frames_per_packet) {
> +            /* send previous packets now; no room for new data */
> +            ff_rtp_send_data(s1, s->buf, s->buf_ptr - s->buf, 0);
> +            s->num_frames = 0;
> +        }
> +
> +        /* buffer current frame to send later */
> +        if (0 == s->num_frames) s->timestamp = s->cur_timestamp;
> +        s->num_frames++;
> +        *q++ = s->num_frames; // set packet header

Some comment mentioning that frag and xdt should be or'ed in here, too,
but omitted since they're both 0, wouuld help the readability.

> +        if (s->num_frames > 1) q = s->buf_ptr; // jump ahead if needed
> +        *q++ = (size >> 8) & 0xff;
> +        *q++ = size & 0xff;
> +        memmove(q, buff, size);

Why memmove? I don't think there's any theoretical case where the buffers
could overlap?

> +        q += size;
> +        s->buf_ptr = q;
> +        return;
> +    }
> +
> +    s->timestamp = s->cur_timestamp;
> +    s->num_frames = 0;
> +    s->buf_ptr = q;
> +    while (size > 0) {
> +        int len = (!frag || frag == 3) ? size : max_pkt_size;
> +        q = s->buf_ptr;
> +
> +        /* set packet headers */
> +        *q++ = (frag << 6) | (xdt << 4);
> +        *q++ = (len >> 8) & 0xff;
> +        *q++ = len & 0xff;
> +        /* set packet body */
> +        memmove(q, buff, len);

Same here

> +        q += len;
> +        buff += len;
> +        size -= len;
> +
> +        ff_rtp_send_data(s1, s->buf, q - s->buf, 0);
> +
> +        frag = size <= max_pkt_size ? 3 : 2;
> +    }
> +}
> diff --git a/libavformat/rtsp.c b/libavformat/rtsp.c
> index 689ad29..1be3cd0 100644
> --- a/libavformat/rtsp.c
> +++ b/libavformat/rtsp.c
> @@ -52,7 +52,7 @@ int rtsp_default_protocols = (1 << RTSP_LOWER_TRANSPORT_UDP);
>  #define SELECT_TIMEOUT_MS 100
>  #define READ_PACKET_TIMEOUT_S 10
>  #define MAX_TIMEOUTS READ_PACKET_TIMEOUT_S * 1000 / SELECT_TIMEOUT_MS
> -#define SDP_MAX_SIZE 8192
> +#define SDP_MAX_SIZE 16384
>  

Is there any limit on the size of the extradata for theora/vorbis?
Can we be reasonably sure this is enough for at least one theora stream
plus one vorbis stream?

> diff --git a/libavformat/sdp.c b/libavformat/sdp.c
> index b34b944..acd954a 100644
> --- a/libavformat/sdp.c
> +++ b/libavformat/sdp.c
> @@ -21,6 +21,7 @@
>  #include <string.h>
>  #include "libavutil/avstring.h"
>  #include "libavutil/base64.h"
> +#include "libavcodec/xiph.h"
>  #include "avformat.h"
>  #include "internal.h"
>  #include "avc.h"
> @@ -220,6 +221,68 @@ static char *extradata2config(AVCodecContext *c)
>      return config;
>  }
>  
> +static char *xiph_extradata2config(AVCodecContext *c)
> +{
> +    char *config, *encoded_config;
> +    uint8_t *header_start[3];
> +    int headers_len, header_len[3], config_len;
> +    int first_header_size;
> +
> +    switch (c->codec_id) {
> +    case CODEC_ID_THEORA:
> +        first_header_size = 42;
> +        break;
> +    case CODEC_ID_VORBIS:
> +        first_header_size = 30;
> +        break;
> +    default:
> +        av_log(c, AV_LOG_ERROR, "Unsupported Xiph codec ID\n");
> +        return NULL;
> +    }
> +
> +    if (ff_split_xiph_headers(c->extradata, c->extradata_size,
> +                              first_header_size, header_start,
> +                              header_len) < 0) {
> +        av_log(c, AV_LOG_ERROR, "Extradata corrupt.");

Add a newline to the log message

> +        return NULL;
> +    }
> +
> +    headers_len = header_len[0]+header_len[2];

Some space around the + would make it nicer to read :-)

> +    config_len = 4 +          // count
> +                 3 +          // ident
> +                 2 +          // packet size
> +                 1 +          // header count
> +                 2 +          // header size
> +                 headers_len; // and the rest
> +    config = av_malloc(config_len);
> +    encoded_config = av_malloc(AV_BASE64_SIZE(config_len));
> +
> +    if (!config || !encoded_config) {
> +        av_log(c, AV_LOG_ERROR,
> +               "Not enough memory for configuration string\n");

If either of them was allocated, but not the other, you'd leak memory here

> +        return NULL;
> +    }
> +

// Martin