[FFmpeg-devel] [PATCH] libspeex Speex encoding

Justin Ruggles justin.ruggles
Tue Oct 27 23:29:03 CET 2009


Michael Niedermayer wrote:

> On Tue, Oct 27, 2009 at 05:54:51PM -0400, Justin Ruggles wrote:
>> Michael Niedermayer wrote:
>>
>>> On Sun, Oct 25, 2009 at 09:04:45AM -0400, Justin Ruggles wrote:
>>>> Hi,
>>>>
>>>> This patch combines parts of my previous libspeex encoding patch with
>>>> parts of the one sent by Art Clarke.
>>>>
>>>> The rate control is not as intuitive to use as I would like, but it
>>>> works.  libspeex has the option to have the library choose a CBR bitrate
>>>> based on a quality setting.  Providing that option doesn't really fit
>>>> well into our current system since there is no way to tell if the user
>>>> is specifying CBR quality or VBR quality.  So instead it just uses
>>>> bitrate for CBR and quality for VBR like our other audio encoders.  The
>>>> default bitrate of 64kbps is higher than the maximum Speex bitrate, so
>>>> at least it will be good quality by default.
>>> [...]
>>>> +static av_cold int encode_init(AVCodecContext *avctx)
>>>> +{
>>>> +    LibSpeexEncContext *s = avctx->priv_data;
>>>> +    const SpeexMode *mode;
>>>> +    uint8_t *header_data;
>>>> +    int header_size;
>>>> +    int32_t complexity;
>>>> +
>>>> +    /* channels */
>>>> +    if (avctx->channels < 1 || avctx->channels > 2) {
>>>> +        av_log(avctx, AV_LOG_ERROR, "Invalid channels (%d). Only stereo and "
>>>> +               "mono are supported\n", avctx->channels);
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    /* sample rate and encoding mode */
>>>> +    switch (avctx->sample_rate) {
>>>> +    case  8000: mode = &speex_nb_mode;  break;
>>>> +    case 16000: mode = &speex_wb_mode;  break;
>>>> +    case 32000: mode = &speex_uwb_mode; break;
>>>> +    default:
>>>> +        av_log(avctx, AV_LOG_ERROR, "Sample rate of %d Hz is not supported. "
>>>> +               "Resample to 8, 16, or 32 kHz.\n", avctx->sample_rate);
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    /* initialize libspeex */
>>>> +    s->enc_state = speex_encoder_init(mode);
>>>> +    if (!s->enc_state) {
>>>> +        av_log(avctx, AV_LOG_ERROR, "Error initializing libspeex\n");
>>>> +        return -1;
>>>> +    }
>>>> +    speex_init_header(&s->header, avctx->sample_rate, avctx->channels, mode);
>>>> +
>>>> +    /* rate control method and parameters */
>>>> +    if (avctx->flags & CODEC_FLAG_QSCALE) {
>>>> +        /* VBR */
>>>> +        s->header.vbr = 1;
>>>> +        speex_encoder_ctl(s->enc_state, SPEEX_SET_VBR, &s->header.vbr);
>>>> +        s->vbr_quality = av_clipf(avctx->global_quality / (float)FF_QP2LAMBDA,
>>>> +                                  0.0f, 10.0f);
>>>> +        speex_encoder_ctl(s->enc_state, SPEEX_SET_VBR_QUALITY, &s->vbr_quality);
>>>> +        avctx->bit_rate = 0;
>>>> +    } else {
>>>> +        /* CBR */
>>>> +        s->header.bitrate = avctx->bit_rate;
>>>> +        speex_encoder_ctl(s->enc_state, SPEEX_SET_BITRATE, &s->header.bitrate);
>>>> +        speex_encoder_ctl(s->enc_state, SPEEX_GET_BITRATE, &s->header.bitrate);
>>>> +        /* stereo side information adds about 800 bps to the base bitrate */
>>>> +        avctx->bit_rate = s->header.bitrate + (avctx->channels == 2 ? 800 : 0);
>>> avctx->bit_rate is set by the user and not the encoder
>> The reason for this is for feedback to the user.  libspeex uses the
>> closest supported bitrate for the selected mode that is less than or
>> equal to the requested bitrate.  I guess this could be taken out and
>> just let libspeex do whatever without telling the user except maybe in a
>> debug printout.  The bitrate is a nominal value anyway.  The exact
>> bitrate also depends on the number of frames per packet because frames
>> are not byte aligned, packets are.
> 
> hmm, I've no good comment/idea ATM :(

With some added tables and calculations I could come up with the closest
true bitrate based on the mode, channels, and frames per packet, then
print a warning message that includes the real bitrate whenever it is
not an exact match for what the user specified.
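
A rough sketch of that calculation, to make the byte-alignment effect
concrete. The helper name and its parameters are hypothetical and not part
of the patch or of libspeex; the per-frame bit count for the chosen mode and
the stereo side-information bits would have to come from the added tables,
so they are passed in here rather than hard-coded.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch, not a libspeex call).
 * frame_bits:        encoded bits per Speex frame for the chosen mode (assumed input)
 * stereo_bits:       extra side-information bits per frame, 0 for mono (assumed input)
 * frames_per_packet: number of Speex frames packed into one packet
 * frame_size:        samples per Speex frame
 * sample_rate:       samples per second
 * Returns the bitrate actually produced once each packet is rounded up
 * to a whole number of bytes. */
static int64_t speex_true_bitrate(int frame_bits, int stereo_bits,
                                  int frames_per_packet, int frame_size,
                                  int sample_rate)
{
    /* frames are packed back to back; only the packet is byte aligned */
    int packet_bits    = frames_per_packet * (frame_bits + stereo_bits);
    int packet_bytes   = (packet_bits + 7) / 8;
    int packet_samples = frames_per_packet * frame_size;

    return (int64_t)packet_bytes * 8 * sample_rate / packet_samples;
}
```

For example, assuming a 300-bit frame of 160 samples at 8 kHz, one frame per
packet rounds up to 38 bytes and gives 15200 bps, while two frames per packet
fit exactly into 75 bytes and give 15000 bps.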

> 
> [...]
>>> [...]
>>>> +static int encode_frame(AVCodecContext *avctx, uint8_t *frame, int buf_size,
>>>> +                        void *data)
>>>> +{
>>>> +    LibSpeexEncContext *s = avctx->priv_data;
>>>> +    void *samples = data;
>>>> +    int nframes, i;
>>>> +
>>>> +    if (!avctx->frame_size)
>>>> +        return 0;
>>>> +
>>>> +    /* handle last packet, which may have fewer frames-per-packet and/or
>>>> +       fewer samples in the last frame */
>>>> +    nframes = s->header.frames_per_packet;
>>>> +    if (avctx->frame_size < nframes * s->header.frame_size) {
>>>> +        nframes = (avctx->frame_size + s->header.frame_size - 1) /
>>>> +                  s->header.frame_size;
>>>> +        if (avctx->frame_size != s->header.frame_size * nframes) {
>>>> +            /* allocate new buffer to pad last frame */
>>>> +            int new_samples_size;
>>>> +            avctx->frame_size = nframes * s->header.frame_size;
>>> I am not sure if this violates the API but at least i would say it is
>>> unexpected by the application
>> Hmmm. Yeah, if it doesn't violate API, it is at least not documented.
>> Is there another way to report the correct duration of the output frame
>> if the user gives, for example, 500 samples and the output frame
>> represents 640 due to padding?
> 
> decode_audio takes its input from the fields of an AVPacket
> encode_audio should produce an AVPacket (various advantages like reallocating
> a too small buffer make this a good idea)
> if we would return an AVPacket then there would be a duration field that
> could naturally carry this information

:)  I was both hoping and dreading you would say that.  I've been
pondering this option since we started using AVPackets for decoding.
I'll work on it.
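
For reference, a minimal sketch of the last-packet arithmetic from the quoted
encode_frame(), written so the padded duration is returned to the caller
instead of being written back into avctx->frame_size. The struct and function
names are hypothetical, and the frame size of 320 samples in the example is an
assumption chosen to match the 500 -> 640 case above.

```c
/* Hypothetical result type: what a returned packet could report. */
typedef struct PaddedPacket {
    int nframes;        /* Speex frames actually encoded in this packet */
    int padded_samples; /* duration the packet represents, in samples   */
    int pad_samples;    /* samples of padding appended to the input     */
} PaddedPacket;

/* Hypothetical helper: round a short final input up to whole Speex frames,
 * never exceeding the configured frames per packet. */
static PaddedPacket pad_last_packet(int input_samples, int frame_size,
                                    int frames_per_packet)
{
    PaddedPacket p;
    int nframes = (input_samples + frame_size - 1) / frame_size;

    if (nframes > frames_per_packet)
        nframes = frames_per_packet;

    p.nframes        = nframes;
    p.padded_samples = nframes * frame_size;
    p.pad_samples    = p.padded_samples - input_samples;
    if (p.pad_samples < 0)  /* input longer than one packet: nothing to pad */
        p.pad_samples = 0;
    return p;
}
```

With 500 input samples, 320 samples per frame, and 2 frames per packet, this
yields 2 frames, a duration of 640 samples, and 140 samples of padding.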

Thanks,
Justin



