[FFmpeg-devel] Some ideas for a tiny set of audio conversion functions..

Michael Niedermayer michaelni
Thu Nov 29 00:41:10 CET 2007


On Wed, Nov 28, 2007 at 10:34:29PM +0100, Andreas Öman wrote:
> Hello,
> 
> Michael Niedermayer wrote:
> > my original idea was to use AVFrame for audio as well
> > if we choose not to do this then you will first have to find out
> > how your AVAFrame can be used in code which should work with all
> > codec_types, that is code which wants to access
> > key_frame, pts, quality, opaque, ...
> > and how direct rendering with audio (get/release_buffer()) could work
> > with it
> 
> It really doesn't matter much to me. I'll ponder more on that later.
> 
> Something that IMO needs more thought is how to come up with the
> conversion function to use.
> 
> The parameters that may affect the choice of conversion function are:
> - Input/Output sample format
> - Input/Output number of channels
> - Mixing matrix
> - Interleaved/Planar format
> 
> We should have a generic function that can handle all cases.

for(out=0; out<out_channels; out++){
    if(in_size[out]>1){
        for(i=0; i<len; i++){
            float v=0;
            for(j=0; j<in_size[out]; j++){
                int in= in_index[j];
                float v2;
                if(in_type == TYPE_S32)
                    v2= *(int32_t*)(in_data[in] + in_linesize[in]*i);
                ...
                v+= v2*coeff[out][j];
            }
            if(out_type == TYPE_FLOAT)
                *(float*)(out_data[out] + out_linesize[out]*i)= v;
            ...
        }
    }else{
        int in= in_index[0];
        for(i=0; i<len; i++){
            float v;
            if(in_type == TYPE_S32)
                v= *(int32_t*)(in_data[in] + in_linesize[in]*i);
            ...
            v*= coeff[out][0];
            if(out_type == TYPE_FLOAT)
                *(float*)(out_data[out] + out_linesize[out]*i)= v;
            ...
        }
    }
}


> 
> Then there will be a set of special functions for common cases.
> 
> After the conversion function has been selected we will also
> know the source data scaling and biasing required by that
> particular function and conversion parameters.
> 

> Since the number of channels and mixing-matrix may change over time
> for an audio stream we must be prepared to reselect conversion
> function at any time.

this is not relevant to the design, you can always reinit the thing


> 
> Now, since the scaling+biasing is something that the codec needs to
> apply internally (at least for it to make any speed difference) we
> cannot make the audio conversion stuff completely invisible to the
> codec guts.

don't forget that nearly all codecs can do the mixing many times faster
internally as well

codecs generally do
entropy decode -(sparse vector)-> inverse decorrelation transform
for each channel (yes, it's oversimplified ...)

and there is no difference between doing the mixing before or after
the inverse decorrelation transform
if you now downmix, you can significantly reduce the number of
transforms needed by doing the mixing before the transform


> 
> I see two variants here.
> 
> a) Get rid of the scaling + biasing entirely and let each sample
> format have an implicit normalized range.
> 
> u8       0           255
> s16     -32768       32767
> s24     -8388608     8388607  (but stored as int32)
> s32     -2147483648  2147483647
> float   -1.0         1.0
> 
> Of course, the downside is a speed decrease, especially for the
> C variant of float to s16.
> 
> Actually, after I composed the mail I ran some tests on my
> Pentium-M decoding a Vorbis file.
> 
> SSE:                 user    0m3.000s
> Prescaled C version: user    0m3.032s
> Real dummy:          user    0m4.900s
> 
> The first two are the two variants available in dsputil today.
> The third is a most simple implementation:
> 
> static void
> float_to_int16_slow(int16_t *dst, float *src, int samples)
> {
>     int i;
>     float f;
> 
>     for(i = 0; i < samples; i++) {
>         f = *src++ * 32767.;
>         if(f < -32768.)
>             f = -32768.;
>         else if(f > 32767.)
>             f = 32767.;
>         *dst++ = f;
>     }
> }

what about running 2 passes over the data?
1. scale/offset
2. dsputil

what about lrintf()?


> 
> Thus, it looks like option a) is really not an option.
> 
> 
> b) Keep a pointer to the conversion context in avctx and let the
> codec itself update the context when necessary.

if the conversion is done after the codec, like
avcodec_decode_audio()
avcodec_mix_audio(ctx)
then the codec has no business touching anything in ctx; these 2 might
even run in separate threads, and the time at which the audio decoder
would want to change the matrix might be long before the mixing code
would be done with the already output data

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf
