[FFmpeg-devel] [PATCH] avcodec/alac: also use a temp buffer for 24bit samples

Tue Oct 6 22:17:43 CEST 2015

On 10/6/2015 4:40 PM, Paul B Mahol wrote:
> On 10/6/15, James Almer <jamrial at gmail.com> wrote:
>> Since AVFrame.extended_data is apparently not padded, simd functions
>> could in some cases overread, so make the decoder use a temp buffer
>> unconditionally.
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libavcodec/alac.c | 18 +++++-------------
>>  1 file changed, 5 insertions(+), 13 deletions(-)
>>
>> diff --git a/libavcodec/alac.c b/libavcodec/alac.c
>> index 146668e..394bd19 100644
>> --- a/libavcodec/alac.c
>> +++ b/libavcodec/alac.c
>> @@ -80,7 +80,6 @@ typedef struct ALACContext {
>>      int extra_bits;     /**< number of extra bits beyond 16-bit */
>>      int nb_samples;     /**< number of samples in the current frame */
>>
>> -    int direct_output;
>>      int extra_bit_bug;
>>
>>      ALACDSPContext dsp;
>> @@ -278,10 +277,6 @@ static int decode_element(AVCodecContext *avctx, AVFrame *frame, int ch_index,
>>          return AVERROR_INVALIDDATA;
>>      }
>>      alac->nb_samples = output_samples;
>> -    if (alac->direct_output) {
>> -        for (ch = 0; ch < channels; ch++)
>> -            alac->output_samples_buffer[ch] = (int32_t *)frame->extended_data[ch_index + ch];
>> -    }
>>
>>      if (is_compressed) {
>>          int16_t lpc_coefs[2][32];
>> @@ -393,8 +388,9 @@ static int decode_element(AVCodecContext *avctx, AVFrame *frame, int ch_index,
>>          break;
>>      case 24: {
>>          for (ch = 0; ch < channels; ch++) {
>> +            int32_t *outbuffer = (int32_t *)frame->extended_data[ch_index + ch];
>>              for (i = 0; i < alac->nb_samples; i++)
>> -                alac->output_samples_buffer[ch][i] <<= 8;
>> +                *outbuffer++ = alac->output_samples_buffer[ch][i] << 8;
>>          }}
>>          break;
>>      }
>> @@ -468,8 +464,7 @@ static av_cold int alac_decode_close(AVCodecContext *avctx)
>>      int ch;
>>      for (ch = 0; ch < FFMIN(alac->channels, 2); ch++) {
>>          av_freep(&alac->predict_error_buffer[ch]);
>> -        if (!alac->direct_output)
>> -            av_freep(&alac->output_samples_buffer[ch]);
>> +        av_freep(&alac->output_samples_buffer[ch]);
>>          av_freep(&alac->extra_bits_buffer[ch]);
>>      }
>>
>> @@ -491,11 +486,8 @@ static int allocate_buffers(ALACContext *alac)
>>          FF_ALLOC_OR_GOTO(alac->avctx, alac->predict_error_buffer[ch],
>>                           buf_size, buf_alloc_fail);
>>
>> -        alac->direct_output = alac->sample_size > 16;
>> -        if (!alac->direct_output) {
>> -            FF_ALLOC_OR_GOTO(alac->avctx, alac->output_samples_buffer[ch],
>> -                             buf_size, buf_alloc_fail);
>> -        }
>> +        FF_ALLOC_OR_GOTO(alac->avctx, alac->output_samples_buffer[ch],
>> +                         buf_size, buf_alloc_fail);
>>
>>          FF_ALLOC_OR_GOTO(alac->avctx, alac->extra_bits_buffer[ch],
>>                           buf_size, buf_alloc_fail);
>> --
>> 2.5.2
>>
> 
> it should be padded and not introduce slowdown

If you mean the temp buffers, they will be padded when I commit the simd
functions.
But if you mean the AVFrame.extended_data buffer, could you take care of that?
I'm not familiar enough with AVFrame to change the relevant alloc functions.
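
To illustrate the padding point, here is a minimal sketch in plain C, outside
of FFmpeg. It assumes a routine that always works on whole blocks of samples,
the way a simd loop typically does. BLOCK, padded_count(), alloc_padded() and
double_blocks() are hypothetical names made up for this example, not the alac
dsp functions or any FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block size: samples a simd loop handles per iteration. */
#define BLOCK 4

/* Number of elements to allocate so whole-block access stays in bounds. */
static size_t padded_count(size_t nb_samples)
{
    return (nb_samples + BLOCK - 1) / BLOCK * BLOCK;
}

/* Zeroed temp buffer with room for the tail block; the caller frees it. */
static int32_t *alloc_padded(size_t nb_samples)
{
    return calloc(padded_count(nb_samples), sizeof(int32_t));
}

/* Stand-in for a simd routine: it always reads and writes whole blocks,
 * so for nb_samples = 6 it touches indices 0..7. */
static void double_blocks(int32_t *buf, size_t nb_samples)
{
    assert(buf);
    for (size_t i = 0; i < nb_samples; i += BLOCK)
        for (size_t j = 0; j < BLOCK; j++)
            buf[i + j] *= 2;
}
```

With a buffer of exactly nb_samples elements, the last iteration of
double_blocks() would run past the end whenever nb_samples is not a multiple
of BLOCK. A buffer from alloc_padded() absorbs that tail access.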

Running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -f null -" (implicit
pcm_s16le output):

Before
real    0m0.596s
user    0m0.000s
sys     0m0.000s

After
real    0m0.575s
user    0m0.000s
sys     0m0.000s

Running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -c:a pcm_s24le -f null -":

Before
real    0m0.618s
user    0m0.000s
sys     0m0.000s

After
real    0m0.618s
user    0m0.000s
sys     0m0.000s

Both runs used a ~1 minute 24-bit 44.1 kHz stereo sample. Curious that it's
faster when the output is s16.
You'll probably have to do the same for the tak decoder before you commit your
decorrelate simd patch, btw. It also uses the AVFrame.extended_data buffer
directly for 24-bit samples.
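
For reference, the output step the patch moves to can be sketched in isolation
as below. This is only an illustration under stated assumptions:
write_s24_as_s32() is a hypothetical helper, not a function in alac.c or
takdec.c. tmp stands for the decoder's own temp buffer and dst for the
per-channel output plane, which holds exactly nb_samples entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy nb_samples 24-bit values from tmp into dst as left-justified 32-bit
 * samples. Only dst[0..nb_samples-1] is written. */
static void write_s24_as_s32(int32_t *dst, const int32_t *tmp, size_t nb_samples)
{
    assert(dst && tmp);
    for (size_t i = 0; i < nb_samples; i++)
        /* Shift as unsigned so negative samples do not hit undefined behaviour. */
        dst[i] = (int32_t)((uint32_t)tmp[i] << 8);
}
```

Since the decoding work happens in tmp, any block-wise routine only ever sees
the decoder's own allocation, and the output plane is written one sample at a
time.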