[FFmpeg-devel] [PATCH] wmadec.c: SIMD optimization using float_to_int16_interleave

Michael Niedermayer michaelni
Tue Mar 9 11:06:07 CET 2010


On Mon, Mar 08, 2010 at 04:23:55PM +0000, Måns Rullgård wrote:
> "Zhou Zongyi"<zhouzy at os.pku.edu.cn> writes:
> 
> > Hi all,
> >
> > Here is my patch. I tested decoding a 160kbps 44.1kHz sample on a
> > K8, around 10% faster in overall speed.
> >
> > Index: libavcodec/wmadec.c
> > ===================================================================
> > --- libavcodec/wmadec.c (revision 22281)
> > +++ libavcodec/wmadec.c (working copy)
> > @@ -769,8 +769,6 @@
> >  static int wma_decode_frame(WMACodecContext *s, int16_t *samples)
> >  {
> >      int ret, i, n, ch, incr;
> > -    int16_t *ptr;
> > -    float *iptr;
> >
> >  #ifdef TRACE
> >      tprintf(s->avctx, "***decode_frame: %d size=%d\n", s->frame_count++, s->frame_len);
> > @@ -790,17 +788,29 @@
> >      /* convert frame to integer */
> >      n = s->frame_len;
> >      incr = s->nb_channels;
> > -    for(ch = 0; ch < s->nb_channels; ch++) {
> > -        ptr = samples + ch;
> > -        iptr = s->frame_out[ch];
> > +    if (s->dsp.float_to_int16_interleave == ff_float_to_int16_interleave_c) {
> > +        for(ch = 0; ch < incr; ch++) {
> > +            int16_t *ptr = samples + ch;
> > +            float *iptr = s->frame_out[ch];
> >
> > -        for(i=0;i<n;i++) {
> > -            *ptr = av_clip_int16(lrintf(*iptr++));
> > -            ptr += incr;
> > +            for(i=0;i<n;i++) {
> > +                *ptr = av_clip_int16(lrintf(*iptr));
> > +                ptr += incr;
> > +                /* prepare for next block */
> > +                iptr[0] = iptr[n];
> > +                iptr++;
> > +            }
> >          }
> > -        /* prepare for next block */
> > -        memmove(&s->frame_out[ch][0], &s->frame_out[ch][s->frame_len],
> > -                s->frame_len * sizeof(float));
> > +    } else {
> > +        float *output[MAX_CHANNELS];
> > +        for (ch = 0; ch < MAX_CHANNELS; ch++)
> > +            output[ch] = s->frame_out[ch];
> > +        s->dsp.float_to_int16_interleave(samples, (const float **)output, n, incr);
> > +        for(ch = 0; ch < incr; ch++) {
> > +            /* prepare for next block */
> > +            memmove(&s->frame_out[ch][0], &s->frame_out[ch][n],
> > +                    n * sizeof(float));
> > +        }
> >      }
> 
> This is way too hackish IMO.  The patch also looks more complicated
> than it needs to be.

I've not reviewed the patch, as I am waiting for a clean version without
cosmetic changes, but nothing of this size that makes a codec 10% faster
can be too hackish.

Please make H.264 10% faster with a hackish change of that size ... ;)

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope