[FFmpeg-devel] [PATCH] lavc/aarch64: add a few SIMD function for AAC PS

Clément Bœsch u at pkh.me
Thu Jun 1 12:13:19 EEST 2017


On Thu, May 25, 2017 at 01:22:22PM -0300, James Almer wrote:
[...]
> > +function ff_ps_stereo_interpolate_ipdopd_neon, export=1
> > +        movrel      x5, ipdopd_factors
> > +        ld1         {v20.4S}, [x5]
> > +        ld1         {v0.4S,v1.4S}, [x2]
> > +        ld1         {v6.4S,v7.4S}, [x3]
> > +1:
> > +        ld1         {v2.2S}, [x0]
> > +        ld1         {v3.2S}, [x1]
> > +        dup         v2.2D, v2.D[0]
> > +        dup         v3.2D, v3.D[0]
> > +        fadd        v0.4S, v0.4S, v6.4S
> > +        fadd        v1.4S, v1.4S, v7.4S
> > +        zip1        v16.4S, v0.4S, v0.4S
> > +        zip2        v17.4S, v0.4S, v0.4S
> > +        zip1        v18.4S, v1.4S, v1.4S
> > +        zip2        v19.4S, v1.4S, v1.4S
> > +        fmul        v4.4S, v2.4S, v16.4S
> > +        fmla        v4.4S, v3.4S, v17.4S
> > +        ext         v2.16B, v2.16B, v2.16B, #4
> > +        ext         v3.16B, v3.16B, v3.16B, #4
> 
> > +        fmul        v5.4S, v2.4S, v18.4S
> > +        fmla        v5.4S, v3.4S, v19.4S
> > +        fmla        v4.4S, v5.4S, v20.4S
> 
> You could make ipdopd_factors be 0, INT32_MIN, 0, INT32_MIN then replace
> the fmla with eor + fadd.
> No idea if that will actually be faster, though.

I'll check that when benchmarking.
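James's suggestion rests on one fact: flipping the IEEE-754 sign bit (XOR with 0x80000000, the bit pattern of INT32_MIN) negates a float exactly. So an fmla against a vector of +1.0/-1.0 factors can be replaced by an eor with a sign mask followed by an fadd. The scalar C model below compares the two variants. It assumes ipdopd_factors holds only +1.0/-1.0 in the lane pattern that matches the proposed mask (+1.0 where the mask is 0, -1.0 where it is INT32_MIN). The actual contents of ipdopd_factors are not shown in the quoted hunk, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "fmla acc, v, factor" on 4 lanes: acc += v * factor.
 * Assumption: every factor is +1.0f or -1.0f. */
static void madd_factors(float acc[4], const float v[4], const float factor[4])
{
    for (int i = 0; i < 4; i++) {
        assert(factor[i] == 1.0f || factor[i] == -1.0f);
        acc[i] = acc[i] + v[i] * factor[i];
    }
}

/* Model of a per-lane eor on the float bits: mask 0x80000000 (INT32_MIN)
 * flips the sign bit, mask 0 leaves the value untouched. */
static float xor_sign(float f, uint32_t mask)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret without aliasing UB */
    bits ^= mask;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Model of "eor tmp, v, mask" + "fadd acc, acc, tmp" on 4 lanes. */
static void add_xor(float acc[4], const float v[4], const uint32_t mask[4])
{
    for (int i = 0; i < 4; i++)
        acc[i] = acc[i] + xor_sign(v[i], mask[i]);
}
```

Multiplying by +1.0 or -1.0 is exact, so the fused (fmla) and unfused (eor + fadd) forms round identically for non-NaN inputs. The open question from the review is only whether the swap is faster.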

Here is a new version adding ff_ps_hybrid_analysis_neon.

It was a lot of fun to write, but there are a few oddities:

- the filter lane has 8 entries but we only read 6 of them (I suppose
  that's for performance reasons, though)

- this part is strange:

        INT64FLOAT sum_re = (INT64FLOAT)filter[i][6][0] * in[6][0];
        INT64FLOAT sum_im = (INT64FLOAT)filter[i][6][0] * in[6][1];

  why doesn't it use the imaginary part of the filter for sum_im?
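Both oddities can be made concrete with a hypothetical scalar sketch of a 13-tap complex FIR. This is not the FFmpeg reference code. It assumes the input window holds 13 complex samples centred on in[6], that each filter row stores 8 complex entries of which only 0..6 are meaningful, and that tap 12-j is the complex conjugate of tap j. Under that conjugate-symmetry assumption, each pair of taps folds into 4 multiplies. The centre tap is written as a full complex product. It reduces to the quoted real-only form (filter[i][6][0] for both sum_re and sum_im) only when filter[i][6][1] is zero; the thread does not say whether that holds for the actual tables. The function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, NOT the FFmpeg implementation.
 * Assumptions: in[0..12] is the input window centred on in[6]; each filter
 * row has 8 complex entries, 0..6 used and 7 padding; tap 12-j equals
 * conj(tap j). */
static void hybrid_analysis_sketch(float (*out)[2], float (*in)[2],
                                   const float (*filter)[8][2],
                                   ptrdiff_t stride, int n)
{
    assert(stride >= 1);
    for (int i = 0; i < n; i++) {
        const float (*f)[2] = filter[i];
        /* Centre tap as a full complex product f[6] * in[6]; identical to
         * the real-only form in the quoted code only if f[6][1] == 0. */
        float sum_re = f[6][0] * in[6][0] - f[6][1] * in[6][1];
        float sum_im = f[6][0] * in[6][1] + f[6][1] * in[6][0];

        /* Fold taps j and 12-j: f*a + conj(f)*b. */
        for (int j = 0; j < 6; j++) {
            float a_re = in[j][0],      a_im = in[j][1];
            float b_re = in[12 - j][0], b_im = in[12 - j][1];
            sum_re += f[j][0] * (a_re + b_re) - f[j][1] * (a_im - b_im);
            sum_im += f[j][0] * (a_im + b_im) + f[j][1] * (a_re - b_re);
        }
        out[i * stride][0] = sum_re;
        out[i * stride][1] = sum_im;
    }
}
```

In this sketch the loop covers entries 0..5 and entry 6 is the centre tap, so entry 7 of each row is never read.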

I'll post some benchmarks later.

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-lavc-aarch64-add-a-few-SIMD-function-for-AAC-PS.patch
Type: text/x-diff
Size: 14974 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170601/0d1c3bb6/attachment.patch>