[FFmpeg-devel] Intrinsics (and NEON in particular)

Pascal Massimino pascal.massimino at gmail.com
Wed Sep 3 14:06:39 CEST 2014


Reimar,


On Wed, Sep 3, 2014 at 9:16 AM, Reimar Döffinger <Reimar.Doeffinger at gmx.de>
wrote:

> On 03.09.2014, at 08:38, Pascal Massimino <pascal.massimino at gmail.com>
> wrote:
> > On Tue, Sep 2, 2014 at 10:26 PM, Reimar Döffinger
> > <Reimar.Doeffinger at gmx.de> wrote:
> >
> >> On 03.09.2014, at 00:49, Pascal Massimino <pascal.massimino at gmail.com>
> >> wrote:
> >>> On Tue, Sep 2, 2014 at 9:39 AM, Michael Niedermayer <michaelni at gmx.at>
> >>> wrote:
> >>>
> >>>
> >>> [ahem: ffmpeg doesn't feel like using intrinsics, by chance?]
> >>
> >> I tried that about 5 months back, once more.
> >> It still results in code that is slower than the plain C version, even
> >> when using SIMD, on trivial NEON audio format conversion (same thing in
> >> asm was about 8x faster).
> >> So you can get the same effect with less effort by just disabling the
> >> asm code.
> >>
> >
> > Strange. I exclusively used intrinsics for libwebp (x86, but also
> > neon/aarch64) and was pretty pleased with the result (say <2% perf
> > loss, but 10x easier maintenance and friendliness to non-guru
> > contributors).
>
> I guess you never used uint16x8x2 and similar types then, because almost
> any access to them seems to go via the stack.
> See the last file of
> http://lists-archives.com/mplayer-dev-eng/38036-add-neon-optimizations-to-some-critical-audio-functions.html
> : it spilled the data to the stack twice per loop iteration.
>

Indeed, I just tried to compile the patch (gcc 4.8.3) and the output is
rather bad.
It's likely related to the poor support for post-incremented instructions.
I've noticed that on several occasions.
But on the bright side, things seem to be moving in the right direction,
e.g.:
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00122.html
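
For anyone following along, here is roughly the kind of code we are talking
about. This is only a minimal sketch, not the code from the patch above: the
function name deinterleave_stereo_s16 and the HAVE_NEON_INTRINSICS macro are
made up for illustration. It uses one of the two-register types (int16x8x2_t,
filled by vld2q_s16) to split interleaved stereo samples, and falls back to
plain C when NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HAVE_NEON_INTRINSICS is a hypothetical local macro, not an FFmpeg one. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_INTRINSICS 1
#else
#define HAVE_NEON_INTRINSICS 0
#endif

/* Hypothetical helper: split interleaved stereo s16 (L R L R ...) into
 * two planar buffers of 'frames' samples each. */
void deinterleave_stereo_s16(const int16_t *interleaved,
                             int16_t *left, int16_t *right,
                             size_t frames)
{
    size_t i = 0;

    assert(interleaved && left && right);

#if HAVE_NEON_INTRINSICS
    /* 8 frames (16 samples) per iteration. */
    for (; i + 8 <= frames; i += 8) {
        /* vld2q_s16 loads 16 samples and de-interleaves them into a
         * pair of q registers: val[0] = left, val[1] = right. */
        int16x8x2_t lr = vld2q_s16(interleaved + 2 * i);
        vst1q_s16(left + i, lr.val[0]);
        vst1q_s16(right + i, lr.val[1]);
    }
#endif

    /* Scalar tail; this is also the whole loop when NEON is unavailable. */
    for (; i < frames; i++) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}
```

Compiling something like this with -O2 -S and reading the loop body is the
quickest way to see whether a given compiler keeps the register pair in
registers or bounces it through the stack.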

/skal

