[FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

Martin Vignali martin.vignali at gmail.com
Thu Dec 14 11:49:04 EET 2017


2017-12-13 17:18 GMT+01:00 Henrik Gramner <henrik at gramner.com>:

> On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali
> <martin.vignali at gmail.com> wrote:
> > +        vpermq  m1, [srcq + xq -     mmsize + %3], 0x4e; flip each lane at load
> > +        vpermq  m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load
>
> Would doing 2x 128-bit movu + 2x vinserti128 be faster?
>

Hello,

It seems to be slower for me (patch attached; maybe I did something wrong).

With vpermq:
hflip_byte_c: 29.2
hflip_byte_ssse3: 28.4
hflip_byte_avx2: 20.2
hflip_short_c: 29.2
hflip_short_ssse3: 28.4
hflip_short_avx2: 20.2

With movu + vinserti128:
hflip_byte_c: 29.2
hflip_byte_ssse3: 28.2
hflip_byte_avx2: 22.7
hflip_short_c: 29.7
hflip_short_ssse3: 28.2
hflip_short_avx2: 21.7
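
For reference, here is a scalar C sketch (not part of the patches) of why the two load strategies should be interchangeable in terms of results. It models vpermq with the 0x4e immediate, which swaps the two 128-bit lanes, and the alternative of two 128-bit loads placed in opposite lanes. The exact operand order of the movu + vinserti128 variant and the per-lane byte reversal that follows the load are assumptions made for illustration, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of "vpermq dst, src, imm8": 64-bit element i of dst is
 * taken from src element (imm8 >> (2 * i)) & 3. */
static void model_vpermq(uint64_t dst[4], const uint64_t src[4], uint8_t imm8)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Assumed model of the movu + vinserti128 variant: the upper 16 source
 * bytes go to the low lane, the lower 16 source bytes to the high lane. */
static void model_two_loads(uint8_t dst[32], const uint8_t src[32])
{
    memcpy(dst,      src + 16, 16);
    memcpy(dst + 16, src,      16);
}

/* Assumed follow-up step: reverse the bytes inside each 128-bit lane,
 * as a per-lane byte shuffle would do for the byte version of hflip. */
static void model_reverse_in_lane(uint8_t v[32])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 8; i++) {
            uint8_t t = v[lane * 16 + i];
            v[lane * 16 + i]      = v[lane * 16 + 15 - i];
            v[lane * 16 + 15 - i] = t;
        }
    }
}

/* Returns 1 if both load models agree and, combined with the per-lane
 * reversal, yield a full 32-byte reversal; returns 0 otherwise. */
int both_loads_give_full_reverse(void)
{
    uint8_t  src[32], a[32], b[32];
    uint64_t q[4];

    for (int i = 0; i < 32; i++)
        src[i] = (uint8_t)i;

    memcpy(q, src, sizeof(q));
    model_vpermq(q, q, 0x4e);      /* lane swap at load */
    memcpy(a, q, sizeof(a));

    model_two_loads(b, src);       /* two 128-bit loads */
    if (memcmp(a, b, sizeof(a)) != 0)
        return 0;

    model_reverse_in_lane(a);
    for (int i = 0; i < 32; i++)
        if (a[i] != 31 - i)
            return 0;
    return 1;
}
```

Under these assumptions both variants leave the same data in the register, so the timings above would differ only in instruction cost, not in the work done.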

Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-avfilter-x86-vf_hflip-merge-hflip-byte-and-hflip-sho.patch
Type: application/octet-stream
Size: 2779 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171214/3a9c989c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-avfilter-x86-vf_hflip-add-avx2-version-for-hflip_byt.patch
Type: application/octet-stream
Size: 3260 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171214/3a9c989c/attachment-0001.obj>

