[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)
Martin Vignali
martin.vignali at gmail.com
Thu Dec 14 12:16:54 EET 2017
2017-12-13 17:37 GMT+01:00 Henrik Gramner <henrik at gramner.com>:
> On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali <martin.vignali at gmail.com> wrote:
> > The idea in AVX2 is to load 128 bits of data (2x 64 bits),
> > then shuffle across lanes to put each 64-bit half in the low part of
> > its own lane, so the rest of the process stays similar
> > to the SSE version.
>
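A minimal C-intrinsics sketch of that load, for illustration only: the patch itself is x86inc assembly, and the helper name `widen16_vpermq`, the 16-pixel block size and the GCC/Clang `target("avx2")` attribute are assumptions of this sketch. It widens 16 bytes to 16 zero-extended words with a 128-bit `movu`, a cross-lane `vpermq` and an in-lane `punpcklbw` against zero.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: widen 16 bytes at src to 16 zero-extended words.
 * Load path: movu (128-bit) + vpermq + punpcklbw. */
__attribute__((target("avx2")))
void widen16_vpermq(const uint8_t *src, uint16_t *dst)
{
    /* 128-bit unaligned load; the upper lane is undefined and never used */
    __m256i v = _mm256_castsi128_si256(_mm_loadu_si128((const __m128i *)src));

    /* vpermq: qword 0 (bytes 0-7) stays in the low half of lane 0,
     * qword 1 (bytes 8-15) moves to qword 2, the low half of lane 1. */
    v = _mm256_permute4x64_epi64(v, _MM_SHUFFLE(3, 1, 2, 0));

    /* punpcklbw works inside each 128-bit lane: interleaving the low
     * 8 bytes of every lane with zero zero-extends them to words. */
    __m256i w = _mm256_unpacklo_epi8(v, _mm256_setzero_si256());

    _mm256_storeu_si256((__m256i *)dst, w);
}
```

The caller must only use this on an AVX2-capable CPU, for example after checking `__builtin_cpu_supports("avx2")` (GCC/Clang).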
> What about using pmovzxbw instead of movu + vpermq + punpcklbw?
>
You're right, this is faster (tested on the first function with intermediate
16-bit processing, grainextract):

With the vpermq load:
grainextract_c: 22162.2
grainextract_sse2: 1160.9
grainextract_avx2: 1154.2

With vpmovzxbw:
grainextract_c: 22165.7
grainextract_sse2: 1155.7
grainextract_avx2: 772.9
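For comparison, a sketch of the same widening with `vpmovzxbw` (intrinsic `_mm256_cvtepu8_epi16`), under the same assumptions as any illustration here: C intrinsics instead of x86inc assembly, a hypothetical helper name `widen16_pmovzx`, and the GCC/Clang `target("avx2")` attribute. `vpmovzxbw ymm, m128` does the cross-lane placement and the zero extension in a single instruction, replacing the three-instruction `movu` + `vpermq` + `punpcklbw` sequence; the timings themselves come only from the benchmark above.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: same result as a movu + vpermq + punpcklbw load,
 * using a single vpmovzxbw (zero-extend 16 bytes to 16 words). */
__attribute__((target("avx2")))
void widen16_pmovzx(const uint8_t *src, uint16_t *dst)
{
    __m128i b = _mm_loadu_si128((const __m128i *)src);

    /* vpmovzxbw ymm, xmm: byte i becomes word i, across both lanes.
     * Compilers typically fold the load into the m128 operand form. */
    __m256i w = _mm256_cvtepu8_epi16(b);

    _mm256_storeu_si256((__m256i *)dst, w);
}
```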
>
> > For the store, the idea is the same in the opposite direction (shuffle
> > before the store).
>
> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
> packuswb + vpermq.
>
>
Sorry, I don't understand this part.
Do you mean a 128-bit packuswb + movh for each lane,
or something else?
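One plausible reading of the store suggestion, sketched in C intrinsics; this is an interpretation, not confirmed in this thread, and the names `narrow16_pack256_vpermq` and `narrow16_extract_pack128` are hypothetical. Both helpers pack 16 words back to 16 bytes with unsigned saturation. The first uses a 256-bit `packuswb` of the register with itself followed by `vpermq`. The second extracts the high lane with `vextracti128` and does one 128-bit `packuswb` of the low lane with the high lane, which already yields the 16 bytes in order, so a single 16-byte store would follow instead of a `movh` per lane.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: 256-bit packuswb + vpermq store path. */
__attribute__((target("avx2")))
void narrow16_pack256_vpermq(const uint16_t *src, uint8_t *dst)
{
    __m256i w = _mm256_loadu_si256((const __m256i *)src);
    /* packuswb is per lane: lane 0 = [w0..w7, w0..w7], lane 1 = [w8..w15, w8..w15] */
    __m256i p = _mm256_packus_epi16(w, w);
    /* vpermq: gather qword 0 and qword 2 into the low 128 bits */
    p = _mm256_permute4x64_epi64(p, _MM_SHUFFLE(3, 1, 2, 0));
    _mm_storeu_si128((__m128i *)dst, _mm256_castsi256_si128(p));
}

/* Hypothetical helper: vextracti128 + 128-bit packuswb store path. */
__attribute__((target("avx2")))
void narrow16_extract_pack128(const uint16_t *src, uint8_t *dst)
{
    __m256i w = _mm256_loadu_si256((const __m256i *)src);
    __m128i lo = _mm256_castsi256_si128(w);          /* words 0-7 */
    __m128i hi = _mm256_extracti128_si256(w, 1);     /* vextracti128: words 8-15 */
    /* 128-bit packuswb: bytes 0-7 from lo, bytes 8-15 from hi, already in order */
    _mm_storeu_si128((__m128i *)dst, _mm_packus_epi16(lo, hi));
}
```

Both variants produce identical bytes; which one is faster on a given CPU is not something this sketch measures.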
Martin
More information about the ffmpeg-devel
mailing list