[FFmpeg-devel] [PATCH 1/2]v2 Add macros used in opus_pvq_search to x86util.asm
Ivan Kalvachev
ikalvachev at gmail.com
Sun Aug 6 15:36:45 EEST 2017
On 8/6/17, Henrik Gramner <henrik at gramner.com> wrote:
> On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev <ikalvachev at gmail.com> wrote:
>> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
>> +%if cpuflag(avx2)
>> + vbroadcastss %1, %2 ; ymm, xmm
>> +%elif cpuflag(avx)
>> + %ifnum sizeof%2 ; avx1 register
>> + vpermilps xmm%1, xmm%2, q0000 ; xmm, xmm, imm || ymm, ymm, imm
>
> Nit: Use shufps instead of vpermilps, it's one byte shorter but
> otherwise identical in this case.
>
> c5 e8 c6 ca 00 vshufps xmm1,xmm2,xmm2,0x0
> c4 e3 79 04 ca 00 vpermilps xmm1,xmm2,0x0
It also has one cycle less latency on some older AMD CPUs.
Done.
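For anyone following along, here is a small lane-level model of why the two instructions are interchangeable here. It is only a sketch in Python, not part of the patch: the function names simply mirror the mnemonics, vectors are lists with index 0 as the lowest lane, and only the 128-bit immediate forms are modelled.

```python
def _selectors(imm8):
    # Split the 8-bit immediate into four 2-bit lane selectors, lowest first.
    return [(imm8 >> (2 * i)) & 3 for i in range(4)]

def shufps(src1, src2, imm8):
    # Lanes 0-1 are picked from src1, lanes 2-3 from src2.
    s = _selectors(imm8)
    return [src1[s[0]], src1[s[1]], src2[s[2]], src2[s[3]]]

def vpermilps(src, imm8):
    # Every lane is picked from the single source.
    return [src[k] for k in _selectors(imm8)]

x = [1.0, 2.0, 3.0, 4.0]
broadcast_shuf = shufps(x, x, 0x00)   # both sources are the same register
broadcast_perm = vpermilps(x, 0x00)
```

With both shufps sources set to the same register, the two models agree for every immediate, so the broadcast with immediate 0 is just one instance of that.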
>> +%macro BLENDVPS 3 ; dst/src_a, src_b, mask
>> +%if cpuflag(avx)
>> + blendvps %1, %1, %2, %3
>> +%elif cpuflag(sse4)
>> + %if notcpuflag(avx)
>> + %ifnidn %3,xmm0
>> + %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
>> + %endif
>> + %endif
>
> notcpuflag(avx) is redundant (it's always true since AVX uses the first
> branch).
Done.
This is a remnant from when I had a label to switch between
different implementations.
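As a reminder of what the macro wraps: blendvps picks each 32-bit lane from the second source when the sign bit of the matching mask lane is set, and keeps the first source otherwise. The SSE4.1 encoding takes the mask implicitly in xmm0, which is what the %error check guards, while the AVX encoding names the mask register explicitly. Below is a rough Python model of the lane selection only; the helper names are made up for illustration and are not part of the patch.

```python
import struct

def sign_bit(value):
    # Top bit of the IEEE-754 single-precision encoding of value.
    return struct.unpack("<I", struct.pack("<f", value))[0] >> 31

def blendvps(src_a, src_b, mask):
    # Per lane: take src_b if the mask lane's sign bit is set, else src_a.
    return [b if sign_bit(m) else a for a, b, m in zip(src_a, src_b, mask)]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
mask = [-1.0, 0.0, -0.0, 1.0]   # only the sign bit of each lane matters
blended = blendvps(a, b, mask)
```

Note that -0.0 selects src_b as well, since only the top bit of each mask lane is examined.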
Best Regards
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-macros-to-x86util.asm.patch
Type: text/x-patch
Size: 4089 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20170806/6a5d83f4/attachment.bin>