[FFmpeg-devel] [PATCH 3/3] x86/vp9lpf: use fewer instructions in SPLATB_MIX

James Almer jamrial at gmail.com
Mon Aug 4 19:35:14 CEST 2014


On 04/08/14 2:03 PM, Clément Bœsch wrote:
> On Sun, Aug 03, 2014 at 11:53:40PM -0300, James Almer wrote:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libavcodec/x86/vp9lpf.asm | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/libavcodec/x86/vp9lpf.asm b/libavcodec/x86/vp9lpf.asm
>> index c5db0ca..def7d5a 100644
>> --- a/libavcodec/x86/vp9lpf.asm
>> +++ b/libavcodec/x86/vp9lpf.asm
>> @@ -302,9 +302,8 @@ SECTION .text
>>      pshufb     %1, %2
>>  %else
>>      punpcklbw  %1, %1
>> -    punpcklqdq %1, %1
>> -    pshuflw    %1, %1, 0
>> -    pshufhw    %1, %1, 0x55
>> +    punpcklwd  %1, %1
>> +    punpckldq  %1, %1
> 
> IIRC I based this on what I found in x86util.asm:SPLATW; would that apply
> there as well?

No, SPLATW splats a word to the entire register whereas this splats one byte 
to the lower half of the register and another to the upper half.

The code above turns the two-byte input AB into AAAAAAAABBBBBBBB, while SPLATW
(with the implied 0 as its third argument) should output ABABABABABABABAB.
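
To make the byte layouts concrete, here is a rough scalar model in C of the
instructions involved. It is only a sketch: the type xmm_t and all function
names are made up for illustration, bytes are listed lowest first, and the
SPLATW model is just one plausible way to get the word broadcast described
above, not necessarily what x86util.asm does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an XMM register; b[0] is the lowest byte. */
typedef struct { uint8_t b[16]; } xmm_t;

/* Models punpckl{bw,wd,dq,qdq} r, r: every element of `size` bytes in the
 * low half is duplicated (size = 1, 2, 4 or 8). */
static xmm_t unpack_low_self(xmm_t r, int size)
{
    xmm_t out;
    for (int i = 0; i < 16 / (2 * size); i++) {
        memcpy(&out.b[(2 * i)     * size], &r.b[i * size], size);
        memcpy(&out.b[(2 * i + 1) * size], &r.b[i * size], size);
    }
    return out;
}

/* Models pshuflw (high = 0) or pshufhw (high = 1) r, r, imm: the four words
 * of one half are picked by 2-bit fields of imm, the other half is kept. */
static xmm_t shuffle_words(xmm_t r, int imm, int high)
{
    xmm_t out = r;
    int base = high ? 8 : 0;
    for (int i = 0; i < 4; i++) {
        int sel = (imm >> (2 * i)) & 3;
        memcpy(&out.b[base + 2 * i], &r.b[base + 2 * sel], 2);
    }
    return out;
}

/* Sequence removed by the patch (non-SSSE3 path of SPLATB_MIX). */
static xmm_t splatb_mix_old(xmm_t r)
{
    r = unpack_low_self(r, 1);        /* punpcklbw  */
    r = unpack_low_self(r, 8);        /* punpcklqdq */
    r = shuffle_words(r, 0x00, 0);    /* pshuflw 0  */
    return shuffle_words(r, 0x55, 1); /* pshufhw 0x55 */
}

/* Sequence added by the patch. */
static xmm_t splatb_mix_new(xmm_t r)
{
    r = unpack_low_self(r, 1);        /* punpcklbw */
    r = unpack_low_self(r, 2);        /* punpcklwd */
    return unpack_low_self(r, 4);     /* punpckldq */
}

/* Assumed model of a word splat with index 0 (not copied from x86util.asm). */
static xmm_t splatw_model(xmm_t r)
{
    r = shuffle_words(r, 0x00, 0);    /* low half = word 0 repeated */
    return unpack_low_self(r, 8);     /* copy low half to high half */
}

/* Writes the 16 bytes as characters, lowest byte first. */
static const char *render(xmm_t r, char buf[17])
{
    memcpy(buf, r.b, 16);
    buf[16] = '\0';
    return buf;
}

/* Prints the three layouts for an input whose low word is the bytes A, B. */
static void demo(void)
{
    char buf[17];
    xmm_t in;
    memset(in.b, '?', sizeof(in.b));
    in.b[0] = 'A';
    in.b[1] = 'B';

    assert(memcmp(splatb_mix_old(in).b, splatb_mix_new(in).b, 16) == 0);
    printf("SPLATB_MIX: %s\n", render(splatb_mix_new(in), buf));
    printf("SPLATW:     %s\n", render(splatw_model(in), buf));
}
```

Calling demo() from a main() prints AAAAAAAABBBBBBBB for SPLATB_MIX and
ABABABABABABABAB for the SPLATW model, and the assert confirms that the old
four-instruction and new three-instruction sequences agree for this input.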