[FFmpeg-devel] [Patch][OpenHEVC]added ASM DBF functions
James Almer
jamrial at gmail.com
Fri May 16 19:58:50 CEST 2014
On 16/05/14 6:47 AM, Pierre Edouard Lepere wrote:
> Hi,
>
> Here's a patch with the changes you suggested. However, I think that the luma is still ssse3 dependant.
>
> Regards,
> Pierre-Edouard Lepere
>
> %macro LUMA_DEBLOCK_BODY 2
> - movdqa m9, m2
> + mova m9, m2
> psllw m9, 1; *2
> - movdqa m10, m1
> + mova m10, m1
> psubw m10, m9
> paddw m10, m3
> - pabsw m10, m10 ; 0dp0, 0dp3 , 1dp0, 1dp3
> + ABS1 m10, m10 ; 0dp0, 0dp3 , 1dp0, 1dp3
>
> - movdqa m9, m5
> + mova m9, m5
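Unlike PABSW, ABS1 takes a temp register as its second argument, which is used
in the SSE2 case. The SSSE3 case ignores it and expands to pabsw with the first
argument as both source and destination. So "ABS1 m10, m10" is only correct on
SSSE3; with SSE2 the temp has to be a different register than the input.
In this one you could for example use m9 or m11, since both are going to be
overwritten by later instructions.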
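
To illustrate why the temp must not alias the input, here is a rough scalar
model in C. It assumes the SSE2 fallback is the usual zero/subtract/max
sequence (pxor, psubw, pmaxsw); the vec8 type and abs1_sse2_model() are
made-up names for this sketch and are not part of FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* eight 16-bit words per XMM register */

/* Hypothetical stand-in for an XMM register holding packed words. */
typedef struct { int16_t w[LANES]; } vec8;

/*
 * Scalar model of the assumed SSE2 fallback:
 *     pxor   tmp, tmp
 *     psubw  tmp, a
 *     pmaxsw a,   tmp
 * a and tmp are allowed to alias here so the failure mode can be observed.
 * Inputs of -32768 are not handled by this model.
 */
static void abs1_sse2_model(vec8 *a, vec8 *tmp)
{
    assert(a != NULL && tmp != NULL);

    for (int i = 0; i < LANES; i++)      /* pxor tmp, tmp */
        tmp->w[i] = 0;
    for (int i = 0; i < LANES; i++)      /* psubw tmp, a  -> tmp = -a */
        tmp->w[i] = (int16_t)(tmp->w[i] - a->w[i]);
    for (int i = 0; i < LANES; i++)      /* pmaxsw a, tmp -> a = max(a, -a) */
        if (tmp->w[i] > a->w[i])
            a->w[i] = tmp->w[i];
}
```

Calling abs1_sse2_model(&x, &t) with a separate t leaves the absolute values
in x. Calling abs1_sse2_model(&x, &x) zeroes x in the first step, so every lane
ends up 0, which is what "ABS1 m10, m10" would amount to under this assumption.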
Once every pabsw is replaced with ABS1/PABSW, luma can work with SSE2 alone.
It's a matter of duplicating the functions for each instruction set by using a
macro, plus the necessary additions in hevcdsp_init.c
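
As a loose analogy in C for that pattern (one body instantiated per
instruction set, then selected at init time), a sketch could look like the
following. Everything in it is hypothetical: the flag values, DSPContext,
dsp_init() and sum_abs_* are illustration names only, not the real
hevcdsp_init.c code or FFmpeg's CPU flag API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature bits, not FFmpeg's AV_CPU_FLAG_* values. */
enum { FLAG_SSE2 = 1, FLAG_SSSE3 = 2 };

/* Two ways of taking |x|, standing in for the ABS1 fallback and pabsw. */
#define ABS_VIA_MAX(x) ((x) > -(x) ? (x) : -(x))
#define ABS_DIRECT(x)  ((x) < 0 ? -(x) : (x))

/* One function body, instantiated once per instruction set. */
#define DEFINE_SUM_ABS(isa, ABS)                              \
    static int sum_abs_##isa(const int16_t *p, size_t n)      \
    {                                                         \
        int sum = 0;                                          \
        for (size_t i = 0; i < n; i++)                        \
            sum += ABS((int)p[i]);                            \
        return sum;                                           \
    }

DEFINE_SUM_ABS(sse2,  ABS_VIA_MAX)
DEFINE_SUM_ABS(ssse3, ABS_DIRECT)

typedef int (*sum_abs_fn)(const int16_t *p, size_t n);

/* Hypothetical function-pointer table filled in at init time. */
typedef struct { sum_abs_fn sum_abs; } DSPContext;

/* Later checks override earlier ones, so the best available version wins. */
static void dsp_init(DSPContext *c, int cpu_flags)
{
    assert(c != NULL);
    c->sum_abs = NULL;
    if (cpu_flags & FLAG_SSE2)
        c->sum_abs = sum_abs_sse2;
    if (cpu_flags & FLAG_SSSE3)
        c->sum_abs = sum_abs_ssse3;
}
```

With only FLAG_SSE2 set, dsp_init() picks sum_abs_sse2; with both flags set it
picks sum_abs_ssse3. Both variants return the same result for the same input.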
Anyway, to avoid delaying the commit of this code any further, I can send a
patch to address the above after this makes it to the tree.
I already wrote it for testing, after all.
Regards.