[FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

James Almer jamrial at gmail.com
Sat Aug 23 16:52:32 CEST 2014


On 23/08/14 10:22 AM, Christophe Gisquet wrote:
> The only sse4 instruction is pextrw, which is used on rather minor
> functions for small blocks. Therefore use whichever GPR is available
> to extract the output word.
> 
> Before (sse4), for block_w == 6:
> 4627 decicycles in epel_uni, 16377 runs, 7 skips
> 7422 decicycles in epel_bi, 65501 runs, 35 skips
> 
> After:
> 4649 decicycles in epel_uni, 16371 runs, 13 skips
> 7432 decicycles in epel_bi, 65505 runs, 31 skips
> ---
>  libavcodec/x86/hevc_mc.asm    |  63 +++--
>  libavcodec/x86/hevcdsp.h      |  48 ++--
>  libavcodec/x86/hevcdsp_init.c | 522 +++++++++++++++++++++---------------------
>  3 files changed, 323 insertions(+), 310 deletions(-)
> 
> diff --git a/libavcodec/x86/hevc_mc.asm b/libavcodec/x86/hevc_mc.asm
> index e2236ec..eb61b18 100644
> --- a/libavcodec/x86/hevc_mc.asm
> +++ b/libavcodec/x86/hevc_mc.asm
> @@ -52,9 +52,9 @@ hevc_epel_filters_%4_%1 times %2 d%3 -2, 58
>  
>  
>  
> -EPEL_TABLE  8, 8, b, sse4
> -EPEL_TABLE 10, 4, w, sse4
> -EPEL_TABLE 12, 4, w, sse4
> +EPEL_TABLE  8, 8, b, ssse3
> +EPEL_TABLE 10, 4, w, ssse3
> +EPEL_TABLE 12, 4, w, ssse3
>  
>  %macro QPEL_TABLE 4
>  hevc_qpel_filters_%4_%1 times %2 d%3  -1,  4
> @@ -71,13 +71,13 @@ hevc_qpel_filters_%4_%1 times %2 d%3  -1,  4
>                          times %2 d%3   4, -1
>  %endmacro
>  
> -QPEL_TABLE  8, 8, b, sse4
> -QPEL_TABLE 10, 4, w, sse4
> -QPEL_TABLE 12, 4, w, sse4
> +QPEL_TABLE  8, 8, b, ssse3
> +QPEL_TABLE 10, 4, w, ssse3
> +QPEL_TABLE 12, 4, w, ssse3

Do these need to be duplicated? You could just remove the suffix and let 
every version of the function use the same tables.
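
A minimal sketch of what suffix-less tables might look like, assuming the macro simply drops its fourth argument and the functions reference the unsuffixed symbols. Only the first row appears in the quoted hunk; the remaining rows are the standard HEVC chroma (epel) filter taps, filled in here as an assumption. Plain `section .rodata` stands in for the x86inc `SECTION_RODATA` macro.

```nasm
section .rodata

; Hypothetical suffix-less variant of EPEL_TABLE:
;   %1 = bit depth, %2 = repeat count, %3 = element size (b or w)
%macro EPEL_TABLE 3
hevc_epel_filters_%1 times %2 d%3 -2, 58
                     times %2 d%3 10, -2
                     times %2 d%3 -4, 54
                     times %2 d%3 16, -2
                     times %2 d%3 -6, 46
                     times %2 d%3 28, -4
                     times %2 d%3 -4, 36
                     times %2 d%3 36, -4
                     times %2 d%3 -6, 28
                     times %2 d%3 46, -4
                     times %2 d%3 -2, 16
                     times %2 d%3 54, -4
                     times %2 d%3 -2, 10
                     times %2 d%3 58, -2
%endmacro

; One table per bit depth, shared by every instruction-set version.
EPEL_TABLE  8, 8, b
EPEL_TABLE 10, 4, w
EPEL_TABLE 12, 4, w
```

Each version of a function would then load from `hevc_epel_filters_8`, `hevc_epel_filters_10` or `hevc_epel_filters_12`, so the data is emitted once. QPEL_TABLE could be handled the same way.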

