[FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

Mickaël Raulet mraulet at gmail.com
Sat Aug 23 16:07:50 CEST 2014


The 10-bit and 12-bit functions should stay SSE4 as well, because of packusdw. Converting them to SSSE3 needs a few extra instructions to emulate it; see below.


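/* Stand-in for packusdw: keeps the low 16 bits of each 32-bit lane.
 * There is no saturation, so it matches packusdw only for inputs
 * already in [0, 65535]. */
static av_always_inline __m128i _MM_PACKUS_EPI32(__m128i a, __m128i b)
{
    /* sign-extend the low 16 bits of each lane ... */
    a = _mm_slli_epi32(a, 16);
    a = _mm_srai_epi32(a, 16);
    b = _mm_slli_epi32(b, 16);
    b = _mm_srai_epi32(b, 16);
    /* ... so that the signed pack never has to saturate */
    a = _mm_packs_epi32(a, b);
    return a;
}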
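
For reference, here is a self-contained sketch of that trick next to a saturating variant, so the difference is visible on out-of-range inputs. The names pack_trunc_epi32, pack_sat_epi32 and pack8 are made up for this illustration and are not from the patch. The bias-based saturating version is a common SSE2 idiom, not something proposed in this thread, and it assumes inputs are not within 32768 of INT32_MIN (the 32-bit subtract would wrap).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Truncating pack, as in the snippet above: keeps the low 16 bits of
 * each 32-bit lane. Same result as packusdw only for inputs in
 * [0, 65535]. */
static inline __m128i pack_trunc_epi32(__m128i a, __m128i b)
{
    a = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
    b = _mm_srai_epi32(_mm_slli_epi32(b, 16), 16);
    return _mm_packs_epi32(a, b);
}

/* Saturating pack (assumption: a common idiom, not from the thread).
 * Shift the unsigned range [0, 65535] onto the signed range
 * [-32768, 32767], let packssdw saturate, then shift back. */
static inline __m128i pack_sat_epi32(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(32768);
    const __m128i bias16 = _mm_set1_epi16((short)-32768); /* 0x8000 */
    a = _mm_sub_epi32(a, bias32);
    b = _mm_sub_epi32(b, bias32);
    return _mm_add_epi16(_mm_packs_epi32(a, b), bias16);
}

/* Hypothetical helper: pack eight int32 values into eight uint16. */
static void pack8(const int32_t in[8], uint16_t out[8], int saturate)
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));
    __m128i r = saturate ? pack_sat_epi32(a, b) : pack_trunc_epi32(a, b);
    _mm_storeu_si128((__m128i *)out, r);
}
```

If the values have already been clipped to the 10-bit or 12-bit pixel range before packing, both versions give the same result and the shorter truncating one is enough.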

Mickaël



On 23 August 2014 at 15:22, Christophe Gisquet <christophe.gisquet at gmail.com> wrote:

> As far as I can see, the only reason those functions are SSE4 is because
> of the pextrw needed for the following block widths:
> - 2, used only by chroma;
> - 6, used by chroma and indirectly by luma;
> - 12, used by both.
> The better solution would be to convert all chroma handling to NV12, but
> it is vastly simpler to modify the above cases to not use pextrw.
> 
> This is done in 2 steps:
> - Split width 12 as 8+4 instead of 6+6;
> - Modify the store macros for widths 2 and 6 to pass data through
>  a GPR (alas, at the cost of an extra GPR for some functions).
> 
> Christophe Gisquet (2):
>  x86: hevc_mc: split differently calls
>  x86: hevc_mc: convert to ssse3
> 
> libavcodec/x86/hevc_mc.asm    |  63 +++--
> libavcodec/x86/hevcdsp.h      |  48 ++--
> libavcodec/x86/hevcdsp_init.c | 561 ++++++++++++++++++++++--------------------
> 3 files changed, 362 insertions(+), 310 deletions(-)
> 
> -- 
> 1.9.2.msysgit.0
> 
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
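
The store-through-a-GPR idea from the quoted cover letter can be pictured with a rough C intrinsics analogue. The actual patch changes NASM macros that are not shown in this message, so this is only a sketch under assumptions: 8-bit pixels (width 2 is 2 bytes, width 6 is 6 bytes), a little-endian x86 target, and made-up helper names store2_via_gpr and store6_via_gpr.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Width 2: move the low dword to a general-purpose register (movd),
 * then write only its low 16 bits. No pextrw to memory is needed. */
static inline void store2_via_gpr(uint8_t *dst, __m128i v)
{
    uint32_t r  = (uint32_t)_mm_cvtsi128_si32(v);
    uint16_t lo = (uint16_t)r;
    memcpy(dst, &lo, sizeof(lo));
}

/* Width 6: write the first 4 bytes via the GPR, then shift the vector
 * right by 4 bytes and handle the remaining 2 bytes as for width 2. */
static inline void store6_via_gpr(uint8_t *dst, __m128i v)
{
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(v);
    memcpy(dst, &lo, sizeof(lo));
    store2_via_gpr(dst + 4, _mm_srli_si128(v, 4));
}
```

The extra register mentioned in the cover letter corresponds to the temporaries r and lo here: the data has to live somewhere outside the vector register before the partial store.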


