[FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3
Mickaël Raulet
mraulet at gmail.com
Sat Aug 23 16:07:50 CEST 2014
The 10-bit and 12-bit functions should stay SSE4 as well, because they use packusdw, which is an SSE4.1 instruction. Porting them to SSSE3 would need a few extra instructions to emulate it; see below.
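#include <emmintrin.h> /* SSE2 intrinsics */

/* Stand-in for packusdw, valid for inputs already in [0, 65535]:
 * sign-extend the low 16 bits of each dword so that packssdw
 * passes them through; out-of-range inputs wrap, not saturate. */
static av_always_inline __m128i _MM_PACKUS_EPI32(__m128i a, __m128i b)
{
    a = _mm_slli_epi32(a, 16);
    a = _mm_srai_epi32(a, 16);
    b = _mm_slli_epi32(b, 16);
    b = _mm_srai_epi32(b, 16);
    a = _mm_packs_epi32(a, b);
    return a;
}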
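
The helper above keeps only the low 16 bits of each dword, so it matches packusdw only when the inputs are already in [0, 65535]. If true unsigned saturation were needed, one possible SSE2-only variant is sketched below. This is an illustration, not part of the patch: the names packus_epi32_sse2, packus_store8 and packus_selfcheck are made up for the example. It biases the inputs so that the signed saturation of packssdw clamps at the unsigned limits, then removes the bias. One caveat: inputs below INT32_MIN + 32768 overflow in the subtraction and would be clamped to the wrong end.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name: SSE2-only stand-in for packusdw with unsigned saturation. */
static inline __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(32768);
    const __m128i bias16 = _mm_set1_epi16(-32768); /* 0x8000 in every lane */

    /* Map [0, 65535] onto [-32768, 32767] so that the signed
     * saturation of packssdw clamps at the unsigned limits. */
    a = _mm_sub_epi32(a, bias32);
    b = _mm_sub_epi32(b, bias32);
    a = _mm_packs_epi32(a, b);

    /* Undo the bias; the 16-bit add wraps, so -32768 becomes 0
     * and 32767 becomes 65535. */
    return _mm_add_epi16(a, bias16);
}

/* Pack eight int32 inputs and write the eight uint16 results to out. */
static void packus_store8(const int32_t in[8], uint16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));
    _mm_storeu_si128((__m128i *)out, packus_epi32_sse2(a, b));
}

/* Small self-check on in-range, negative and too-large inputs. */
static void packus_selfcheck(void)
{
    const int32_t in[8] = { 0, 1023, 4095, 65535, -1, -70000, 65536, 1 << 20 };
    uint16_t out[8];

    packus_store8(in, out);
    assert(out[0] == 0 && out[1] == 1023 && out[2] == 4095 && out[3] == 65535);
    assert(out[4] == 0 && out[5] == 0);         /* negatives clamp to 0 */
    assert(out[6] == 65535 && out[7] == 65535); /* overflow clamps to 65535 */
}
```

For 10-bit and 12-bit output that has already been clipped, the shorter truncating form is enough; the biased form costs two subtractions and one add per pack.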
Mickaël
On 23 August 2014 at 15:22, Christophe Gisquet <christophe.gisquet at gmail.com> wrote:
> As far as I can see, the only reason those functions are SSE4 is because
> of the pextrw needed for the following block widths:
> - 2, used only by chroma;
> - 6, used by chroma and indirectly by luma;
> - 12, used by both.
> The better solution would be to convert all chroma handling to NV12, but
> it is vastly simpler to modify the above cases to not use pextrw.
>
> This is done in 2 steps:
> - Fix width of 12 to do 8+4 instead of 6+6;
> - Modify the store macros for widths 2 and 6 to pass the data through
> a GPR (alas, at the cost of an extra GPR for some functions).
>
> Christophe Gisquet (2):
> x86: hevc_mc: split differently calls
> x86: hevc_mc: convert to ssse3
>
> libavcodec/x86/hevc_mc.asm | 63 +++--
> libavcodec/x86/hevcdsp.h | 48 ++--
> libavcodec/x86/hevcdsp_init.c | 561 ++++++++++++++++++++++--------------------
> 3 files changed, 362 insertions(+), 310 deletions(-)
>
> --
> 1.9.2.msysgit.0
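
To illustrate the second step of the quoted cover letter, here is a rough intrinsics sketch of storing 2 or 6 bytes without pextrw by going through a general-purpose register. The actual patch changes assembly macros in hevc_mc.asm; the function names store2_u8 and store6_u8 are invented for this example, and 8-bit pixels on a little-endian x86 target are assumed.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Store the low 2 bytes of v: move the low dword to a GPR (movd),
 * then write its low 16 bits. */
static inline void store2_u8(uint8_t *dst, __m128i v)
{
    uint16_t lo = (uint16_t)_mm_cvtsi128_si32(v);
    memcpy(dst, &lo, sizeof(lo));
}

/* Store the low 6 bytes of v: 4 bytes from the low dword, then
 * shift the vector right by 4 bytes (psrldq) and store 2 more. */
static inline void store6_u8(uint8_t *dst, __m128i v)
{
    int32_t lo = _mm_cvtsi128_si32(v);
    memcpy(dst, &lo, sizeof(lo));
    store2_u8(dst + 4, _mm_srli_si128(v, 4));
}
```

The temporary that carries the 16-bit value is the extra GPR mentioned in the cover letter; a compiler may or may not allocate it the way the hand-written assembly does.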