[FFmpeg-devel] [PATCH 3/9] x86: simple_idct_put: 10bits versions

Michael Niedermayer michael at niedermayer.cc
Sat Oct 10 13:42:26 CEST 2015


On Fri, Oct 09, 2015 at 11:53:40PM +0200, Christophe Gisquet wrote:
> Modeled from the prores version. Clips to [0;1023] and is bitexact.
> Bitexactness requires to add an offset in a different place compared
> to prores or C, and makes the function approximately 2% slower.
> 
> For 16 frames of a DNxHD 4:2:2 10bits test sequence:
> 
> C:    60861 decicycles in idct, 1048205 runs,    371 skips
> sse2: 27567 decicycles in idct, 1048216 runs,    360 skips
> avx:  26272 decicycles in idct, 1048171 runs,    405 skips
> ---
>  libavcodec/x86/Makefile                   |  1 +
>  libavcodec/x86/idctdsp_init.c             | 16 ++++++++++
>  libavcodec/x86/simple_idct.h              |  3 ++
>  libavcodec/x86/simple_idct10.asm          | 53 +++++++++++++++++++++++++++++++
>  libavcodec/x86/simple_idct10_template.asm | 12 +++++++
>  5 files changed, 85 insertions(+)
>  create mode 100644 libavcodec/x86/simple_idct10.asm
> 
> diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
> index a9d8032..ef7628e 100644
> --- a/libavcodec/x86/Makefile
> +++ b/libavcodec/x86/Makefile
> @@ -126,6 +126,7 @@ YASM-OBJS-$(CONFIG_QPELDSP)            += x86/qpeldsp.o                 \
>                                            x86/fpel.o                    \
>                                            x86/qpel.o
>  YASM-OBJS-$(CONFIG_RV34DSP)            += x86/rv34dsp.o
> +YASM-OBJS-$(CONFIG_IDCTDSP)            += x86/simple_idct10.o
>  YASM-OBJS-$(CONFIG_VIDEODSP)           += x86/videodsp.o
>  YASM-OBJS-$(CONFIG_VP3DSP)             += x86/vp3dsp.o
>  YASM-OBJS-$(CONFIG_VP8DSP)             += x86/vp8dsp.o                  \
> diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
> index 2c26a98..17ddc9e 100644
> --- a/libavcodec/x86/idctdsp_init.c
> +++ b/libavcodec/x86/idctdsp_init.c
> @@ -85,4 +85,20 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
>          c->put_pixels_clamped        = ff_put_pixels_clamped_sse2;
>          c->add_pixels_clamped        = ff_add_pixels_clamped_sse2;
>      }
> +
> +    if (ARCH_X86_64 &&
> +        avctx->bits_per_raw_sample == 10 && avctx->lowres == 0 &&
> +        (avctx->idct_algo == FF_IDCT_AUTO ||
> +         avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
> +         avctx->idct_algo == FF_IDCT_SIMPLE)) {
> +        if (EXTERNAL_SSE2(cpu_flags)) {
> +            c->idct_put  = ff_simple_idct10_put_sse2;
> +            c->perm_type = FF_IDCT_PERM_TRANSPOSE;

perm_type represents the permutation for idct_put, idct_add and idct;
setting just one of them risks having a wrong permutation for the
other two.
If some cases are unused, they could be set to NULL to avoid
hard-to-debug artifacts if they ever become used, though setting them
to a matching IDCT seems more correct.
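
To make the point above concrete, here is a minimal C sketch. IdctCtx, IdctFamily, the placeholder kernels and all helper functions are hypothetical simplifications, not libavcodec's real IDCTDSPContext or init code. The sketch only models the rule that one perm_type governs idct_put, idct_add and idct together, and contrasts overriding only idct_put with the two alternatives suggested in the review.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*put_fn)(uint8_t *dest, ptrdiff_t stride, int16_t *block);
typedef void (*add_fn)(uint8_t *dest, ptrdiff_t stride, int16_t *block);
typedef void (*idct_fn)(int16_t *block);

enum perm_type { PERM_NONE, PERM_TRANSPOSE };

/* Hypothetical, simplified stand-in for IDCTDSPContext. */
typedef struct IdctCtx {
    put_fn  idct_put;
    add_fn  idct_add;
    idct_fn idct;
    enum perm_type perm_type; /* shared by all three entry points */
} IdctCtx;

/* One coherent IDCT family: its kernels plus the coefficient
 * permutation they all expect (NULL = kernel not implemented). */
typedef struct IdctFamily {
    put_fn  put;
    add_fn  add;
    idct_fn idct;
    enum perm_type perm;
} IdctFamily;

/* Empty placeholder kernels (hypothetical); real code would transform. */
static void c_put(uint8_t *d, ptrdiff_t s, int16_t *b)    { (void)d; (void)s; (void)b; }
static void c_add(uint8_t *d, ptrdiff_t s, int16_t *b)    { (void)d; (void)s; (void)b; }
static void c_idct(int16_t *b)                            { (void)b; }
static void simd_put(uint8_t *d, ptrdiff_t s, int16_t *b) { (void)d; (void)s; (void)b; }

static const IdctFamily c_family      = { c_put, c_add, c_idct, PERM_NONE };
static const IdctFamily simd_put_only = { simd_put, NULL, NULL, PERM_TRANSPOSE };

/* Preferred: install a whole family so perm_type fits all three pointers. */
static void install_family(IdctCtx *c, const IdctFamily *f)
{
    c->idct_put  = f->put;
    c->idct_add  = f->add;
    c->idct      = f->idct;
    c->perm_type = f->perm;
}

/* Fallback: only a put kernel exists, so clear the other two entries;
 * accidental use then crashes instead of silently using the wrong order. */
static void install_put_only(IdctCtx *c, put_fn put, enum perm_type perm)
{
    c->idct_put  = put;
    c->idct_add  = NULL;
    c->idct      = NULL;
    c->perm_type = perm;
}

/* 1 if perm_type matches f and every non-NULL entry point belongs to f. */
static int consistent_with(const IdctCtx *c, const IdctFamily *f)
{
    if (c->perm_type != f->perm)
        return 0;
    if (c->idct_put && c->idct_put != f->put)  return 0;
    if (c->idct_add && c->idct_add != f->add)  return 0;
    if (c->idct     && c->idct     != f->idct) return 0;
    return 1;
}

/* The pattern the review warns about: override only idct_put + perm_type. */
static void build_risky(IdctCtx *c)
{
    install_family(c, &c_family);
    c->idct_put  = simd_put;
    c->perm_type = PERM_TRANSPOSE; /* now also claimed for c_add and c_idct */
}

static void build_safe(IdctCtx *c)
{
    install_family(c, &c_family);
    install_put_only(c, simd_put, PERM_TRANSPOSE);
}
```

In this model build_risky leaves the C idct_add and idct in place while perm_type claims the transposed order for them too, so consistent_with reports 0; build_safe reports 1. Installing a complete matching family is the cleaner option; the NULL variant only limits the damage.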

[...]
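
The quoted commit message says the 10-bit put clips its output to [0;1023]. As an illustration of what that clamp means, here is a minimal scalar C sketch. clip_10bit and put_block_10bit are hypothetical helpers, not FFmpeg functions, and the layout is an assumption: an 8x8 block of already-transformed int16_t values, 10-bit samples stored as uint16_t, and a stride given in bytes. It deliberately omits the IDCT itself and the rounding offset the patch moves for bitexactness.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp one IDCT output value to the 10-bit sample range [0, 1023]. */
static inline uint16_t clip_10bit(int v)
{
    return (uint16_t)(v < 0 ? 0 : v > 1023 ? 1023 : v);
}

/* Hypothetical put-style store: writes an 8x8 block of IDCT results into
 * a 10-bit picture (uint16_t samples, stride in bytes; both assumptions),
 * clamping each value. */
static void put_block_10bit(uint8_t *dest, ptrdiff_t stride_bytes,
                            const int16_t *block)
{
    for (int y = 0; y < 8; y++) {
        uint16_t *row = (uint16_t *)(dest + y * stride_bytes);
        for (int x = 0; x < 8; x++)
            row[x] = clip_10bit(block[y * 8 + x]);
    }
}
```
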
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

DNS cache poisoning attacks, popular search engine, Google internet authority
dont be evil, please