[FFmpeg-devel] [FFMpeg-Devel][GSoC][PATCH 3/6] postproc: Moved inline asm for packing QP to seperate function

Michael Niedermayer michaelni at gmx.at
Thu Apr 23 00:51:59 CEST 2015


On Wed, Apr 22, 2015 at 04:27:28PM -0400, Tucker DiNapoli wrote:
> This patch contains the code for the avx2/sse2 versions of the new
> function, but they are deliberately ignored, since the support for
> avx2/sse2 isn't yet present (the next commit fixes this).
> 
> This is a temporary measure until full sse2/avx2 implementation is
> complete, but it works with sse2/avx2 as inline asm. 
> 
> Moving this to a separate file would add overhead due to having to call a function,
> if this is a reasonable trade off for removing inline asm than I can eaisly do that.
> ---
>  libpostproc/postprocess_template.c | 61 +++++++++++++++++++++++++++++++-------
>  1 file changed, 51 insertions(+), 10 deletions(-)
> 
> diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
> index b7296c4..083be9d 100644
> --- a/libpostproc/postprocess_template.c
> +++ b/libpostproc/postprocess_template.c
> @@ -3249,7 +3249,6 @@ static inline void RENAME(prefetchnta)(const void *p)
>          : : "r" (p)
>      );
>  }
> -
>  static inline void RENAME(prefetcht0)(const void *p)
>  {
>      __asm__ volatile(   "prefetcht0 (%0)\n\t"
> @@ -3305,6 +3304,55 @@ static inline void RENAME(prefetcht2)(const void *p)
>      return;
>  }
>  #endif
> +/*
> +  This is temporary. Ultimately the inline asm should be removed completely
> +  and moved to another file (though this has some performance overhead), but for
> +  now this code is necessary.
> +  Get around the issues with inline avx by using an explicit register
> +  and simplify code by abstracting simd detail like in yasm code
> +*/
> +#if TEMPLATE_PP_MMX
> +static inline void RENAME(packQP)(PPContext c)
> +{

> +#if 0 //TEMPLATE_PP_AVX2 goes here

adding disabled code and enabling it in a subsequent commit is not ok


> +    __asm__ volatile(
> +        "vmovdqa (%1), %%ymm0\n\t"
> +        "vpermq $0, %%ymm0, %%ymm0 \n\t"
> +        "vpunpcklbw %%ymm0, %%ymm0, %%ymm0  \n\t" // 0, 0, 0, 0, 0, 0, OP, QP
> +        "vpunpcklwd %%ymm0, %%ymm0, %%ymm0  \n\t" // 0, 0, 0, 0, 0, 0, QP, QP
> +        "vpunpckldq %%ymm0, %%ymm0, %%ymm0  \n\t" //QP,...,QP
> +        "vpunpcklqdq %%ymm0, %%ymm0, %%ymm0  \n\t" //copy to upper quadword(s)
> +        "vmovdqa %%ymm0, %0"
> +        : "=m" (c.pQPb_block)
> +        : "r" (c.QP_block)
> +        : "%ymm0"
> +    );

QP_block is just 16 bytes, AVX* is not needed


[...]
> @@ -3516,6 +3564,7 @@ static void RENAME(postProcess)(const uint8_t src[], int srcStride, uint8_t dst[
>              for(qp_index=0; qp_index < (endx-startx)/BLOCK_SIZE; qp_index++){
>                  QP = QPptr[(x+qp_index*BLOCK_SIZE)>>qpHShift];
>                  nonBQP = nonBQPptr[(x+qp_index*BLOCK_SIZE)>>qpHShift];
> +
>              if(!isColor){
>                  QP= (QP* QPCorrecture + 256*128)>>16;
>                  nonBQP= (nonBQP* QPCorrecture + 256*128)>>16;

this doesnt belong in this patch


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150423/f8bb4579/attachment.asc>


More information about the ffmpeg-devel mailing list