[FFmpeg-devel] [FFMpeg-Devel][GSoC][PATCH 3/6] postproc: Moved inline asm for packing QP to seperate function
Michael Niedermayer
michaelni at gmx.at
Thu Apr 23 00:51:59 CEST 2015
On Wed, Apr 22, 2015 at 04:27:28PM -0400, Tucker DiNapoli wrote:
> This patch contains the code for the avx2/sse2 versions of the new
> function, but they are deliberately ignored, since the support for
> avx2/sse2 isn't yet present (the next commit fixes this).
>
> This is a temporary measure until full sse2/avx2 implementation is
> complete, but it works with sse2/avx2 as inline asm.
>
> Moving this to a separate file would add overhead due to having to call a function,
> if this is a reasonable trade off for removing inline asm than I can eaisly do that.
> ---
> libpostproc/postprocess_template.c | 61 +++++++++++++++++++++++++++++++-------
> 1 file changed, 51 insertions(+), 10 deletions(-)
>
> diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
> index b7296c4..083be9d 100644
> --- a/libpostproc/postprocess_template.c
> +++ b/libpostproc/postprocess_template.c
> @@ -3249,7 +3249,6 @@ static inline void RENAME(prefetchnta)(const void *p)
> : : "r" (p)
> );
> }
> -
> static inline void RENAME(prefetcht0)(const void *p)
> {
> __asm__ volatile( "prefetcht0 (%0)\n\t"
> @@ -3305,6 +3304,55 @@ static inline void RENAME(prefetcht2)(const void *p)
> return;
> }
> #endif
> +/*
> + This is temporary. Ultimately the inline asm should be removed completely
> + and moved to another file (though this has some performance overhead), but for
> + now this code is necessary.
> + Get around the issues with inline avx by using an explicit register
> + and simplify code by abstracting simd detail like in yasm code
> +*/
> +#if TEMPLATE_PP_MMX
> +static inline void RENAME(packQP)(PPContext c)
> +{
> +#if 0 //TEMPLATE_PP_AVX2 goes here
adding disabled code and enabling it in a subsequent commit is not ok
> + __asm__ volatile(
> + "vmovdqa (%1), %%ymm0\n\t"
> + "vpermq $0, %%ymm0, %%ymm0 \n\t"
> + "vpunpcklbw %%ymm0, %%ymm0, %%ymm0 \n\t" // 0, 0, 0, 0, 0, 0, OP, QP
> + "vpunpcklwd %%ymm0, %%ymm0, %%ymm0 \n\t" // 0, 0, 0, 0, 0, 0, QP, QP
> + "vpunpckldq %%ymm0, %%ymm0, %%ymm0 \n\t" //QP,...,QP
> + "vpunpcklqdq %%ymm0, %%ymm0, %%ymm0 \n\t" //copy to upper quadword(s)
> + "vmovdqa %%ymm0, %0"
> + : "=m" (c.pQPb_block)
> + : "r" (c.QP_block)
> + : "%ymm0"
> + );
QP_block is just 16 bytes, AVX* is not needed
[...]
> @@ -3516,6 +3564,7 @@ static void RENAME(postProcess)(const uint8_t src[], int srcStride, uint8_t dst[
> for(qp_index=0; qp_index < (endx-startx)/BLOCK_SIZE; qp_index++){
> QP = QPptr[(x+qp_index*BLOCK_SIZE)>>qpHShift];
> nonBQP = nonBQPptr[(x+qp_index*BLOCK_SIZE)>>qpHShift];
> +
> if(!isColor){
> QP= (QP* QPCorrecture + 256*128)>>16;
> nonBQP= (nonBQP* QPCorrecture + 256*128)>>16;
this doesnt belong in this patch
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150423/f8bb4579/attachment.asc>
More information about the ffmpeg-devel
mailing list