[FFmpeg-devel] [PATCH 1/2] avcodec: loongson optimized h264pred with mmi v2
周晓勇
zhouxiaoyong at loongson.cn
Thu Aug 6 04:21:57 CEST 2015
This is just another implementation, using the C1 (floating-point) registers, and the patch makes the functions more readable.
I think using the C1 registers may reduce the load on the general-purpose registers.
gsldlc1 and gsldrc1 are similar to ldl and ldr; they differ only in which registers they use.
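
For readers less familiar with these load/store pairs, here is a minimal portable C sketch of what the function quoted below computes: it reads the 16 bytes of the row above the block, with no alignment requirement, and writes them into each of the 16 rows. The name pred16x16_vertical_ref is hypothetical and not part of the patch; memcpy only stands in for the unaligned left/right pairs (ldl/ldr, gsldlc1/gsldrc1 and their store counterparts).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper for illustration; not part of the patch. */
static void pred16x16_vertical_ref(uint8_t *src, ptrdiff_t stride)
{
    uint8_t top[16];

    /* Load the row above the block. memcpy has no alignment requirement,
     * which is the role the left/right load pairs play in the assembly. */
    memcpy(top, src - stride, 16);

    /* Store that row into each of the 16 rows of the block. */
    for (int i = 0; i < 16; i++) {
        memcpy(src, top, 16);
        src += stride;
    }
}
```

Both the old and the new assembly versions are meant to produce this result; only the registers holding the 16 loaded bytes differ.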
On 2015-08-06 05:29:58, 周晓勇 <zhouxiaoyong at loongson.cn> wrote:
> Hi,
>
> On Tue, Aug 4, 2015 at 8:05 AM, 周晓勇 <zhouxiaoyong at loongson.cn> wrote:
>
> > From 71478e642fac00b12b313723ee83acdfef732fd1 Mon Sep 17 00:00:00 2001
> > From: ZhouXiaoyong <zhouxiaoyong at loongson.cn>
> > Date: Tue, 4 Aug 2015 16:28:02 +0800
> > Subject: [PATCH 1/2] avcodec: loongson optimized h264pred with mmi v2
> >
> >
> > Signed-off-by: ZhouXiaoyong <zhouxiaoyong at loongson.cn>
> > ---
> > libavcodec/mips/h264pred_init_mips.c | 1 -
> > libavcodec/mips/h264pred_mips.h | 7 +-
> > libavcodec/mips/h264pred_mmi.c | 459 +++++++++++++++++------------------
> > 3 files changed, 226 insertions(+), 241 deletions(-)
>
> [..]
>
> > void ff_pred16x16_vertical_8_mmi(uint8_t *src, ptrdiff_t stride)
> > {
> > __asm__ volatile (
> > - "dsubu $2, %0, %1 \r\n"
> > - "daddu $3, %0, $0 \r\n"
> > - "ldl $4, 7($2) \r\n"
> > - "ldr $4, 0($2) \r\n"
> > - "ldl $5, 15($2) \r\n"
> > - "ldr $5, 8($2) \r\n"
> > - "dli $6, 0x10 \r\n"
> > + "dli $8, 16 \r\n"
> > + "gsldlc1 $f2, 7(%[srcA]) \r\n"
> > + "gsldrc1 $f2, 0(%[srcA]) \r\n"
> > + "gsldlc1 $f4, 15(%[srcA]) \r\n"
> > + "gsldrc1 $f4, 8(%[srcA]) \r\n"
> > "1: \r\n"
> > - "sdl $4, 7($3) \r\n"
> > - "sdr $4, 0($3) \r\n"
> > - "sdl $5, 15($3) \r\n"
> > - "sdr $5, 8($3) \r\n"
> > - "daddu $3, %1 \r\n"
> > - "daddiu $6, -1 \r\n"
> > - "bnez $6, 1b \r\n"
> > - ::"r"(src),"r"(stride)
> > - : "$2","$3","$4","$5","$6","memory"
> > + "gssdlc1 $f2, 7(%[src]) \r\n"
> > + "gssdrc1 $f2, 0(%[src]) \r\n"
> > + "gssdlc1 $f4, 15(%[src]) \r\n"
> > + "gssdrc1 $f4, 8(%[src]) \r\n"
> > + "daddu %[src], %[src], %[stride] \r\n"
> > + "daddi $8, $8, -1 \r\n"
> > + "bnez $8, 1b \r\n"
> > + : [src]"+&r"(src)
> > + : [stride]"r"(stride),[srcA]"r"(src-stride)
> > + : "$8","$f2","$f4"
> > );
> > }
>
>
> So... I'm confused. You're replacing one type of optimizations with
> another. What happened? Was the old optimization bad? Was it for an old cpu
> type and is yours for a newer one? Something else?
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel