[FFmpeg-devel] [PATCH 1/2] avcodec: loongson optimized h264pred with mmi v2

周晓勇 zhouxiaoyong at loongson.cn
Thu Aug 6 04:21:57 CEST 2015


This is just another implementation that uses the C1 (floating-point) registers, and the patch makes the functions more readable.
I think using the C1 registers may reduce the pressure on the general-purpose registers.
gsldlc1 and gsldrc1 are similar to ldl and ldr; they differ only in which registers they load into.
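
For readers not familiar with the unaligned load/store pairs, here is a rough portable C sketch of what ff_pred16x16_vertical_8_mmi in the quoted patch does: it reads the 16 bytes of the row above the block as two 64-bit halves and writes them into each of the 16 rows. The name pred16x16_vertical_ref is hypothetical and is not part of the patch or of FFmpeg; memcpy stands in for the gsldlc1/gsldrc1 and gssdlc1/gssdrc1 pairs, since both allow unaligned access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference version, for illustration only. */
static void pred16x16_vertical_ref(uint8_t *src, ptrdiff_t stride)
{
    uint64_t lo, hi;
    const uint8_t *top = src - stride;   /* row above the block (srcA) */

    /* Unaligned 8-byte loads, like gsldlc1/gsldrc1 into $f2 and $f4. */
    memcpy(&lo, top, 8);
    memcpy(&hi, top + 8, 8);

    for (int i = 0; i < 16; i++) {
        /* Unaligned 8-byte stores, like gssdlc1/gssdrc1. */
        memcpy(src, &lo, 8);
        memcpy(src + 8, &hi, 8);
        src += stride;
    }
}
```

The two 64-bit temporaries play the role of $f2 and $f4, so only the pointer and the loop counter need general-purpose registers.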


On 2015-08-06 05:29:58, 周晓勇 <zhouxiaoyong at loongson.cn> wrote:
> Hi,
> 
> On Tue, Aug 4, 2015 at 8:05 AM, 周晓勇 <zhouxiaoyong at loongson.cn> wrote:
> 
> > From 71478e642fac00b12b313723ee83acdfef732fd1 Mon Sep 17 00:00:00 2001
> > From: ZhouXiaoyong <zhouxiaoyong at loongson.cn>
> > Date: Tue, 4 Aug 2015 16:28:02 +0800
> > Subject: [PATCH 1/2] avcodec: loongson optimized h264pred with mmi v2
> >
> >
> > Signed-off-by: ZhouXiaoyong <zhouxiaoyong at loongson.cn>
> > ---
> >  libavcodec/mips/h264pred_init_mips.c |   1 -
> >  libavcodec/mips/h264pred_mips.h      |   7 +-
> >  libavcodec/mips/h264pred_mmi.c       | 459 +++++++++++++++++------------------
> >  3 files changed, 226 insertions(+), 241 deletions(-)
> 
>  [..]
> 
> > void ff_pred16x16_vertical_8_mmi(uint8_t *src, ptrdiff_t stride)
> >  {
> >      __asm__ volatile (
> > -        "dsubu $2, %0, %1                   \r\n"
> > -        "daddu $3, %0, $0                   \r\n"
> > -        "ldl $4, 7($2)                      \r\n"
> > -        "ldr $4, 0($2)                      \r\n"
> > -        "ldl $5, 15($2)                     \r\n"
> > -        "ldr $5, 8($2)                      \r\n"
> > -        "dli $6, 0x10                       \r\n"
> > +        "dli $8, 16                         \r\n"
> > +        "gsldlc1 $f2, 7(%[srcA])            \r\n"
> > +        "gsldrc1 $f2, 0(%[srcA])            \r\n"
> > +        "gsldlc1 $f4, 15(%[srcA])           \r\n"
> > +        "gsldrc1 $f4, 8(%[srcA])            \r\n"
> >          "1:                                 \r\n"
> > -        "sdl $4, 7($3)                      \r\n"
> > -        "sdr $4, 0($3)                      \r\n"
> > -        "sdl $5, 15($3)                     \r\n"
> > -        "sdr $5, 8($3)                      \r\n"
> > -        "daddu $3, %1                       \r\n"
> > -        "daddiu $6, -1                      \r\n"
> > -        "bnez $6, 1b                        \r\n"
> > -        ::"r"(src),"r"(stride)
> > -        : "$2","$3","$4","$5","$6","memory"
> > +        "gssdlc1 $f2, 7(%[src])             \r\n"
> > +        "gssdrc1 $f2, 0(%[src])             \r\n"
> > +        "gssdlc1 $f4, 15(%[src])            \r\n"
> > +        "gssdrc1 $f4, 8(%[src])             \r\n"
> > +        "daddu %[src], %[src], %[stride]    \r\n"
> > +        "daddi $8, $8, -1                   \r\n"
> > +        "bnez $8, 1b                        \r\n"
> > +        : [src]"+&r"(src)
> > +        : [stride]"r"(stride),[srcA]"r"(src-stride)
> > +        : "$8","$f2","$f4"
> >      );
> >  }
> 
> 
> So... I'm confused. You're replacing one type of optimizations with
> another. What happened? Was the old optimization bad? Was it for an old cpu
> type and is yours for a newer one? Something else?
> 
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel



