[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm
Fri Sep 24 23:19:31 CEST 2010
On Fri, Sep 24, 2010 at 11:15:49PM +0200, Michael Niedermayer wrote:
> On Fri, Sep 24, 2010 at 03:20:49PM -0400, Ronald S. Bultje wrote:
> > Hi,
> > On Fri, Sep 24, 2010 at 12:26 PM, Daniel Verkamp <daniel at drv.nu> wrote:
> > > On Fri, Sep 24, 2010 at 9:04 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> > >> So removing pand (which doesn't do anything in the one case, and can
> > >> be replaced by a pxor in the other). With the attached patch #2, I get
> > >> this:
> > >> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:315:bad
> > >> register name `%%mm0'
> > >> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:520:bad
> > >> register name `%%mm0'
> > >>
> > >> What does that mean?
> > >
> > > If you omit all of the optional colon-separated arguments to asm, the
> > > % symbols before register names in the asm no longer need to be
> > > escaped with a second % (I suppose since there can be no substitution
> > > when there are no operand constraints). You can add an empty : or
> > > just drop the doubled % to avoid this.
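To illustrate the escaping rule described above, here is a minimal sketch, assuming GCC or Clang targeting x86 with AT&T syntax. The function names are made up for this example and are not from the patch; on other architectures the code falls back to plain C so that it still compiles.

```c
#include <assert.h>

/* Extended asm (at least one colon): '%' introduces operand
 * substitutions such as %0, so a literal register is written "%%eax". */
static int add_one_extended(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    int out;
    __asm__ ("movl %1, %%eax\n\t"
             "addl $1, %%eax\n\t"
             "movl %%eax, %0"
             : "=r"(out)
             : "r"(x)
             : "eax", "cc");
    return out;
#else
    return x + 1;               /* portable fallback, no asm */
#endif
}

static void nop_variants(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Basic asm (no colons): the string goes to the assembler verbatim,
     * so a single '%' is required; "%%ax" here would produce the
     * "bad register name" error quoted above. */
    __asm__ volatile ("xchgw %ax, %ax");    /* two-byte no-op */

    /* Adding an empty ':' turns it into extended asm, so the doubled
     * '%' becomes correct again. */
    __asm__ volatile ("xchgw %%ax, %%ax" : );
#endif
}
```

Both statements in nop_variants() assemble to the same instruction; only the presence of the colon decides which spelling is accepted.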
> > OK, that fixes it. Oddly, it's the same speed, even though the
> > instruction count is lower. OK, so next then. The attached patch is
> > supposed to be part of a patch that decreases the insane number of
> > registers used for temporary stuff that could be loaded directly: so
> > instead of doing (%0) where %0="m"(var[idx1]), use (%0,%1) with
> > %0="r"(var) and %1="r"(idx1). This works and is not slower (eventually
> > it will be faster when it saves a few registers; this is work in progress).
> > The second patch ("test") tries to use d_idx as a global (which it is,
> > in effect). Why doesn't this work?
> > - "por (%0,%1), %%mm1 \n" // nnz[b] || nnz[bn]
> > + "por %1(%0), %%mm1 \n" // nnz[b] || nnz[bn]
> > ::"r"(nnz+b_idx),
> > - "r"(d_idx)
> > + "g"(d_idx)
> for %1(%0)
> %1 must be a constant; it is not one in this code, so this cannot work.
> Either you have a for loop, in which case this needs to be a register, or
> you manually unroll it, in which case it can be a constant.
> That's a limitation of x86, as you know ;)
> The case where unrolling is left to gcc, and gcc would then choose between
> register and constant depending on this, can probably be done with
> av_builtin_constant_p, but that would be a huge mess, I suspect, and really
> not a good idea.
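The two addressing forms discussed above can be sketched as follows, assuming GCC or Clang on x86. The names load_reg_index, load_const_disp and D_IDX are hypothetical and chosen only for this illustration. Note that with an immediate operand the displacement is printed with the %c modifier, since a bare %2 would be emitted as "$4" and would not assemble as a displacement.

```c
#include <assert.h>
#include <stdint.h>

enum { D_IDX = 4 };  /* hypothetical compile-time offset for the sketch */

/* Run-time index: it has to sit in a register, addressed as (base,index). */
static unsigned load_reg_index(const uint8_t *base, intptr_t idx)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned out;
    __asm__ ("movzbl (%1,%2), %0"
             : "=r"(out)
             : "r"(base), "r"(idx)
             : "memory");
    return out;
#else
    return base[idx];           /* portable fallback, no asm */
#endif
}

/* Compile-time offset: it can be folded into the instruction as a
 * displacement, which frees the register the index would have used. */
static unsigned load_const_disp(const uint8_t *base)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned out;
    __asm__ ("movzbl %c2(%1), %0"
             : "=r"(out)
             : "r"(base), "i"(D_IDX)
             : "memory");
    return out;
#else
    return base[D_IDX];         /* portable fallback, no asm */
#endif
}
```

The "i" constraint only accepts a value the compiler can prove constant, which is why the second form needs a manually unrolled loop (or a literal) rather than a loop variable.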
Using "m" as Loren suggests leaves gcc more freedom to mess up, but if gcc
doesn't, then there's nothing wrong with it.
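A minimal sketch of the "m" approach, again assuming GCC or Clang on x86; load_mem_operand is a hypothetical name and not code from the patch. Here the compiler, not the asm string, picks the addressing mode: it may emit a register index inside a loop or a constant displacement once the loop is unrolled.

```c
#include <assert.h>
#include <stdint.h>

/* The whole memory reference is one "m" operand, so the compiler is
 * free to print it as (base,index), disp(base), or anything else valid. */
static unsigned load_mem_operand(const uint8_t *base, intptr_t idx)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned out;
    __asm__ ("movzbl %1, %0"
             : "=r"(out)
             : "m"(base[idx]));
    return out;
#else
    return base[idx];           /* portable fallback, no asm */
#endif
}
```

The trade-off is the one stated above: the asm no longer controls how many registers the address computation uses, so the generated code is worth checking.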
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates