[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Måns Rullgård mans
Fri Sep 24 21:36:48 CEST 2010


"Ronald S. Bultje" <rsbultje at gmail.com> writes:

> Hi,
>
> On Fri, Sep 24, 2010 at 12:26 PM, Daniel Verkamp <daniel at drv.nu> wrote:
>> On Fri, Sep 24, 2010 at 9:04 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>> So removing pand (which doesn't do anything in the one case, and can
>>> be replaced by a pxor in the other). With the attached patch #2, I get
>>> this:
>>> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:315:bad
>>> register name `%%mm0'
>>> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:520:bad
>>> register name `%%mm0'
>>>
>>> What does that mean?
>>
>> If you omit all of the optional colon-separated arguments to asm, the
>> % symbols before register names in the asm no longer need to be
>> escaped with a second % (I suppose since there can be no substitution
>> when there are no operand constraints). You can add an empty : or
>> just drop the doubled % to avoid this.
>
> OK, that fixes it. Oddly, it's the same speed, even though
> #instructions is less. OK, so next then. Attached patch is supposed to
> be part of a patch that decreases the insane amount of registers used
> for temporary stuff that could be loaded directly (so instead of doing
> (%0) where %0="m"(var[idx1]), use (%0,%1) with %0="r"(var) and
> %1="r"(idx1)). This works and is not slower (eventually it will be
> faster when it saves a few registers; this is work in progress).

Why are you spending time and effort trying to find a magical piece of
C code that gcc does what you want with?  It would be much simpler to
write the code as you want it (yasm or inline) directly.  The next gcc
release will break this anyway.

-- 
Måns Rullgård
mans at mansr.com
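
The two inline-asm points in the quoted discussion (the `%` versus `%%` escaping rule, and replacing an "m" operand with base and index "r" operands) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: all function names and the X86_GNU_ASM macro are hypothetical, the self-move instruction is only a harmless stand-in for the MMX code in the thread, and the asm paths assume a GNU-compatible compiler targeting x86 with the default AT&T syntax (other targets fall back to plain C).

```c
#include <stddef.h>

/* Assumption: GNU-style inline asm on x86; anything else uses plain C. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define X86_GNU_ASM 1
#else
#define X86_GNU_ASM 0
#endif

/* Basic asm: no ':' sections, so no operand substitution takes place
 * and a register name is written with a single '%'. */
static void touch_register_basic_asm(void)
{
#if X86_GNU_ASM
    __asm__ volatile ("mov %al, %al");      /* harmless self-move */
#endif
}

/* Extended asm: even an empty ':' enables operand substitution,
 * so a literal register name must be written with '%%'. */
static void touch_register_extended_asm(void)
{
#if X86_GNU_ASM
    __asm__ volatile ("mov %%al, %%al" : );
#endif
}

/* "m" operand: the compiler computes the address of table[idx] itself
 * and substitutes a complete memory reference for %1. */
static int load_via_memory_operand(const int *table, ptrdiff_t idx)
{
#if X86_GNU_ASM
    int value;
    __asm__ ("movl %1, %0" : "=r"(value) : "m"(table[idx]));
    return value;
#else
    return table[idx];
#endif
}

/* Two "r" operands: base and index are passed in registers and the
 * addressing mode (%1,%2,4) is spelled out in the template. The
 * "memory" clobber tells the compiler that the asm reads memory it
 * cannot see through the operand list. */
static int load_via_base_index(const int *table, ptrdiff_t idx)
{
#if X86_GNU_ASM
    int value;
    __asm__ volatile ("movl (%1,%2,4), %0"
                      : "=r"(value)
                      : "r"(table), "r"(idx)
                      : "memory");
    return value;
#else
    return table[idx];
#endif
}
```

Both load helpers should return the same element for the same arguments; whether the base and index form actually frees registers depends on the surrounding code and the compiler's register allocation, which is the uncertainty the reply above is pointing at.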