[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC (v4)
Roland Scheidegger
rscheidegger_lists at hispeed.ch
Sat Apr 21 07:25:21 CEST 2012
Am 21.04.2012 02:15, schrieb Michael Niedermayer:
> On Sat, Apr 21, 2012 at 02:10:46AM +0200, Michael Niedermayer wrote:
>> On Sat, Apr 21, 2012 at 01:26:54AM +0200, Michael Niedermayer wrote:
>>> On Fri, Apr 20, 2012 at 02:10:57AM +0200, Roland Scheidegger wrote:
>>>> This adds a hand-optimized assembly version for get_cabac much like the
>>>> existing one, but it works if the table offsets are RIP-relative.
>>>> Compared to the non-RIP-relative version this adds 2 lea instructions
>>>> and it needs one extra register.
>>>> The speedup over the C version is surprisingly large (more so than the
>>>> generated assembly seems to suggest): I measured get_cabac itself running
>>>> roughly 40% faster on a K8. Overall the difference is not that big, though;
>>>> on a test clip I measured roughly 5% on both a K8 and a Core2.
>>>> Hopefully it still compiles on x86 32bit...
>>>> v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
>>>> for every table, and got rid of unnecessary @GOTPCREL.
>>>> v3: apply similar fixes to the decode_significance functions, and use the
>>>> same macro arguments for the non-PIC case.
>>>> v4: prettify inline asm arguments and add a non-fast-cmov version (I expect
>>>> the C code would be faster otherwise, since both cmov and sbb are terribly
>>>> slow on Prescott, and the mask can't be built with a 64-bit shift either, as
>>>> that is just as slow. It's quite difficult to find usable instructions on
>>>> that chip...). This version is tested to work, but not on a P4; in theory it
>>>> _should_ be fast there.
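The v4 note is about building a selection mask without cmov or sbb. As a rough C sketch of that arithmetic only (not the patch's assembly, and not FFmpeg's actual decoder), the mask can come from the sign bit of a subtraction followed by a negate. Everything here is hypothetical and simplified for illustration: the names `Coder`, `LOW_SHIFT`, `lt_mask`, `decide_masked` and `decide_branchy`, and the value 17 for the shift. Which instruction sequence is actually fastest on a given CPU (Prescott, K8, Core2) is exactly what the thread is measuring; this only shows that the masked form computes the same result as a branch.

```c
#include <stdint.h>

/* Hypothetical scale: low is kept left-shifted by LOW_SHIFT relative to
 * range. The value 17 is an assumption chosen for illustration. */
#define LOW_SHIFT 17

typedef struct {
    uint32_t range; /* interval width, 9 bits (256..510) */
    uint32_t low;   /* offset into the interval, scaled by LOW_SHIFT */
} Coder;

/* All-ones when a < b, zero otherwise, using only a subtraction, a
 * logical shift and a negate: no branch, cmov or sbb.
 * Valid while a and b differ by less than 2^31. */
static uint32_t lt_mask(uint32_t a, uint32_t b)
{
    return 0u - ((a - b) >> 31);
}

/* Branchless decision step: returns 1 when the LPS path is taken. */
static int decide_masked(Coder *c, uint32_t rlps)
{
    c->range -= rlps;
    uint32_t scaled = c->range << LOW_SHIFT;
    uint32_t lps = ~lt_mask(c->low, scaled); /* all-ones when low >= scaled */
    c->low   -= scaled & lps;                /* LPS: skip past the MPS part */
    c->range += (rlps - c->range) & lps;     /* LPS: range becomes rlps */
    return (int)(lps & 1);
}

/* Reference version of the same step with an ordinary branch. */
static int decide_branchy(Coder *c, uint32_t rlps)
{
    c->range -= rlps;
    uint32_t scaled = c->range << LOW_SHIFT;
    if (c->low >= scaled) {
        c->low  -= scaled;
        c->range = rlps;
        return 1;
    }
    return 0;
}
```

The unsigned wraparound in `(rlps - c->range) & lps` is well defined in C, so adding it back to `c->range` yields exactly `rlps` on the LPS path and leaves `range` unchanged otherwise.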
>>>
>>> applied
>>>
>>> if someone has more ideas on how to improve it, it can easily be done
>>
>> let's hope it doesn't fail on any odd platforms ...
>>
>> fails on darwin
>> http://fate.ffmpeg.org/log.cgi?time=20120420235051&log=compile&slot=x86_64-darwin-llvm-gcc-4.2.1
>
> I'll revert it in a moment; I don't want to leave the compile broken until
> another solution is proposed
Oops. Any ideas what went wrong? Does the trick of accessing the table via
the difference between the table symbol and the label somehow not work
there? Is that something that just isn't supposed to work on darwin
(shouldn't PIC code work the same way there on x86_64?), or is this
fixable? I think the only thing I could do about it is simply not enable
this on darwin, but surely there has to be some way to make it work?
Roland
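The "2 extra lea instructions and one extra register" from the patch description come from x86_64 having no addressing mode that combines a RIP-relative displacement with an index register: the table base has to be loaded into a register first, and only then indexed. A minimal sketch of that pattern in GCC-style inline asm follows, assuming an x86_64 GCC or Clang toolchain (other targets fall back to plain C). The names `demo_table` and `lookup_rip` are hypothetical. Here the table is passed as an `"m"` operand, so the compiler spells the symbol itself, including any platform prefix such as the leading underscore Mach-O adds to C symbols, and picks RIP-relative addressing under PIC. This is an illustration of the addressing pattern, not the patch's code, and not a confirmed fix for the darwin failure.

```c
#include <stdint.h>

/* Hypothetical lookup table standing in for the CABAC tables. */
static const uint8_t demo_table[8] = {10, 20, 30, 40, 50, 60, 70, 80};

/* Return demo_table[idx] (idx must be < 8). On x86_64 the base address is
 * materialized with lea (RIP-relative when compiled as PIC), then indexed. */
static unsigned lookup_rip(unsigned idx)
{
#if defined(__GNUC__) && defined(__x86_64__)
    const uint8_t *base;
    unsigned val;
    __asm__("lea %2, %1\n\t"         /* base = &demo_table[0]: the extra lea */
            "movzbl (%1,%3), %0"     /* val = base[idx], zero-extended byte */
            : "=r"(val), "=&r"(base) /* base is early-clobbered: the extra register */
            /* The "m" operand only names the table so the compiler emits the
             * address; the table is const, so reading other elements is safe. */
            : "m"(demo_table[0]), "r"((uintptr_t)idx));
    return val;
#else
    return demo_table[idx];
#endif
}
```

The early-clobber `"=&r"` on `base` keeps the compiler from assigning it a register that the index or the memory operand still needs when `lea` overwrites it.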