[FFmpeg-devel] [PATCH] some SIMD write-combining for h264

Måns Rullgård mans
Sun Jan 17 22:44:01 CET 2010


Michael Niedermayer <michaelni at gmx.at> writes:

> On Sun, Jan 17, 2010 at 04:59:55AM -0500, Alexander Strange wrote:
>> 
>> On Jan 16, 2010, at 9:59 PM, M?ns Rullg?rd wrote:
>> 
>> > Alexander Strange <astrange at ithinksw.com> writes:
>> > 
>> >> On Jan 16, 2010, at 12:35 AM, Michael Niedermayer wrote:
>> >> 
>> >>> On Fri, Jan 15, 2010 at 11:11:23PM -0500, Alexander Strange wrote:
>> >>>> This adds intreadwrite macros for 64/128-bit memory operations and uses them in h264.
>> >>>> 
>> >>>> Unlike the other macros, these assume correct alignment, and the patch only defines the ones there was an immediate use for.
>> >>>> This only has x86 versions, but others should be easy. The 64-bit operations can be done with double copies on most systems, I guess.
>> >>>> 
>> >>>> Decoding a 30s file on Core 2 Merom with --cpu=core2 (minimum of 5 runs):
>> >>>> x86-32: 12.72s before, 12.51s after (1.7%)
>> >>>> x86-64: 10.24s before, 10.20s after (.4%)
>> >>>> 
>> >>>> Tested on x86-32, x86-64, x86-32 with --arch=c.
>> >>> 
>> >>> as your code uses MMX you need to at least mention EMMS/float issue in the
>> >>> dox and probably a  emms_c(); call before draw_horiz_band()
>> >>> dunno if these are all
>> >> 
>> >> Added in the comment.
>> >> 
>> >>> also what sets __MMX__ ? we have our own defines for that
>> >> 
>> >> It's a gcc builtin define, set based on ./configure --cpu=x adding
>> >> -march.  HAVE_MMX is for the build and not the host cpu family, and
>> >> this is inlined asm, so it can't use it.
>> > 
>> > Huh?  Host... build???
>> 
>> Oh, that was supposed to be "target"...
>> Anyway, this is MMX being used like the cmov/clz inlines, so it depends on the given --cpu and not on the build system's capabilities.
>
> do all compilers set __MMX__ ?

All that support gcc inline asm do AFAIK.

>> +static inline void AV_COPY64(void *d, const void *s)
>
> are you sure this should not be always_inline?
> 2 plain 32bit reads +writes likely beat pushing 2 pointers on the stack and
> calling pulling 2 pointers off 2 mmx for copy and return 

Yes, all of those should be always_inline.  Seems like I missed that...

-- 
M?ns Rullg?rd
mans at mansr.com



More information about the ffmpeg-devel mailing list