[FFmpeg-devel] [PATCH] some SIMD write-combining for h264

Alexander Strange astrange
Sun Jan 24 06:58:47 CET 2010

On Jan 18, 2010, at 5:34 AM, Alexander Strange wrote:

> On Jan 17, 2010, at 8:27 PM, M?ns Rullg?rd wrote:
>> Alexander Strange <astrange at ithinksw.com> writes:
>>> On Sun, Jan 17, 2010 at 7:54 PM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>>>> Alexander Strange <astrange <at> ithinksw.com> writes:
>>>>>>>> also what sets __MMX__ ? we have our own defines for that
>>>>>>> It's a gcc builtin define, set based on ./configure --cpu=x adding
>>>>>>> -march.  HAVE_MMX is for the build and not the host cpu family, and
>>>>>>> this is inlined asm, so it can't use it.
>>>>>> Huh?  Host... build???
>>>>> Oh, that was supposed to be "target"...
>>>>> Anyway, this is MMX being used like the cmov/clz inlines, so it depends on the
>>>>> given --cpu and not on the build system's capabilities.
>>>> Could you explain once more why this shouldn't be HAVE_MMX?
>>>> If the user passes --disable-mmx to configure, he imo expects MMX to be disabled.
>>>> Carl Eugen
>>> HAVE_MMX isn't enough to enable it - './configure --cpu=i586' enables
>>> HAVE_MMX, but i586 doesn't have it.
>> Not anymore.
> I think that was wrong. --cpu is the minimum required cpu, not the only expected cpu, but that turned off building dsputil mmx which is runtime-cpudetected. (even if runtime-cpudetection is disabled, actually)
> Besides, './configure' also sets HAVE_MMX but targets generic x86-32 which might not have it.
>>> Technically I'd say --disable-mmx should pass -mno-mmx to gcc, but
>>> that seems like a complicated change to configure, so I'll check
>>> HAVE_MMX to disable it as well.
>> That's almost trivial to arrange.  Do we want that?
> The other new architectures would have to be added as configure options (sse[234] are missing), but if you want then sure.
> Applied the first patch with #if HAVE_MMX around the file and av_always_inline.
> The second patch needs to be rewritten now since the decoder changed under it.

Here it is, with some more opportunities I found.
The original file now goes from 12.236s to 11.862s on x86-32 (3% faster).

I ended up lifting some address calculation out of loops - the asm statements make every compiler I tried recalculate the whole address every time. This added some warnings about pointer casts, although the pointer math works out:

libavcodec/h264.h:1316:37: warning: incompatible pointer types initializing 'int16_t [2]', expected 'int16_t (*)[2]'
            int16_t (*mvd_dst)[2] = h->mvd_table[list][b_xy];
libavcodec/h264.h:1317:37: warning: incompatible pointer types initializing 'int16_t [2]', expected 'int16_t (*)[2]'
            int16_t (*mvd_src)[2] = h->mvd_cache[list][scan8[0]];

Did I get it wrong or should I just add type casts?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-H.264-Use-64-128-bit-write-combining-macros-for-copi.patch
Type: application/octet-stream
Size: 13838 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100124/1dfcb355/attachment.obj>

More information about the ffmpeg-devel mailing list