[MPlayer-dev-eng] [RFC] disable fastmemcpy on x86-64 by default

Sun May 27 23:38:57 CEST 2007

On Sun, 27 May 2007 23:11:45 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:

> > Interesting are benchmarks 2 and 5, which are both faster with
> > the patch. 

I missed benchmark 4 here, which is also slightly faster.
Interesting to note: benchmarks 2 and 5 are faster in the VC(!),
benchmark 4 in the VO. This suggests that benchmarks 2 and 5 use dr
(direct rendering). (I couldn't see anything in the -v log, and I'm too
tired to check the code.)

> hmm, there's something odd ...
> where is this code using any memcpy at all?
> doesn't the mga VO always use mem2agpcpy()?
> it seems the patch disabled this and uses plain memcpy() for it

Yes, from libvo/fast_memcpy.h:

---snip---
#ifdef USE_FASTMEMCPY
[...]
#else /* USE_FASTMEMCPY */
#define mem2agpcpy(a,b,c) memcpy(a,b,c)
#endif
---snap---

And due to the patch, USE_FASTMEMCPY isn't set anymore.
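
To make the effect of that fallback concrete, here is a minimal sketch, not the real contents of libvo/fast_memcpy.h: the optimized branch is elided in the excerpt above, so fast_mem2agpcpy_stub, MEM2AGPCPY_PATH and copy_line are hypothetical names, and the stub just calls memcpy. Compiled without -DUSE_FASTMEMCPY, every mem2agpcpy() call turns into a plain memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef USE_FASTMEMCPY
/* Hypothetical placeholder for the elided optimized routine. */
static void *fast_mem2agpcpy_stub(void *to, const void *from, size_t len)
{
    /* A real version would use MMX/SSE (non-temporal) stores here. */
    return memcpy(to, from, len);
}
#define mem2agpcpy(a, b, c) fast_mem2agpcpy_stub(a, b, c)
#define MEM2AGPCPY_PATH "optimized"
#else /* USE_FASTMEMCPY */
/* The fallback quoted from libvo/fast_memcpy.h. */
#define mem2agpcpy(a, b, c) memcpy(a, b, c)
#define MEM2AGPCPY_PATH "plain memcpy"
#endif

/* Hypothetical helper: copy one line with whichever mem2agpcpy was selected. */
static void copy_line(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    mem2agpcpy(dst, src, len);
}
```
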

> and MIN_LEN is 2k and mem2agpcpy is just done per line, which is
> less than 2k, so it practically falls back to rep movsb

I tried a series of files with shorter lines than the ones listed,
none of which showed any speedup. But I didn't check whether any of
them used dr.
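
A toy model of Michael's hypothesis, assuming the 2k MIN_LEN from the quote and a copy loop that calls the copy routine once per line: threshold_copy, copy_plane and the two call counters are hypothetical, memcpy stands in for the MMX/SSE loop, and a byte loop stands in for rep movsb. With lines shorter than 2048 bytes, every call takes the small-copy path, so the optimized loop is never reached whether or not fastmemcpy is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Threshold as quoted in the thread (2k); illustrative only. */
#define MIN_LEN 2048

/* Hypothetical counters so the sketch can report which path was taken. */
static size_t fast_path_calls;
static size_t small_path_calls;

/* Stand-in for the small-copy fallback ("rep movsb" in the thread). */
static void small_copy(unsigned char *to, const unsigned char *from, size_t len)
{
    while (len--)
        *to++ = *from++;
}

/* Hypothetical mem2agpcpy-like routine: only copies of at least
 * MIN_LEN bytes would reach the vectorized path. */
static void threshold_copy(void *to, const void *from, size_t len)
{
    if (len >= MIN_LEN) {
        fast_path_calls++;
        memcpy(to, from, len); /* placeholder for the MMX/SSE loop */
    } else {
        small_path_calls++;
        small_copy(to, from, len);
    }
}

/* Copy a plane one line at a time, so each call sees only `width` bytes. */
static void copy_plane(unsigned char *dst, size_t dst_stride,
                       const unsigned char *src, size_t src_stride,
                       size_t width, size_t height)
{
    assert(width <= dst_stride && width <= src_stride);
    for (size_t y = 0; y < height; y++)
        threshold_copy(dst + y * dst_stride, src + y * src_stride, width);
}
```

For example, copying 16 lines of a 720-pixel-wide luma plane bumps small_path_calls 16 times and leaves fast_path_calls at 0; only lines of 2048 bytes or more would take the other branch.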

> this is an impressive series of bugs ...
> 
> if my hypothesis is correct then Reimar will have some work to do ;)

BTW: is there any reason why fast_memcpy & co. are not defined inline?
I even think that without runtime CPU detection we could
just do a #define fast_memcpy fast_memcpy_XXX (where XXX is one of MMX, SSE, ...)
and get rid not only of a call but also of a stack frame.
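
A rough sketch of that idea, assuming per-ISA variants named fast_memcpy_MMX and fast_memcpy_SSE (filling in the XXX placeholder) and a RUNTIME_CPUDETECT switch as the configure macro; the bodies are memcpy placeholders, not MPlayer's actual routines. With runtime detection every call goes through a function pointer; without it the name is aliased at compile time, and since the variants are static inline the compiler is free to inline the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-ISA variants; memcpy stands in for the real loops. */
static inline void *fast_memcpy_MMX(void *to, const void *from, size_t len)
{
    assert(len == 0 || (to != NULL && from != NULL));
    return memcpy(to, from, len); /* placeholder for an MMX loop */
}

static inline void *fast_memcpy_SSE(void *to, const void *from, size_t len)
{
    assert(len == 0 || (to != NULL && from != NULL));
    return memcpy(to, from, len); /* placeholder for an SSE loop */
}

#ifdef RUNTIME_CPUDETECT
/* Runtime detection: an indirect call through a pointer set at startup. */
static void *(*fast_memcpy_ptr)(void *, const void *, size_t) = fast_memcpy_MMX;

static inline void init_fast_memcpy(int have_sse)
{
    fast_memcpy_ptr = have_sse ? fast_memcpy_SSE : fast_memcpy_MMX;
}
#define fast_memcpy(a, b, c) fast_memcpy_ptr(a, b, c)
#else
/* No runtime detection: the variant is fixed at build time (SSE assumed
 * here), so the name can be aliased and the call inlined away. */
#define fast_memcpy fast_memcpy_SSE
#endif
```
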

				Attila Kinali

-- 
Linux is... when you can also solve simple things with a cryptic
post-fix language
                        -- Daniel Hottinger