[FFmpeg-devel] libavutil simd
Michael Niedermayer
michaelni
Tue Oct 2 20:15:42 CEST 2007
Hi
On Tue, Oct 02, 2007 at 05:29:18PM +0200, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > personally i am in favor of the simplest and least hackish solution
> > for x86 we can easily and cleanly figure out the cpu capabilities
> >
> > for ppc and sparc there is no simple and clean way, what really is the
> > big problem with treating ppc without altivec like a different
> > architecture than ppc with altivec?
> > with x86 we can't as there are so many different variants (mmx, 3dnow, mmx2
> > sse, sse2, ....)
> >
> > the whole thread seems to be centered around "we must do it at runtime
> > no matter what" i just can't help but keep wondering why that is so
> > important?
>
> certain binary distributors may have yet another headache about that...
since when do we care about certain binary distributions ;)
also they can patch the code as they want which they do anyway, and they
have the advantage of not having to solve the detection for more than 1 OS
[...]
> >
> > PS: let me remind everyone that libavutil is supposed to be LIGHTweight,
> > simple, modular and fast
> > and i really would rather drop SIMD in libavutil completely before we
> > fill it up with some of the idiotic hacks suggested by the army of
> > bloated zombies in this thread. Many of you really sound like win32
> > users who want their ideas implemented no matter how stupid
>
> having a 4 times faster adler doesn't sound stupid if you are going to
> use it quite often...
you are comparing naive C code against optimized altivec
you can easily work with several bytes at a time in plain C (below: 8 bytes
per 64bit load, split into 2 words of four 16bit lanes), try something like
for(...){
    ss0=ss1=sum0=sum1=0;
    for(...){
        s= ((uint64_t*)src)[i];
        a0= s & 0x00FF00FF00FF00FF;
        a1= (s>>8) & 0x00FF00FF00FF00FF;
        ss0 += sum0;
        ss1 += sum1;
        sum0 += a0;
        sum1 += a1;
    }
    tmp= sum0 + sum1;
    X += (0x0001000100010001*tmp)>>48;
    // 3000000020000000100000000
    ss0 += ss1; // 33000000220000001100000000
    ss0= (ss0&0x0000FFFF0000FFFF) + ((ss0>>16)&0x0000FFFF0000FFFF); // 33002200220011001100000000
    ss0+= ss0>>32; // 33222222221111111100000000
    ss0&= 0xFFFFFFFF;
    Y += 8*ss0; // 24,16,8,0
    Y += (0x0001000300050007*sum0 + 0x0002000400060008*sum1)>>48; // 876543218765432187654321
}
(totally untested and i am certain it contains some bugs; it's just to
demonstrate how it can be done, and yes, it can be optimized further)
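
A minimal, self-contained C sketch of the same idea, for illustration only. It is not the code from this mail and not FFmpeg's implementation: the names adler32_swar, adler32_ref, lanes_dot and load_le64, and the block size of 23 words, are assumptions made here. It keeps the per-lane running sums (sum0/sum1) and sums of sums (ss0/ss1) from the fragment above, but extracts the 16-bit lanes explicitly instead of using the multiply-and-shift trick, and it also adds the n*a term that the first sum contributes to the second sum over a block of n bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD   65521u
#define LANE_MASK   0x00FF00FF00FF00FFULL
/* Assumption of this sketch: 23 words per block. An ss lane is at most
 * 255 * (22*23/2) = 64515, which still fits in 16 bits; 24 would not. */
#define BLOCK_WORDS 23

/* Byte-at-a-time reference, used for the tail and for checking. */
uint32_t adler32_ref(uint32_t adler, const uint8_t *buf, size_t len)
{
    uint32_t a = adler & 0xFFFF, b = adler >> 16;
    while (len--) {
        a = (a + *buf++) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Endian-independent 64-bit load: byte p ends up at bits 8p..8p+7. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Weighted sum of the four 16-bit lanes of v (lane k times w[k]). */
static uint32_t lanes_dot(uint64_t v, const unsigned w[4])
{
    uint32_t r = 0;
    for (int k = 0; k < 4; k++)
        r += (uint32_t)((v >> (16 * k)) & 0xFFFF) * w[k];
    return r;
}

uint32_t adler32_swar(uint32_t adler, const uint8_t *buf, size_t len)
{
    static const unsigned ones[4]   = { 1, 1, 1, 1 };
    static const unsigned w_even[4] = { 8, 6, 4, 2 }; /* bytes 0,2,4,6 */
    static const unsigned w_odd[4]  = { 7, 5, 3, 1 }; /* bytes 1,3,5,7 */
    uint32_t a = adler & 0xFFFF, b = adler >> 16;

    while (len >= 8) {
        size_t words = len / 8;
        if (words > BLOCK_WORDS)
            words = BLOCK_WORDS;

        uint64_t sum0 = 0, sum1 = 0, ss0 = 0, ss1 = 0;
        for (size_t i = 0; i < words; i++) {
            uint64_t s = load_le64(buf + 8 * i);
            ss0  += sum0;                    /* sums of earlier sums */
            ss1  += sum1;
            sum0 += s & LANE_MASK;           /* even bytes, 4 lanes  */
            sum1 += (s >> 8) & LANE_MASK;    /* odd bytes, 4 lanes   */
        }

        uint32_t n = (uint32_t)(8 * words);
        /* b gains n*a, 8 times every ss lane, and (8 - p) times the
         * sum of the bytes at position p within each word. */
        b += n * a
           + 8 * (lanes_dot(ss0, ones) + lanes_dot(ss1, ones))
           + lanes_dot(sum0, w_even) + lanes_dot(sum1, w_odd);
        a += lanes_dot(sum0, ones) + lanes_dot(sum1, ones);
        a %= ADLER_MOD;
        b %= ADLER_MOD;

        buf += n;
        len -= n;
    }
    /* Fewer than 8 bytes left: finish with the reference loop. */
    return adler32_ref((b << 16) | a, buf, len);
}
```

Calling adler32_swar(1, data, len) should give the same value as adler32_ref(1, data, len). Whether it is actually faster depends on the compiler and CPU; the explicit lane extraction at the end of each block is the obvious place to optimize further.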
> same goes for sha1, md5, aes...
yes
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus