[FFmpeg-devel] [PATCH] Altivec version of h264_idct_add

Michael Niedermayer michaelni
Sat Jun 2 12:31:04 CEST 2007


Hi

On Fri, Jun 01, 2007 at 07:59:56PM -0400, David Conrad wrote:
> On Jun 1, 2007, at 7:36 PM, Michael Niedermayer wrote:
> 
> > Hi
> >
> > On Fri, Jun 01, 2007 at 06:52:57PM -0400, David Conrad wrote:
> >> Hi,
> >>
> >> This is an updated version of ff_h264_idct_add_altivec, based on a
> >> patch by Mauricio Alvarez [1]. It's 1.9 times faster than the scalar
> >> version on my G4. Regression tests pass except for seektest, which is
> >> currently broken for me with vanilla SVN (should it work?)
> >>
> >> 170 dezicycles in ff_h264_idct_add_altivec, 1 runs, 0 skips
> >> 150 dezicycles in ff_h264_idct_add_altivec, 2 runs, 0 skips
> >> 287 dezicycles in ff_h264_idct_add_altivec, 4 runs, 0 skips
> >> 203 dezicycles in ff_h264_idct_add_altivec, 8 runs, 0 skips
> >> 131 dezicycles in ff_h264_idct_add_altivec, 16 runs, 0 skips
> >> 79 dezicycles in ff_h264_idct_add_altivec, 32 runs, 0 skips
> >> 53 dezicycles in ff_h264_idct_add_altivec, 64 runs, 0 skips
> >> 33 dezicycles in ff_h264_idct_add_altivec, 128 runs, 0 skips
> >> 23 dezicycles in ff_h264_idct_add_altivec, 256 runs, 0 skips
> >> 18 dezicycles in ff_h264_idct_add_altivec, 512 runs, 0 skips
> >> 15 dezicycles in ff_h264_idct_add_altivec, 1024 runs, 0 skips
> >> 14 dezicycles in ff_h264_idct_add_altivec, 2048 runs, 0 skips
> >> 14 dezicycles in ff_h264_idct_add_altivec, 4096 runs, 0 skips
> >> 13 dezicycles in ff_h264_idct_add_altivec, 8192 runs, 0 skips
> >> 14 dezicycles in ff_h264_idct_add_altivec, 16384 runs, 0 skips
> >> 14 dezicycles in ff_h264_idct_add_altivec, 32768 runs, 0 skips
> >> 14 dezicycles in ff_h264_idct_add_altivec, 65536 runs, 0 skips
> >
> > Where were the START/STOP_TIMER macros placed?
> > This doesn't look correct; the IDCT isn't being executed in 1.4 CPU
> > cycles.
> 
> Like so:
> 
> static void ff_h264_idct_add_altivec(...)
> {
>      START_TIMER
>      vec_s16_t va0, va1, va2, va3;
> [...]
>          VEC_LOAD_U8_ADD_S16_STORE_U8(dst,va3,dstperm);
>      }
>      STOP_TIMER("ff_h264_idct_add_altivec")
> }
> 
> It's my first time using these macros; where should they be?

Hmm, that's OK. It seems that read_time() for PPC does not count CPU cycles
but some multiple of them, so I'll assume that everything is OK ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein