[Ffmpeg-devel] patch: altivec optimizations for h264 decoder
Romain Dolbeau
Tue Feb 7 18:25:42 CET 2006
Mauricio Alvarez <alvarez at ac.upc.edu> wrote:
> Why do you think so? This algorithm has more instructions than the
> factorized-matrix that is implemented in the C version but it can take
> more advantage of the altivec instructions by reducing the data
> reorganization (matrix transpose and so on).
On some code, gcc sucks; one such case is when you have a lot of
independent instructions: gcc will happily schedule them together, and
then won't be able to allocate the registers. I've seen it a lot of
times when fully unrolling loops in AltiVec; IIRC, there's even code
somewhere in ffmpeg (or is it mplayer?) where I had to hand-allocate the
registers to avoid spilling.
Of course, maybe a newer gcc has fixed this, but I doubt it. IBM's xlc
does well on the same code, but can't compile most of ffmpeg and/or
mplayer.
> Well, the problem here is with the h264_qpel4_mc22_altivec function which
> passes to qpel4_hv_lowpass_altivec the value 4 as a stride for the tmp
> array.
In that case, fix the comments to specify the actual assumptions;
sometimes the code is right and the comments are wrong...
> I have not tested this, I only added put_pixels8_altivec because
> put_h264_qpel8_mc00_altivec requires it. Maybe it is slower than the C
> version; I'm not sure. I am going to make a deeper analysis of this.
You really should check this before it can be committed; it might be
faster to have put_pixels8_altivec be the C version. AltiVec has a big
problem with unaligned stores; you really need to win big somewhere else
to offset the penalty.
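
To illustrate where the penalty comes from, here is a rough sketch. It is
a portable scalar model, not real AltiVec code, and every name in it
(model_vec_ld, model_vec_st, model_unaligned_store) is made up for the
example. It assumes `base` is 16-byte aligned and relies on the fact that
vec_ld/vec_st ignore the low four bits of the address, so a store to an
arbitrary address has to read the aligned block(s) it touches, merge the
new bytes in, and write the block(s) back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Model of an aligned vector load: the low 4 bits of the offset are
 * ignored. Assumes base itself is 16-byte aligned. */
static void model_vec_ld(uint8_t out[VEC_BYTES], const uint8_t *base,
                         size_t offset)
{
    memcpy(out, base + (offset & ~(size_t)15), VEC_BYTES);
}

/* Model of an aligned vector store, same address truncation. */
static void model_vec_st(const uint8_t in[VEC_BYTES], uint8_t *base,
                         size_t offset)
{
    memcpy(base + (offset & ~(size_t)15), in, VEC_BYTES);
}

/* Write n (<= 16) bytes at base + offset using only aligned accesses.
 * Returns the number of aligned loads and stores that were needed. */
static int model_unaligned_store(uint8_t *base, size_t offset,
                                 const uint8_t *src, size_t n)
{
    int ops = 0;
    size_t done = 0;

    assert(n <= VEC_BYTES);
    while (done < n) {
        size_t pos   = offset + done;
        size_t lane  = pos & 15;          /* position inside the block */
        size_t chunk = VEC_BYTES - lane;  /* room left in this block   */
        uint8_t v[VEC_BYTES];

        if (chunk > n - done)
            chunk = n - done;

        model_vec_ld(v, base, pos);       /* read the existing block   */
        ops++;
        memcpy(v + lane, src + done, chunk); /* merge the new bytes    */
        model_vec_st(v, base, pos);       /* write the block back      */
        ops++;
        done += chunk;
    }
    return ops;
}
```

In this model an 8-byte row that fits inside one aligned block costs one
load and one store, and a row that straddles a 16-byte boundary costs two
of each, all to write 8 bytes. Real code would do the merge with a
permute/select rather than memcpy, but the extra memory traffic is the
same idea.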
> BTW I was trying to implement put_pixels16_l2_altivec and
> put_pixels8_l2_altivec using the vec_avg instruction, but I always found
> evident artifacts in the resulting videos. Do you have any clue about
> that? I think that it is possible to achieve more speed-up by
> implementing those functions in altivec.
vec_avg should be identical to dsputil's rnd_avg32 IIRC, so the
functions should be doable w/o extra overhead. What kind of artifacts did
you notice? Smears and streaks are usually caused by wrong loads/stores
(in AltiVec, usually misaligned loads/stores).
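
A quick way to convince yourself about the rounding is a small portable
check like the sketch below. rnd_avg32 is written in the style of the
dsputil helper; avg_u8_model is only my scalar stand-in for what vec_avg
does on each unsigned byte lane, (a + b + 1) >> 1, and the other names
are made up for the example. It needs no AltiVec hardware.

```c
#include <assert.h>   /* for the usage shown after the block */
#include <stdint.h>

/* Replicate one byte value into all four lanes of a 32-bit word. */
#define BYTE_VEC32(c) ((uint32_t)(c) * 0x01010101u)

/* Packed rounding average of four unsigned bytes, in the style of
 * dsputil's rnd_avg32. Clearing the low bit of each lane before the
 * shift keeps bits from leaking into the neighbouring lane. */
static uint32_t rnd_avg32(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & ~BYTE_VEC32(0x01)) >> 1);
}

/* Scalar model of vec_avg on one unsigned byte lane (an assumption of
 * this sketch, not code taken from ffmpeg). */
static uint8_t avg_u8_model(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Apply the per-byte model to each of the four lanes of a word. */
static uint32_t avg4_model(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint32_t)avg_u8_model(x, y) << (8 * i);
    }
    return r;
}

/* Returns 1 if both versions agree for every pair of byte values
 * (replicated across all four lanes), 0 otherwise. */
static int averages_agree(void)
{
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            uint32_t a = BYTE_VEC32(x);
            uint32_t b = BYTE_VEC32(y);
            if (rnd_avg32(a, b) != avg4_model(a, b))
                return 0;
        }
    }
    return 1;
}
```

Calling assert(averages_agree()) from a test program exercises all 65536
byte pairs. If the two agree, the arithmetic is not the source of the
artifacts, which points back at the loads and stores.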
--
Romain Dolbeau
<romain at dolbeau.org>