[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#7
Balatoni Denes
dbalatoni
Thu Aug 30 02:13:20 CEST 2007
Hi Michael!
Just one question (and half of another):
On Thursday 30 August 2007 at 01:25, Michael Niedermayer wrote:
> > + /* 3. column */\
> > + "3: \n\t"\
> > + "for %%f8, %%f10, %%f60 \n\t"\
> > + "fcmpd %%fcc0, %%f62, %%f60 \n\t"\
>
> the for and fcmp can similarly be moved up, you have to switch to fcc1
> though to avoid a conflict with the above ones
> this applies to the other for/fcmpd as well
Why do I have to switch to fcc1? There is plenty of room to place the fcmpds
without conflict. Also, checking for equality is done on %fcc0.
>
> [...]
>
> > + TRANSPOSE
> > + IDCT4ROWS
> > + SCALEROWS
> > + PUTPIXELSCLAMPED("0")
> > + LOAD("%2+64")
> > + TRANSPOSE
> > + IDCT4ROWS
> > + SCALEROWS
> > + PUTPIXELSCLAMPED("4")
>
> the SCALEROWS is unneeded, the fpack16 can do the downshift and a single
> addition to the 0,0 coefficient before the idct or first column after the
> transpose can compensate for the rounding difference
Indeed, I missed this. However, that one add has to come after the
multiplication, because while in the C simple idct all coefficients are
multiplied by 1/sqrt(2), here they are not (correct me if I am wrong, but
IMHO this is slightly more accurate).
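
To make the rounding argument concrete, here is a minimal sketch in plain C, not the VIS code from the patch. It assumes each output is the sum of one DC term and one AC term, and a plain truncating shift stands in for the downshift done by fpack16. The names SHIFT, ROUND, scale_each, scale_folded and the two dc_bias_* helpers are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Hypothetical values: SHIFT is the final downshift, ROUND its rounding bias. */
enum { SHIFT = 4, ROUND = 1 << (SHIFT - 1) };

/* Separate scaling pass: the rounding bias is added to every output. */
static void scale_each(int32_t dc_term, const int32_t ac[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (dc_term + ac[i] + ROUND) >> SHIFT;
}

/* Folded variant: the bias is added once to the DC term, which feeds every
 * output, so a truncating shift afterwards gives the same rounded result. */
static void scale_folded(int32_t dc_term, const int32_t ac[4], int32_t out[4])
{
    dc_term += ROUND;
    for (int i = 0; i < 4; i++)
        out[i] = (dc_term + ac[i]) >> SHIFT;
}

/* Bias added to the raw coefficient before it is multiplied by a weight w:
 * the bias is scaled by w as well, so it is no longer exactly ROUND. */
static int32_t dc_bias_before_mul(int32_t dc, int32_t w)
{
    return (dc + ROUND) * w;
}

/* Bias added after the multiplication: it stays exactly ROUND. */
static int32_t dc_bias_after_mul(int32_t dc, int32_t w)
{
    return dc * w + ROUND;
}
```

With ROUND = 8, dc = 10 and w = 3, the first helper yields 54 and the second 38, so the bias only has the intended size when it is added after the multiplication (or when the DC weight is 1).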
bye
Denes