[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#7

Balatoni Denes dbalatoni
Thu Aug 30 02:13:20 CEST 2007


Hi Michael!

Just a question (and a half question):

On Thursday 30 August 2007 at 01:25, Michael Niedermayer wrote:
> > +    /* 3. column */\
> > +        "3:                             \n\t"\
> > +        "for %%f8, %%f10, %%f60         \n\t"\
> > +        "fcmpd %%fcc0, %%f62, %%f60     \n\t"\
>
> the for and fcmp can similarly be moved up, you have to switch to fcc1
> though to avoid a conflict with the above ones
> this applies to the other for/fcmpd as well

Why do I have to switch to fcc1? There is plenty of room to place the fcmpds
without conflict. Also, the check for equality is on %fcc0.

>
> [...]
>
> > +        TRANSPOSE
> > +        IDCT4ROWS
> > +        SCALEROWS
> > +        PUTPIXELSCLAMPED("0")
> > +        LOAD("%2+64")
> > +        TRANSPOSE
> > +        IDCT4ROWS
> > +        SCALEROWS
> > +        PUTPIXELSCLAMPED("4")
>
> the SCALEROWS is unneeded, the fpack16 can do the downshift and a single
> addition to the 0,0 coefficient before the idct or first column after the
> transpose can compensate for the rounding difference

Indeed, I missed this. However, that one add has to come after the
multiplication, because while in the C simple idct all coefficients are
multiplied by 1/sqrt(2), here they are not (correct me if I am wrong, but this
is slightly more accurate imho).
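
To illustrate the point about where the add goes, here is a minimal sketch in
plain C rather than VIS. It is a toy model, not the code from the patch: W_DC,
SHIFT, ROUND and the function names are made up for the example, and the "idct"
is reduced to a DC term plus one AC contribution per output. It shows that one
rounding bias added to the DC term after the multiplication gives the same
result as rounding every output separately, while adding the same bias before
the multiplication gets scaled by the DC weight.

```c
#include <assert.h>

#define SHIFT 4                    /* hypothetical final downshift */
#define W_DC  3                    /* hypothetical DC weight       */
#define ROUND (1 << (SHIFT - 1))   /* rounding bias for the shift  */

/* Reference: each output gets its own rounding add before the shift. */
static int idct_ref_round(int dc, int ac)
{
    return (W_DC * dc + ac + ROUND) >> SHIFT;
}

/* Folded: the bias is added once to the DC term, after the multiplication.
 * Every output then needs only a plain shift (the part fpack16 would do). */
static void idct_folded_bias(int dc, const int ac[4], int out[4])
{
    int dc_term = W_DC * dc + ROUND;
    for (int i = 0; i < 4; i++)
        out[i] = (dc_term + ac[i]) >> SHIFT;
}

/* For comparison: the bias added before the multiplication is scaled by
 * W_DC, so the DC term ends up with W_DC * ROUND instead of ROUND. */
static int dc_term_bias_before(int dc)
{
    return W_DC * (dc + ROUND);
}
```

With dc = 5 and ac = {0, 7, -7, 20} both variants give {1, 1, 1, 2}. The
bias-before-multiply DC term is 39 instead of 23, so it would need a
differently scaled constant to match.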

bye
Denes



