[MPlayer-dev-eng] [PATCH] autoq support for control()

Mon Feb 11 13:56:37 CET 2002

Hi

On Monday 11 February 2002 07:23, Juan J. Sierralta P. wrote:
> On Mon, 2002-02-11 at 00:47, Michael Niedermayer wrote:
> > > 	Did you see MPEG files generated by ffmpeg on Windows Media, MTV or
> > > Java Media Studio before your simple idct? It's the same problem when
> >
> > no, but they should show the same stripes...
>
> 	No. Accumulation errors, especially on the chroma component. I think this
ahh yes, i saw that too on 1 or 2 videos with the other IDCT...

[...]
>
> > > we use a bad DCT.
> >
> > no it's not; the dct can be inaccurate, which might increase filesize, but
> > there will be no accumulated errors.
> > for example, the c dct had wrongly permuted scaling factors (that
> > certainly
>
> 	Had or has?
had; i fixed it. is anything wrong with it?

>
> > caused significantly more differences than a correct AAN dct would) and it was
>
> 	AAN's strength is sometimes also its problem: the idea of folding
> the last multiplications into the quantization can be hard to implement
> and makes things difficult when you have multiple DCT implementations.
> That's why I said weeks ago that if we had a common set of
> IDCT/DCT in C, MMX, XMMX, etc. it would be easier, and maybe faster, to avoid
> different quant matrices and coefficient permutations.
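A minimal C sketch of what folding the AAN scaling into the quantizer means, assuming a hypothetical fast DCT whose output at (u,v) equals the true coefficient times s(u)*s(v), with overall constant factors ignored. The s(k) values are the standard AAN factors s(0) = 1, s(k) = sqrt(2)*cos(k*pi/16); build_qmul(), quantize_folded() and quantize_two_step() are illustrative names, not ffmpeg functions. The folded table is indexed in raster order, so it is only valid for a DCT that emits its output in that same order.

```c
#define QSHIFT 16

/* AAN scale factors s(k): s(0) = 1, s(k) = sqrt(2) * cos(k * pi / 16). */
static const double aan_s[8] = {
    1.0, 1.387039845, 1.306562965, 1.175875602,
    1.0, 0.785694958, 0.541196100, 0.275899379
};

/* Scale of coefficient i in raster order (u = row, v = column).
 * Assumption: the fast DCT emits its output in this same order. */
static double aan_scale_at(int i) { return aan_s[i >> 3] * aan_s[i & 7]; }

/* Folded table: one fixed-point multiplier combining 1/scale and 1/quant. */
static void build_qmul(int qmul[64], const int quant[64]) {
    for (int i = 0; i < 64; i++)
        qmul[i] = (int)((1 << QSHIFT) / (aan_scale_at(i) * quant[i]) + 0.5);
}

/* Quantize one fast-DCT output with a single multiply, rounding to nearest. */
static int quantize_folded(int dct_out, int qmul) {
    long long mag = dct_out < 0 ? -(long long)dct_out : dct_out;
    int level = (int)((mag * qmul + (1 << (QSHIFT - 1))) >> QSHIFT);
    return dct_out < 0 ? -level : level;
}

/* Reference: undo the AAN scale first, then divide by the quantizer. */
static int quantize_two_step(int dct_out, double scale, int q) {
    double x = dct_out / scale / q;
    int level = (int)((x < 0 ? -x : x) + 0.5);
    return dct_out < 0 ? -level : level;
}
```

A second DCT with a different output scaling or order would need its own qmul table, which is the maintenance cost described above. Indexing the table in a different order than the DCT's output is the kind of wrongly permuted scaling-factor bug mentioned earlier in the thread.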
imho it's faster with the permutations, simply because these permutations take
nearly no time:
for decoding, there is no extra permutation; the decoder simply puts the
coeffs into the correct permutation for the idct, whereas with an unpermuted
idct it still has to do the zigzag permutation, so there is no speed win there.
for encoding, it's not a big deal either, as the quantizer knows which coeff is
the last nonzero one, so only the coeffs up to the last nonzero one are
permuted.
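A minimal C sketch of both cases, under stated assumptions: zigzag[] is the standard 8x8 zigzag scan (scan position to raster index), and idct_perm() is a hypothetical IDCT input permutation (a transpose, picked only as an example). The function names are illustrative, not ffmpeg's. The decoder merges zigzag and IDCT permutation into one table, so it does one lookup per coefficient either way; the encoder only moves the coefficients up to last_index.

```c
#include <stdint.h>
#include <string.h>

/* Standard 8x8 zigzag scan: scan position -> raster index (row * 8 + col). */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical IDCT input order: this example IDCT wants the block transposed. */
static int idct_perm(int raster) { return ((raster & 7) << 3) | (raster >> 3); }

/* Decoder: zigzag and IDCT permutation merged into one table, built once. */
static uint8_t scan_to_idct[64];

static void init_scan_table(void) {
    for (int i = 0; i < 64; i++)
        scan_to_idct[i] = (uint8_t)idct_perm(zigzag[i]);
}

/* Store decoded (run, level) pairs straight into IDCT order. */
static void place_coeffs(int16_t block[64], const int *run, const int16_t *level, int n) {
    int pos = -1;
    memset(block, 0, 64 * sizeof(block[0]));
    for (int k = 0; k < n; k++) {
        pos += run[k] + 1;
        if (pos > 63)
            break;  /* corrupt stream: ignore the rest */
        block[scan_to_idct[pos]] = level[k];
    }
}

/* Encoder: last_index is the scan position of the last nonzero coeff,
 * so only coefficients up to it are moved into IDCT order. */
static void permute_up_to_last(int16_t block[64], int last_index) {
    int16_t tmp[64];
    for (int i = 0; i <= last_index; i++) {
        int j = zigzag[i];
        tmp[j] = block[j];
        block[j] = 0;
    }
    for (int i = 0; i <= last_index; i++) {
        int j = zigzag[i];
        block[idct_perm(j)] = tmp[j];
    }
}
```

Everything past last_index is zero in both orders, so skipping it leaves the block correct.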

> 	BTW, why couldn't SSE help with the DCT/IDCT? How much time is spent on MMX
> DCT scaling? AFAIK one of the advantages of SSE/SSE2 is
> SIMD on floats.
yes, but SSE on both the P3 & P4 needs 2 cpu cycles to do 1 calculation on 4
floats, while mmx needs 1 cpu cycle to do 1 calculation on 4 16-bit shorts, and at
least some parts of the dct/idct can be done in 16-bit.
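Taking those cycle counts at face value, SSE handles 4 floats per 2 cycles (2 values per cycle) and MMX 4 shorts per cycle (4 values per cycle). A minimal sketch of what a 16-bit DCT step looks like: a scalar C emulation of MMX pmulhw semantics (signed 16x16 multiply, keep the high 16 bits) used to multiply 4 lanes by cos(pi/4). Because 0.7071 in Q16 does not fit in a signed 16-bit constant, the sketch multiplies by cos(pi/4) - 1 and adds the input back. The v4i16 type and the function names are hypothetical.

```c
#include <stdint.h>

/* Four signed 16-bit lanes, the contents of one 64-bit MMX register. */
typedef struct { int16_t lane[4]; } v4i16;

/* One pmulhw lane: high 16 bits of the product, i.e. floor(a * b / 65536).
 * The floor is written out so it does not rely on >> of negative values. */
static int16_t mulhi16(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * b;
    return (int16_t)(p >= 0 ? p >> 16 : -((-p + 65535) >> 16));
}

/* (cos(pi/4) - 1) * 65536, rounded: -0.29289 in Q16 fits in int16_t. */
#define C4_MINUS_ONE_Q16 (-19195)

/* x * cos(pi/4) on all 4 lanes as x + x * (cos(pi/4) - 1); on MMX this is
 * roughly one pmulhw plus one paddw for all four lanes at once. */
static v4i16 mul_cos_pi4(v4i16 x) {
    v4i16 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (int16_t)(x.lane[i] + mulhi16(x.lane[i], C4_MINUS_ONE_Q16));
    return r;
}
```

For inputs {1000, 2000, -1000, 0} this gives {707, 1414, -708, 0}; the -708 (exact value about -707.1) shows the floor rounding 16-bit code has to live with, and rounding differences like this between integer IDCTs are one possible source of the mismatch discussed below.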
btw, the "more accurate" SSE IDCT could cause problems, because all those
"shitty" players use integer IDCTs, so there could be stripes and green blocks
again ...

>
> > visible on i-frames, but the problem disappeared after a few p-frames
> > instead of getting worse, and the decoder doesn't use the dct, so there are
> > no differences between decoders ...
>
> 	I agree with that. When I put the first MMX DCT into ffmpeg, I took some
> that weren't accurate and they gave me accumulation errors. This is turning
> into a holy war. Just to be clear, reviewing Arpi's TODO, I agree
> with him that maybe the bitstream writer could be optimized first. After that,
> I think MB skip could give good results.
btw, what is draw_horiz_band() good for? it looks like it's for sliced
decoding; why isn't it used?

Michael