[MPlayer-dev-eng] [PATCH] autoq support for control()

Sun Feb 10 23:56:44 CET 2002

On Sun, Feb 10, 2002 at 11:40:43PM +0100, Michael Niedermayer wrote:
> Hi
> 
> On Sunday 10 February 2002 23:02, Arpi wrote:
> [...]
> > > so here is a quick TODO list...
> > > - sliced mode like libmpeg2 (for both encoding & decoding)
> > > - try the AAN DCT in MMX as alternative fast but inaccurate choice
> > > - optimize the bitstream writer
> > > - template stuff to avoid a few if()
> >
> > i think the bitstream writer optim should be the first, as it's relatively
> > simple, and doesn't need architectural changes.
> >
> > instead of doing template stuff, we could try to localize the most critical
> > if()'s, and try to separate only them. yes, it isn't a 100% solution yet, but
> > may help a bit without big mess.
> imho most are equally critical, as they are executed once per block :(

perhaps libavcodec could be modified to use template-style functions,
included once per mpeg variant, with #ifdefs instead of ifs. this is
rather ugly imho, but it may be better than being slow (as it is now)
or having multiple copies of all the code.
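
a rough sketch of what i mean by template-style functions, not actual
libavcodec code: every name below (VARIANT_A, VARIANT_B, scale_block_*)
is made up, and the per-coefficient rule is a toy, not a real mpeg
dequantizer. in a real tree the body would probably sit in its own file
that gets #included once per variant with a different #define each time;
a macro is used here only so the example fits in a single file. the
variant test is on a compile-time constant, so the compiler can drop the
dead branch from each generated function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant ids, for illustration only. */
#define VARIANT_A 0
#define VARIANT_B 1

/*
 * "Template": expands to one specialized function per variant.
 * (variant) is a constant at each expansion, so the if() below is
 * resolved at compile time instead of once per block at run time.
 */
#define DEFINE_SCALE_BLOCK(name, variant)                        \
static void name(int16_t *block, int n, int qscale)             \
{                                                                \
    assert(n >= 0);                                              \
    for (int i = 0; i < n; i++) {                                \
        if ((variant) == VARIANT_A)                              \
            block[i] = (int16_t)(block[i] * qscale);             \
        else                                                     \
            block[i] = (int16_t)(block[i] * qscale * 2);         \
    }                                                            \
}

/* One copy of the source, two specialized functions. */
DEFINE_SCALE_BLOCK(scale_block_a, VARIANT_A)
DEFINE_SCALE_BLOCK(scale_block_b, VARIANT_B)
```

the caller picks scale_block_a or scale_block_b once per stream (for
example through a function pointer), so the per-block code path has no
variant checks left in it.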

> > as Juanjo already did with MC slices, we could do the same to quant/DCT,
> > i mean doing DCT for whole slice/frame first, and then doing quantization.
> are u sure this will be faster?
> 512 pixels width 
> slice height of 16 
> [16*512 (luma) + 8*256*2 (chroma)]*sizeof(short) = 24kb
> and the p3 has 16kb L1 data cache
> the p4 has 8kb L1 data cache ...

the idea isn't that the whole slice fits in the cache -- although iirc
with an amd rather than those crappy p3s and p4s, it would :) -- the
idea is to keep the dct and quant steps from repeatedly evicting each
other's working set from the cache. i've done optimizations like this
before -- separating complicated operations into several smaller
steps, even when it may require more intermediate data -- and it's
improved performance in my experience.
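
to make the comparison concrete, here is a minimal sketch under some
assumptions: 4:2:0 data, 16-bit coefficients, 8x8 blocks. all function
names are hypothetical, and transform_block() / quant_block() are
placeholder stages, not a real dct or a real mpeg quantizer.
slice_coeff_bytes() just reproduces the buffer-size arithmetic quoted
above; the two encode_* functions produce the same output and differ
only in the order the stages run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_COEFFS 64  /* one 8x8 block */

/* Bytes of 16-bit coefficients in one 4:2:0 slice:
 * luma h*w plus two chroma planes of (h/2)*(w/2) each. */
static size_t slice_coeff_bytes(size_t width, size_t slice_height)
{
    size_t luma   = slice_height * width;
    size_t chroma = 2 * (slice_height / 2) * (width / 2);
    return (luma + chroma) * sizeof(int16_t);
}

/* Placeholder for the DCT stage (NOT a real transform). */
static void transform_block(int16_t *blk)
{
    for (int i = 0; i < BLOCK_COEFFS; i++)
        blk[i] = (int16_t)(blk[i] - 128);
}

/* Placeholder quantizer: integer division by qscale. */
static void quant_block(int16_t *blk, int qscale)
{
    assert(qscale > 0);
    for (int i = 0; i < BLOCK_COEFFS; i++)
        blk[i] = (int16_t)(blk[i] / qscale);
}

/* Interleaved: the two stages alternate for every block. */
static void encode_interleaved(int16_t *coeffs, size_t nblocks, int qscale)
{
    for (size_t b = 0; b < nblocks; b++) {
        transform_block(coeffs + b * BLOCK_COEFFS);
        quant_block(coeffs + b * BLOCK_COEFFS, qscale);
    }
}

/* Split: run one stage over the whole slice, then the other. */
static void encode_two_pass(int16_t *coeffs, size_t nblocks, int qscale)
{
    for (size_t b = 0; b < nblocks; b++)
        transform_block(coeffs + b * BLOCK_COEFFS);
    for (size_t b = 0; b < nblocks; b++)
        quant_block(coeffs + b * BLOCK_COEFFS, qscale);
}
```

slice_coeff_bytes(512, 16) gives 24576 bytes, the 24kb figure from the
quoted mail. whether the two-pass version is actually faster depends on
the cpu and on how large each stage's code and tables are, so it needs
to be benchmarked rather than assumed.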

rich