[MPlayer-dev-eng] [PATCH] autoq support for control()

Alexander Werth (gmx) alexander.werth at gmx.de
Mon Feb 11 03:20:22 CET 2002


On Sun, 2002-02-10 at 23:56, D Richard Felker III wrote:
> On Sun, Feb 10, 2002 at 11:40:43PM +0100, Michael Niedermayer wrote:
> > On Sunday 10 February 2002 23:02, Arpi wrote:
> > > as Juanjo already did with MC slices, we could do the same to quant/DCT,
> > > i mean doing DCT for whole slice/frame first, and then doing quantization.
> > are u sure this will be faster?
> > 512 pixels width 
> > slice height of 16 
> > [16*512 (luma) + 8*256*2 (chroma)]*sizeof(short) = 24kb
> > and the p3 has 16kb L1 data cache
> > the p4 has 8kb L1 data cache ...
> 
> the idea isn't that the whole slice fit in the cache -- although iirc
> with an amd rather than those crappy p3s and p4s, it would :) -- the
> idea is to keep both the dct and quant steps from repeatedly ruining
> cache coherency for one another. i've done optimizations like this
> before -- separating complicated operations into several smaller
> steps, even when it may require more intermediate data -- and it's
> improved performance in my experience.

Another advantage of tight loops lies in the way branch prediction works
on modern CPUs. If the CPU can guess that a loop will be taken another
time, it can keep filling its pipeline with instructions. If a loop
runs only 10 times before another path is taken, the mispredicted exit
means the pipeline and all speculatively computed data must be flushed
and the pipeline filled anew.
This matters even more on processors with long pipelines, like the P4.
Also keep in mind that there is an instruction cache as well. It's usually
not a problem, but when external libraries are called their code can fill
up the instruction cache pretty quickly.
Alexander Werth

-- 
The right to read is a battle being fought today...
http://www.gnu.org/philosophy/right-to-read.html


