[MPlayer-dev-eng] [PATCH]Add support for CoreAVC h264 codec

Michael Niedermayer michaelni at gmx.at
Sat Oct 7 00:17:00 CEST 2006


Hi

On Thu, Oct 05, 2006 at 12:19:18PM -0600, Loren Merritt wrote:
> On Thu, 5 Oct 2006, Michael Niedermayer wrote:
> >On Thu, Oct 05, 2006 at 10:58:58AM +0200, Guillaume Poirier wrote:
> >>
> >>I had a look at CABAC's code, and the greatest problem I saw with it
> >>was in the algorithm itself. CABAC is an inherently serial (which
> >>quite easily leads to pipeline bubbles) algorithm that processes
> >>_bits_! That means that e.g. a 700 MB file requires 700*1024*1024*8
> >>runs through CABAC.
> >>Maybe one sort of optimization that I could see would be to make sure
> >>there's no pathological code such as partial register stall, etc...,
> >>because every pipeline stall will really hurt performance because
> >>nothing else can be processed during this time (as the algorithm is so
> >>serial).
> >
> >one insane idea is to do CABAC decoding of 2 frames at once (patches
> >welcome)
> 
> I considered that, but I don't see how it could work. The core cabac 
> functions can't run in isolation, they need to be interleaved with the 
> rest of the decoding process to tell which contexts to decode under, and 
> the rest of the decoder needs the results from cabac in order to move on. 
> So in order to decabac 2 frames at once, you'd need to run 2 decoder 
> threads, and sync them at every cabac read.
> Unless... not normal threads, but some purely userspace thread 
> implementation without preemption (and which only saves the nonvolatile 
> registers). One call to cabac switches to the other thread, a second call 
> computes both, saves one result, and returns the other.
> You'd still need to separate bitstream parsing from motion compensation 
> since one frame will depend on the other. But I think CoreAVC does that 
> already, as part of its SMP model.
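
a purely userspace switch could look roughly like the sketch below; it is
only meant to show the control flow, using <ucontext.h> for clarity (a real
implementation would hand-roll a much lighter switch that saves only the
nonvolatile registers, as you say). all names and the dummy per-frame loop
are hypothetical, not existing ffh264 code:

#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (256 * 1024)

static ucontext_t main_ctx, frame_ctx[2];
static char       stack[2][STACK_SIZE];
static int        cur;        /* index of the decoder currently running */
static int        done[2];

/* every cabac read hands control to the other decoder, so the two
 * bitstream readers advance in lockstep */
static void cabac_yield(void)
{
    int other = cur ^ 1;
    if (done[other])
        return;               /* the other decoder already finished */
    int prev = cur;
    cur = other;
    swapcontext(&frame_ctx[prev], &frame_ctx[other]);
}

/* stand-in for decoding one frame; a real decoder would call
 * cabac_yield() from inside its cabac read functions */
static void decode_one_frame(void)
{
    int frame = cur;
    for (int i = 0; i < 4; i++) {
        printf("frame %d: cabac read %d\n", frame, i);
        cabac_yield();
    }
    done[frame] = 1;          /* returning resumes main via uc_link */
}

int main(void)
{
    for (int i = 0; i < 2; i++) {
        getcontext(&frame_ctx[i]);
        frame_ctx[i].uc_stack.ss_sp   = stack[i];
        frame_ctx[i].uc_stack.ss_size = STACK_SIZE;
        frame_ctx[i].uc_link          = &main_ctx;
        makecontext(&frame_ctx[i], decode_one_frame, 0);
    }
    cur = 0;
    swapcontext(&main_ctx, &frame_ctx[0]);
    /* one decoder finished; drain whichever one is still running */
    for (int i = 0; i < 2; i++)
        if (!done[i]) {
            cur = i;
            swapcontext(&main_ctx, &frame_ctx[i]);
        }
    return 0;
}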

Separating bitstream parsing from MC would be worth a try for ffh264 too;
the pressure on the code cache, branch prediction state, and maybe even the
data cache should be reduced that way.
One question, though, is what the first pass should output: blocks after the
idct, blocks before the idct, or some compressed run/level bytestream ...
Either way, the data should use a stride of 4 or 8 for blocks before MC to
improve cache locality.
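
For the two-pass split, the per-macroblock record that the first pass hands
over could look something like this (purely hypothetical layout, not an
existing ffh264 structure); the point is that the residual blocks are stored
with a stride of 4, i.e. 16 contiguous coefficients per 4x4 block, instead
of at picture stride:

#include <stdint.h>

/* hypothetical output of the parsing pass, one record per macroblock;
 * the reconstruction pass walks these linearly doing MC + idct + add */
typedef struct ParsedMB {
    uint8_t  mb_type;
    uint8_t  cbp;               /* coded block pattern                  */
    int16_t  mv[16][2];         /* one motion vector per 4x4 partition  */
    int8_t   ref[16];           /* reference indices                    */
    /* 16 luma + 8 chroma 4x4 blocks, 16 coefficients each, stored
     * densely (stride 4) so each block is contiguous in memory;
     * contents are pre-idct levels or post-idct residuals depending
     * on which of the variants above is chosen */
    int16_t  block[24][16];
} ParsedMB;
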
Speaking of cache locality, there was also this crazy paper about using
frame buffers organized in macroblocks instead of raster lines; this might
have a significant effect on the data cache, but it would be a little
nightmare for the motion compensation and would make direct rendering
impossible.
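
Addressing into such a tiled frame buffer would look roughly like this
(just a sketch of the idea, names made up, not taken from that paper):

#include <stdint.h>

/* the picture is stored as row-major 16x16 tiles instead of raster
 * lines, so a whole macroblock sits in a handful of cache lines */
static inline uint8_t *tiled_pixel(uint8_t *buf, int mb_stride,
                                   int x, int y)
{
    int mb_x = x >> 4, mb_y = y >> 4;   /* which 16x16 tile       */
    int in_x = x & 15, in_y = y & 15;   /* offset inside the tile */
    return buf + (mb_y * mb_stride + mb_x) * 256 + in_y * 16 + in_x;
}

/* MC would now have to gather a reference block from up to four
 * tiles, which is where the nightmare part comes in */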

Sadly, I'm a little unmotivated at the moment to work on any of this :(
so don't hesitate if you want to try any of it.
I'll maybe try to get something like the above working for a simple
codec like mpeg1 ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for merely telling someone where the library is


