[MPlayer-dev-eng] Regarding x264 decoding optimizations

Gautham Anil gautham_anil at yahoo.com
Sat Jan 26 00:50:02 CET 2008


> The problem with this methodology is that you divide the decoding
> into an arbitrary number of threads (one for each stage, to take your
> analogy), each of these threads accounting for a different % of
> decoding time (say CABAC would be 15%, deblocking 20% on a certain
> sample, and 20%, 18% on another).
> The problem with this approach is that it doesn't scale with the
> number of CPUs (i.e. you inherently have a fixed number of threads to
> distribute to your CPUs, so you won't get any speed-up if you have
> more CPUs than threads).
> 
> The "right" way to do it is frame-level multi-threaded decoding (just
> like x264 does for encoding), as Loren has said several times on the
> ffmpeg-dev mailing list (Google is your friend if you want the
> details).
> Sadly, implementing this is clearly not easy, that's why it has never
> been done up to now. :-
> 
> Guillaume

True, but my point is this: if 1080p video decodes on
2 threads without any stutter on the slowest quad-core
system, why would more parallelism be a critical
objective?

I cannot comment on Xeons, but the cheapest Core 2
Quad CPU on Newegg runs at 2.4GHz per core. Knowing
what a single 1.6GHz core with 2MB cache can do, I am
pretty sure that if the decoding is split across two
2.4GHz cores with 4MB each, the Quad will play most
1080p videos smoothly. So why worry about more
threads?

Moreover, if there are plenty of cores to spare, the
lookahead decoding recommended by Peter can be
implemented. Once the user specifies the number of
extra GOPs to handle, the spare cores can start
decoding them ahead of playback.

A little more about decoding on 2 cores: if the
decoding splits into a pipeline of 3 or more uneven
stages, the unevenness won't be a big issue, since
whichever core is free will switch between the stage
threads as necessary.
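
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes ideal scheduling with no synchronization cost, and the stage percentages are made up for illustration (only the 15% CABAC and 20% deblocking figures come from the quoted mail; the other two stages are hypothetical). The function name is mine, not anything in MPlayer or ffmpeg.

```python
def pipeline_speedup(stage_costs, cores):
    """Upper bound on the speedup of a pipelined decoder on `cores` cores.

    Assumes ideal scheduling and zero synchronization cost. In steady
    state the time per frame is limited either by the slowest stage
    (one stage cannot be split across cores) or by the total work
    divided evenly over all cores, whichever is larger.
    """
    total = sum(stage_costs)
    per_frame = max(max(stage_costs), total / cores)
    return total / per_frame


# Hypothetical stage costs in percent of single-threaded decode time.
stages = [15, 20, 30, 35]
for n in (1, 2, 4, 8):
    print(n, "cores:", round(pipeline_speedup(stages, n), 2))
```

With these made-up numbers, 2 cores are both kept fully busy despite the uneven stages, while going from 4 to 8 cores gains nothing because the largest stage becomes the limit. That is the scaling limit Guillaume describes, but it only bites once there are more than 2 cores to feed.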

It all depends on the relative implementation effort.
I don't know how long frame-level parallelism is
expected to take to code, or even how long this
pipelining might take. If frame-level parallelism is
expected in a few months, honestly, I'll just sit
tight and wait for it. But if it will take a long
time, say over a year, and this pipelining is small
enough that someone might volunteer to implement it,
I'd rather have something quick that gives me limited
features (why does that statement remind me of
Windows? :-) ).

If, as Reimar Döffinger says, decoding cannot be
pipelined, then lookahead decoding still looks
promising. It looks very simple to implement and, with
enough memory, should smooth out playback
substantially. If there are X frame buffers, then we
can have as many threads as there are key-frames in
the next X frames. Also, with buffers we can show a
"PAUSED. Decoding to buffers. Playback will resume
when buffer full" message rather than skipping frames.
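
A minimal sketch of the lookahead idea, only to show its shape. It assumes the lookahead window starts at a key-frame and that every key-frame starts an independently decodable group (closed GOPs), which real H.264 streams do not always guarantee. The frame-type list, `decode_group` and the other names are hypothetical stand-ins, not MPlayer or ffmpeg APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_keyframes(frame_types):
    """Group frame indices into runs that each start at a key-frame ('I')."""
    groups = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I" or not groups:
            groups.append([])
        groups[-1].append(index)
    return groups


def decode_group(group):
    # Placeholder for real decoding. Frames inside one group depend on
    # each other, so a single worker handles them in order.
    return [f"decoded-{index}" for index in group]


def lookahead_decode(frame_types, buffers):
    """Decode the next `buffers` frames, one worker per key-frame group."""
    groups = split_at_keyframes(frame_types[:buffers])
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        decoded = list(pool.map(decode_group, groups))
    # pool.map keeps input order, so frames come back in display order.
    return [frame for group in decoded for frame in group]


print(lookahead_decode(list("IPPIPPPIP"), buffers=8))
```

With X = 8 buffers and key-frames at positions 0, 3 and 7, this uses three workers, one per key-frame in the window. A real decoder would hand each group to a native decoding thread; Python threads here only illustrate how the work is divided.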

Thanks for listening...

--
Gautham Anil

