[MPlayer-dev-eng] Regarding x264 decoding optimizations

Vladimir Mosgalin mosgalin at VM10124.spb.edu
Wed Jan 30 01:18:43 CET 2008


Hi Gautham Anil!

 On 2008.01.29 at 12:12:54 -0800, Gautham Anil wrote next:

> This algorithm will process multiple GOPs in a
> parallel non blocking way. Since someone with 2G
> should be able to use around 150 buffers, I think the

Having 2 GB or more in a desktop is no problem; however, the thought of
a media player using that much RAM really scares me.

Anyway, I don't get your calculations. Let's assume 1920x1080 video (I
can hardly believe that a modern multicore PC with several gigs of RAM
has any trouble playing 720p). A decoded video frame is 1920*1080*1.5
bytes (12 bpp), which is about 3 MB. The default keyint for x264-encoded
video is 250 (let's forget about non-reencoded HD DVDs using H.264 for
now). That means that in the bad case, when no extra keyframes occur,
the whole "GOP" takes 742 MB of RAM. A strict 250-frame keyframe
distance is actually unlikely at high bitrates, but it is still
possible, and you probably have to pre-allocate RAM for your GOP for the
worst case, because malloc'ing 3 MB 25 times a second sounds like a bad
idea.
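
To illustrate the pre-allocation idea, here is a minimal sketch of a
fixed pool of frame buffers that is allocated once and then recycled.
The names (FramePool, acquire, release) are hypothetical and are not
MPlayer or libavcodec API; Python is used only for brevity, a real
decoder would do this in C.

```python
# Size of one decoded 1080p frame at 12 bpp (4:2:0), in bytes.
FRAME_BYTES = 1920 * 1080 * 3 // 2


class FramePool:
    """Hypothetical pool: all buffers are allocated up front, then reused."""

    def __init__(self, frame_count, frame_bytes=FRAME_BYTES):
        # One allocation per buffer, done once at startup (worst-case sizing).
        self._free = [bytearray(frame_bytes) for _ in range(frame_count)]
        self.capacity = frame_count

    def acquire(self):
        """Return a free buffer, or None if every buffer is in use."""
        return self._free.pop() if self._free else None

    def release(self, buf):
        """Hand a buffer back to the pool once its frame has been shown."""
        self._free.append(buf)

    def available(self):
        """Number of buffers currently free."""
        return len(self._free)
```

A decoder thread would call acquire() before decoding a frame and the
display side would call release() afterwards, so no allocation happens
during playback. Sizing the pool for a full 250-frame GOP is exactly
what costs the ~742 MB mentioned above.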

You also need RAM to hold the not-yet-decoded (still compressed) data
for the next GOP. That isn't a big amount, but at HD DVD/Blu-ray
bitrates (which would hardly occur together with a keyint of 250, but
it's still possible) it could be about 60 MB, which brings us to about
800 MB per GOP.

Now think about how many GOPs you would like to decode in parallel. You
weren't satisfied with 2 cores, right? So that means 3 or 4 GOPs, which
means 2.4 to 3.2 gigs of RAM. Ugh.
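
The arithmetic above can be reproduced with a short sketch. The function
names are made up for illustration. Two inputs are assumptions rather
than figures stated above: 25 fps (taken from the "25 times a second"
remark) and a 48 Mbit/s stream, chosen only because it gives the
"about 60 MB" of compressed data per 250-frame GOP.

```python
MIB = 1024 * 1024


def frame_bytes(width, height):
    """Decoded frame size in bytes at 12 bpp (4:2:0), i.e. 1.5 bytes/pixel."""
    return width * height * 3 // 2


def gop_bytes(width, height, keyint, bitrate_bps, fps):
    """Worst-case RAM for one GOP: decoded frames plus compressed input."""
    decoded = frame_bytes(width, height) * keyint
    # keyint frames last keyint / fps seconds at bitrate_bps / 8 bytes/s.
    compressed = bitrate_bps // 8 * keyint // fps
    return decoded + compressed


def parallel_gop_bytes(gop_count, **gop_params):
    """Total RAM when gop_count GOPs are buffered at the same time."""
    return gop_count * gop_bytes(**gop_params)


# Assumed worst case: 1080p, keyint 250, 48 Mbit/s, 25 fps.
WORST_CASE = dict(width=1920, height=1080, keyint=250,
                  bitrate_bps=48_000_000, fps=25)

if __name__ == "__main__":
    for n in (1, 3, 4):
        total = parallel_gop_bytes(n, **WORST_CASE)
        print(f"{n} GOP(s): {total / MIB:.0f} MiB")
```

With these inputs one GOP comes to roughly 799 MiB, and 3 or 4 GOPs to
about 2.3 and 3.1 GiB in binary units, which matches the 2.4 to 3.2 gigs
quoted above when counting in units of 1000 MB.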

> video should play perfectly smooth. This algorithm
> will keep decoding something while the buffers are not
> full. Oh, and the thread decoding the current GOP must
> have higher priority than any other thread.

Also, in my experience changing per-thread priority on Linux doesn't
work as expected. In a lot of cases the effect is either negligible or
at least smaller than expected. In your max-one-thread-per-CPU case this
may not be a problem (priorities may not even be required at all), but I
still wouldn't count on thread priorities.

When decoding multiple very different frames at once, you may also run
into problems with memory performance. Multi-socket Opteron and Xeon
systems have more than one memory controller, but all current desktop
systems have only one (dual-channel) memory controller. Decoding 1080p
isn't only a CPU-intensive process, it's also memory-intensive, and I
fear that you may run into memory bandwidth problems. Of course raw
bandwidth is pretty big, CPUs can have several megabytes of cache, and
modern memory controllers are pretty smart, but when decoding multiple
frames at once you are reading from several very different places in
memory and thrashing the CPU cache (which is shared between several
cores), so that may not be enough.

Actually, this is the same area where I've seen how per-thread
priorities suck (when one thread is doing a lot of really
memory-intensive operations, it leaves almost no chance for other
threads: unless they are trying to do something equally intense, they
almost block, even though free cores and even CPUs are available, and
tweaking priorities hardly changes anything).

These are only the most obvious issues; there will be a lot more of
them once you start thinking about implementation details.

-- 

Vladimir


