[Ffmpeg-devel] Parallelizing the h.264 Decoder

Loren Merritt lorenm
Tue Nov 14 02:30:25 CET 2006


On Tue, 14 Nov 2006, Michael Niedermayer wrote:
> On Mon, Nov 13, 2006 at 01:20:50PM -0700, Loren Merritt wrote:
>> On Mon, 13 Nov 2006, Philip Peter wrote:
> [...]
>>> With the current sources, would it be possible to first decode the CABAC
>>> for all macroblocks (preferably split over a few cores) and then do the
>>> rest of the decoding process?
>>
>> The simple way to do this with the current sources would be to duplicate
>> the dct, mvd, and other arrays, which currently only store one macroblock
>> at a time. That would take about three times as much memory as the frame
>> itself. The only way I can see to reduce that memory is to synchronize the
>> threads more often, so you don't have to store as many macroblocks.
>
> hmm, what if the dct coefficients were stored in "bitstream" order, in
> run/level style? That way the amount of memory needed would depend on the
> frame complexity and would probably be significantly less. Or am I missing
> something about why this isn't possible?
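
To make the idea concrete, here is a minimal sketch of what such a packed
run/level store could look like. The struct and function names are made up
for illustration and are not taken from the existing decoder; the layout
simply matches the 3-bytes-per-coefficient, 1-byte-per-EOB accounting below.

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-frame store for residual data: the entropy-decoding
 * thread appends 3 bytes per nonzero coefficient (zero-run + 16-bit level)
 * and a 1-byte end-of-block marker, in bitstream order; the reconstruction
 * thread later reads it back sequentially. */
typedef struct CoefStore {
    uint8_t *buf;
    size_t   size, capacity;
} CoefStore;

/* A run of zeros inside a 4x4 or 8x8 block never reaches 0xFF, so that
 * value can be reserved as the end-of-block marker. */
#define COEF_EOB 0xFF

static int coef_store_grow(CoefStore *s, size_t need)
{
    uint8_t *p;
    size_t new_cap = s->capacity ? s->capacity : 4096;

    if (s->size + need <= s->capacity)
        return 0;
    while (new_cap < s->size + need)
        new_cap *= 2;
    p = realloc(s->buf, new_cap);
    if (!p)
        return -1;
    s->buf      = p;
    s->capacity = new_cap;
    return 0;
}

/* Called by the entropy decoder for each nonzero coefficient. */
static int coef_store_put(CoefStore *s, uint8_t zero_run, int16_t level)
{
    if (coef_store_grow(s, 3) < 0)
        return -1;
    s->buf[s->size++] = zero_run;
    s->buf[s->size++] = (uint8_t) (level       & 0xFF);
    s->buf[s->size++] = (uint8_t)((level >> 8) & 0xFF);
    return 0;
}

/* Terminates the current block; reconstruction stops reading here. */
static int coef_store_end_block(CoefStore *s)
{
    if (coef_store_grow(s, 1) < 0)
        return -1;
    s->buf[s->size++] = COEF_EOB;
    return 0;
}

For scale, numbers from a sample clip: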

video: 720x400 1019 kbps
frames: 33959
nnz coefs: 218103809
nnz blocks: 84302362

Assuming 3 bytes per coefficient for run+level and 1 byte per block for the
EOB marker, that's an average of about 22kB per frame, instead of 864kB for
the raw coefficients.
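
For reference, the arithmetic behind those two figures (the 864kB number
works out to 4:2:0, i.e. 720*400*1.5 samples, at 2 bytes per raw coefficient):

  run/level: (218103809 coefs * 3 B + 84302362 blocks * 1 B) / 33959 frames
             = 738613789 B / 33959 ~= 21750 B ~= 22 kB per frame
  raw:       720 * 400 * 1.5 * 2 B = 864000 B = 864 kB per frame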

--Loren Merritt



