[FFmpeg-devel] hardware aided video decoding

Sun Jul 8 18:00:17 CEST 2007

On Sun, 8 Jul 2007, Attila Kinali wrote:
> On Fri, 6 Jul 2007, Loren Merritt wrote:
>
>> common mc:
>> The primitive operation of mc is a fir filter. Implement a 2/4/6/8-tap
>> fir filter (applying to a block of pixels) with programmable coefficients
>> and rounding modes, and allow the firs to be chained in arbitrary ways.
>> A generic fir filter could be used for wavelets too.
>> mpeg4 qpel also has some weirdness whereby it mirrors the block edges
>> before sending them into the 8-tap.
>
> I don't understand how you can abstract MC to a FIR filter.
> From my understanding of MC (which might be wrong) MC uses
> a vector pointing into the previously decoded frame to predict
> the currently processed macro block. To me, that's an operation
> that resembles texture mapping rather than a FIR filter.

I don't see anything incompatible about those statements. The FIR
specifies exactly which algorithm the texture mapper uses to interpolate
the predicted samples.
When using textures in 3d rendering, you just need to return a sample 
value at a given non-integer location in the texture. The interpolation 
algorithm (bilinear, bicubic, lanczos, etc) is an implementation decision.
In video decoding the interpolation is exactly specified, and is different 
for each compression standard.

e.g. in h264 the hpel samples are the convolution of the original samples
with the kernel (1 -5 20 20 -5 1)/32. Each qpel sample is then the rounded
average of the two nearest integer or hpel samples. The full 14-bit
precision is kept between the horizontal and vertical hpel passes, but
samples are rounded to 8-bit between hpel and qpel.
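
A minimal sketch of the h264 case in C, for illustration only. The function
names, the flat `src`/`stride` layout and the absence of edge handling are
assumptions made here, not code from any real decoder. It computes the
centre hpel sample by running the 6-tap kernel horizontally without
rounding, then vertically over those intermediates, and only then rounds
to 8-bit; a qpel sample is a rounded average of two 8-bit samples.

```c
#include <stdint.h>

/* h264 hpel kernel (1 -5 20 20 -5 1); the /32 is applied when rounding. */
static const int H264_TAPS[6] = { 1, -5, 20, 20, -5, 1 };

/* Centre hpel sample diagonally between (x, y) and (x+1, y+1).
 * Hypothetical helper: the caller must guarantee that rows y-2..y+3
 * and columns x-2..x+3 are readable (no edge emulation here). */
static uint8_t h264_center_hpel(const uint8_t *src, int stride, int x, int y)
{
    int rows[6];
    int sum = 0;

    /* Horizontal pass: keep the unrounded sums (gain of 32). */
    for (int r = 0; r < 6; r++) {
        const uint8_t *p = src + (y - 2 + r) * stride + (x - 2);
        int acc = 0;
        for (int i = 0; i < 6; i++)
            acc += H264_TAPS[i] * p[i];
        rows[r] = acc;
    }

    /* Vertical pass over the intermediates (total gain of 32 * 32). */
    for (int r = 0; r < 6; r++)
        sum += H264_TAPS[r] * rows[r];

    /* Round once, at the end: add half of 1024, shift by 10, clip. */
    sum += 512;
    if (sum < 0)
        return 0;
    sum >>= 10;
    return sum > 255 ? 255 : (uint8_t)sum;
}

/* qpel: rounded average of two already-rounded 8-bit samples. */
static uint8_t h264_qpel_avg(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}
```
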
In vc1 the hpel samples are the convolution of the original samples with 
the kernel (-1 9 9 -1)/16. The qpel samples are the convolution of the 
original samples with the kernel (-4 53 18 -3)/64 (or its reflection, 
depending on which qpel). It rounds to 8-bit between horizontal and 
vertical passes.
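
The vc1 kernels above can be sketched the same way, as a 4-tap fir with
programmable coefficients. This is an illustration under assumptions: the
helper name `fir4_sample` is made up here, and plain round-half-up is
assumed, whereas the real standard defines its own rounding rules. Each
pass clips to 8-bit, matching the rounding between passes described above.

```c
#include <stdint.h>

/* vc1 kernels from the text; the divisor is expressed as a shift. */
static const int VC1_HPEL[4]     = { -1,  9,  9, -1 };  /* /16, shift 4 */
static const int VC1_QPEL[4]     = { -4, 53, 18, -3 };  /* /64, shift 6 */
static const int VC1_QPEL_REFL[4] = { -3, 18, 53, -4 }; /* reflection   */

/* Interpolate one sample between src[pos] and src[pos + 1].
 * The taps cover src[pos - 1] .. src[pos + 2]; the caller must make
 * sure those are readable. Rounding mode here is an assumption. */
static uint8_t fir4_sample(const uint8_t *src, int pos,
                           const int taps[4], int shift)
{
    int sum = 0;

    for (int i = 0; i < 4; i++)
        sum += taps[i] * src[pos - 1 + i];

    sum += 1 << (shift - 1);   /* round half up */
    if (sum < 0)
        return 0;              /* clip low before shifting */
    sum >>= shift;
    return sum > 255 ? 255 : (uint8_t)sum;
}
```

Running the vertical pass on the 8-bit output of the horizontal pass (or
the other way round) then gives the two-dimensional result; only the
coefficient table and the shift change between hpel and qpel positions.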

>> Decoding a h264 intra block in a software codec:
>> idct the residual of this block.
>> Predict the pixels of this block, using the decoded pixels of the
>> neighboring blocks (all neighbors: left, top-left, top, top-right), using
>> 1 of 22 prediction modes.
>> Add residual to prediction.
>> Use these newly decoded samples to predict the next block...
>>
>> If you want to do the prediction in hardware without the idct, that's
>> possible.
>
> This rather sounds like I would like to leave that completely
> in software, as the host cpu has better memory bandwidth and
> has less trouble handling large and random memory accesses.

If you do h264 intra prediction in software, you must read pixels back 
from the gpu.
I was going to also say that the readback is latency sensitive and 
must start after decoding one (inter) macroblock and finish before 
decoding the next (intra) macroblock, but that can be avoided with 
sufficient reorganization of the codec.

--Loren Merritt