[FFmpeg-devel] hardware aided video decoding

Michael Niedermayer michaelni
Fri Jul 6 16:00:58 CEST 2007


On Fri, Jul 06, 2007 at 02:13:54PM +0200, Attila Kinali wrote:
> On the other hand, i would like to keep it as generic
> as possible so that most 8x8 DCT based codecs could be
> accelerated, which would mean that the software player
> would have to take the bitstream apart, convert the
> data into something the card can digest and feed it.
> > now if we look at just mpeg1/2/4 and the case that you dont want
> > to implement the whole decoder on the card ...
> > then the most obvious things to do are:
> > 
> > do the RLE + zigzag/alt scan decoding of coeffs and the IDCT on the card
> > if you do just the IDCT on the card then you have to transfer 3+ times
> > the data from the cpu to the card as IDCT coeffs are 16bit and there
> > are as many as pixels, if you do the RLE & zigzag stuff on the card too
> > then there would be significantly less data be transmitted as 95% or
> > so of the coeffs are 0 and as the coeffs are stored as vlc coded 
> > zero run + sign + level + last_bit in the bitstream
> Why would you start at RLE and zigzag? 

hmm, no special reason, i had to start somewhere :)
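
To make the RLE + zigzag step quoted above concrete, here is a minimal sketch in C of what the card would do with already VLC decoded (zero run, level, last) events. The struct and function names are made up for illustration, the sign is assumed to be folded into the level, and only the normal zigzag scan is shown (not the alternate scans).

```c
#include <stdint.h>
#include <string.h>

/* Standard 8x8 zigzag scan: scan position -> raster index. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical event layout, one entry per non-zero coefficient. */
struct rle_event {
    uint8_t run;    /* number of zero coeffs before this one */
    int16_t level;  /* signed coefficient value */
    uint8_t last;   /* 1 if this is the last coeff of the block */
};

/* Expand events into a zeroed 8x8 block in raster order.
 * Returns 0 on success, -1 if the events run past the block
 * or end without a "last" flag. */
int rle_expand(int16_t blk[64], const struct rle_event *ev, int n)
{
    int pos = 0;

    memset(blk, 0, 64 * sizeof(*blk));
    for (int i = 0; i < n; i++) {
        pos += ev[i].run;
        if (pos > 63)
            return -1;
        blk[zigzag[pos++]] = ev[i].level;
        if (ev[i].last)
            return 0;
    }
    return -1;
}
```

With most coeffs being 0, a block turns into a handful of such events instead of 64 16bit values, which is where the saving in transferred data comes from.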

you can do MC on the card and let the cpu do the idct, note there are 2
cases here
1. intra (fits in 8bit)      (simply overwrite whats in the buffer)
2. inter (needs signed 16bit)(add to current data and clip)

the 16bit inter case would double the amount of data you have to transfer
to the card, though 99.9% of the inter blocks of a normal video should fit
in signed 8bit, so it's likely worth letting the cpu check for that and pack
the data to 8bit if it fits
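
A rough sketch in C of the two cases and of the "check and pack to 8bit" idea. All function names here are hypothetical and the buffers are treated as flat arrays of n samples, so this only shows the arithmetic, not a real card interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 0..255 pixel range. */
uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* CPU side: 1 if every IDCT output of the block fits in signed 8bit. */
int fits_s8(const int16_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (res[i] < -128 || res[i] > 127)
            return 0;
    return 1;
}

/* CPU side: pack to 8bit, only valid after fits_s8() said yes. */
void pack_s8(int8_t *dst, const int16_t *res, size_t n)
{
    assert(fits_s8(res, n));
    for (size_t i = 0; i < n; i++)
        dst[i] = (int8_t)res[i];
}

/* Card side, case 1 (intra): overwrite what is in the buffer. */
void put_intra(uint8_t *buf, const uint8_t *pix, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = pix[i];
}

/* Card side, case 2 (inter): add to the current data and clip. */
void add_inter_s8(uint8_t *buf, const int8_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = clip_u8(buf[i] + res[i]);
}
```

Blocks for which fits_s8() fails would still have to go over as 16bit, so the card needs both an 8bit and a 16bit inter path.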

> > now h.264 does not contain anything shareable with mpeg1/2/4
> > both idct and MC is different
> How much different are they? Can it be abstracted enough
> so that a common iDCT and MC could be used for both?

mpeg1/2/4 MC block size is 
16x8 interlaced

h264 MC block sizes are:
2x2 2x4 4x2 (chroma only)
4x4 4x8 8x4 8x8
8x16 16x8 16x16 (luma only)

the interpolation filters in mpeg4 (excluding qpel/gmc) are simple
1/2 pel bilinear interpolation
case 0 x[i][j]
case 1 (x[i][j] + x[i][j+1] + r)>>1 (r is 0 or 1 depending on frame header)
case 2 (x[i][j] + x[i+1][j] + r)>>1 (r is 0 or 1 depending on frame header)
case 3 (x[i][j] + x[i+1][j] + x[i][j+1] + x[i+1][j+1] + r)>>2 (r is 1 or 2 depending on frame header)
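
The four cases written out as a small C function, as a sketch only. The no_rounding parameter is an assumption standing in for the frame header bit: with it 0 you get r=1 (cases 1 and 2) and r=2 (case 3), with it 1 you get r=0 and r=1. Plane layout and names are made up for the example.

```c
#include <stdint.h>

/* x: top left of the plane, stride: samples per row,
 * (i, j): row and column of the integer position,
 * mode: the case number 0..3 from the list above,
 * no_rounding: 0 or 1, taken from the frame header (assumed name). */
int halfpel_sample(const uint8_t *x, int stride, int i, int j,
                   int mode, int no_rounding)
{
    const uint8_t *p = x + i * stride + j;

    switch (mode) {
    case 0:  /* integer position */
        return p[0];
    case 1:  /* between x[i][j] and x[i][j+1] */
        return (p[0] + p[1] + 1 - no_rounding) >> 1;
    case 2:  /* between x[i][j] and x[i+1][j] */
        return (p[0] + p[stride] + 1 - no_rounding) >> 1;
    default: /* case 3: centre of the 4 neighbours */
        return (p[0] + p[1] + p[stride] + p[stride + 1]
                + 2 - no_rounding) >> 2;
    }
}
```

Note that cases 1 to 3 read one sample to the right and/or below the block, so the reference frame needs padding or the reads need clamping at the frame edge.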

for h.264 you apply a 6 tap (1 -5 20 20 -5 1) filter vertically and
horizontally to get the 1/2 pel luma positions, then round to 8bit and then
linearly interpolate between them to find the 1/4 pel positions
for chroma you do 1/8 pel bilinear interpolation
(i suggest you read the h.264 spec, which explains that more completely)
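
For comparison with the bilinear mpeg4 case, a sketch in C of the h.264 luma filter in one dimension only, using the 6 tap kernel (1 -5 20 20 -5 1) with a +16 rounding term and a shift by 5. The function names are hypothetical, and the centre 1/2 pel position (filtered in both directions) is left out because the spec computes it from unrounded intermediates.

```c
#include <stdint.h>

/* Clamp to the 0..255 pixel range. */
uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Horizontal 1/2 pel sample between p[0] and p[1].
 * Reads p[-2] .. p[3], so the caller must provide 2 samples
 * of margin on the left and 3 on the right. */
int h264_halfpel_h(const uint8_t *p)
{
    int v = p[-2] - 5 * p[-1] + 20 * p[0] + 20 * p[1] - 5 * p[2] + p[3];

    /* taps sum to 32, so round and divide by 32, then clip to 8bit;
     * a negative v stays negative after the shift and clips to 0 */
    return clip_u8((v + 16) >> 5);
}

/* 1/4 pel sample between p[0] and the 1/2 pel sample to its right:
 * plain average with upward rounding. */
int h264_qpel_h(const uint8_t *p)
{
    return (p[0] + h264_halfpel_h(p) + 1) >> 1;
}
```

The vertical case is the same thing with p[k] replaced by p[k * stride]. Compared to the 2 tap bilinear filter this needs a much larger window of reference samples per block, which is one reason the two MC units are hard to share.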

the idct in mpeg1/2/4 is not specified precisely and anything which passes
the ISO and IEEE accuracy tests is compliant though in practice you will
get nightmares if the idct is not accurate enough even if it passes all
tests ...

h.264 specifies an idct approximation precisely and you must implement
it bit accurate, also the h.264 thing is likely very far from passing the
tests for the mpeg1/2/4 idct
then the h.264 idct comes in 4x4 and 8x8 types at least, the
4x4 can be applied recursively to get a 16x16 intra block, ...
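
To show what "bit accurate" means in practice, here is a sketch in C of the 4x4 h.264 inverse transform as a pure integer butterfly (rows first, then columns, then rounding with +32 and a shift by 6). The function name is made up, the input is assumed to be already dequantized, and the code relies on >> of negative values being an arithmetic shift, which C leaves implementation defined.

```c
#include <stdint.h>

/* In place 4x4 inverse transform, blk is in raster order. */
void h264_idct4x4(int16_t blk[16])
{
    int tmp[16];

    /* pass 1: each row */
    for (int i = 0; i < 4; i++) {
        const int16_t *s = blk + 4 * i;
        int z0 = s[0] + s[2];
        int z1 = s[0] - s[2];
        int z2 = (s[1] >> 1) - s[3];
        int z3 = s[1] + (s[3] >> 1);

        tmp[4 * i + 0] = z0 + z3;
        tmp[4 * i + 1] = z1 + z2;
        tmp[4 * i + 2] = z1 - z2;
        tmp[4 * i + 3] = z0 - z3;
    }

    /* pass 2: each column, followed by the final rounding */
    for (int j = 0; j < 4; j++) {
        int z0 = tmp[j] + tmp[8 + j];
        int z1 = tmp[j] - tmp[8 + j];
        int z2 = (tmp[4 + j] >> 1) - tmp[12 + j];
        int z3 = tmp[4 + j] + (tmp[12 + j] >> 1);

        blk[j]      = (int16_t)((z0 + z3 + 32) >> 6);
        blk[4 + j]  = (int16_t)((z1 + z2 + 32) >> 6);
        blk[8 + j]  = (int16_t)((z1 - z2 + 32) >> 6);
        blk[12 + j] = (int16_t)((z0 - z3 + 32) >> 6);
    }
}
```

There are only adds, subtracts and shifts here, no multiplies and no freedom in the rounding, so a unit built for the mpeg1/2/4 8x8 idct cannot simply be reused for it.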

> > also for h.264 doing just IDCT is likely not going to work, that is
> > having intra prediction done on the cpu which needs to read from the
> > previous 4x4 IDCT result is just going to be a nightmare
> Do i understand you correctly, that the IDCT results depend on
> the results of the previous block? 
> (ok, i have to read some h.264 docu)

no the idct results themselves do not but the idct is added to the
predicted block and that can depend on the previous blocks through intra
prediction or the previous frame through motion compensation

last frame(s)
 |           |           |
 v           v           v
block ----> block ----> block
 ^           ^           ^
 |           |           |
idct        idct        idct
 ^           ^           ^
 |           |           |

(note each block uses either MC or intra prediction not both, this is
unclear in my crappy diagram ...)

> > > Yes, i had a look at the few hardware h.264 decoders around,
> > > but those seem all to be build around a CPU or DSP core with
> > > a few additional special instructions needed for decoding.
> > 
> > yes, put a CPU and DSP on the card that should do too :)
> > and dont forget adding special instructions for CABAC and MC
> Yes, it should do but 1) it's very expensive and 2) uses a lot
> of power.

it also could be used for other things -> would simplify the hw
and as for expensive, can't you implement a CPU on the FPGA :)

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Good people do not need laws to tell them to act responsibly, while bad
people will find a way around the laws. -- Plato