[FFmpeg-devel] transcoding on nvidia tesla

Reimar Döffinger Reimar.Doeffinger
Sun Feb 10 23:25:04 CET 2008


Hello,
On Sun, Feb 10, 2008 at 11:12:23PM +0100, christophelorenz wrote:
> Having done some GPU dev, I can tell you that there are some good and some 
> very bad things to do...
> 
> Easy ones, -huge- performance increase:
> Rescaling with various algorithms, color space conversions, basic 
> deblocking, denoising ...
> 
> More tricky, probably faster by a factor of 10 but needing quite some 
> optimisation and dev time:
> (i)Motion compensation, (i)DCT, wavelets ...
> 
> Useless, same speed or 10x slower (because conditional branching 
> cannot be avoided):
> Byte stream parsing, sorting ...
> 
> Total waste of time and 100x slower on the GPU (the GPU probably has to 
> emulate all the required bit operations, and the data imposes serial 
> processing, so no parallelisation is possible):
> Bit stream parsing ...


Pretty much what I figured from theory alone and some FX5200-level GPU
programming.
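
For illustration of why the first group maps so well onto the GPU: in a
per-pixel operation every output sample depends only on its own input, so
each iteration of a loop like the one below could become its own GPU thread.
Bitstream parsing has no such independent unit of work, since each symbol
depends on the previous one. (Sketch only; the function name and format
choice are made up, not FFmpeg code.)

#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: convert packed YUYV to grayscale by taking each
 * pixel's Y sample.  Every output byte depends on exactly one input byte,
 * so all iterations are independent -- the kind of work a GPU parallelises
 * well, unlike the inherently serial decoding of a bitstream. */
static void yuyv_to_gray(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    size_t i;
    for (i = 0; i < pixels; i++)
        dst[i] = src[2 * i];   /* Y sample of pixel i */
}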

[...]
> CUDA has much better memory transfer performance than DirectX / 
> OpenGL; the examples show 3 GB/s (up and down), but it depends heavily 
> on the motherboard used.
> Anyhow, it is still a memory copy. If you need to do it often it will 
> ruin performance.

Hmm... I thought that when using things like PixelBuffers the mapped memory
can (and if you are lucky will) be graphics memory (or at the very least
directly DMA-capable), so no additional memcpy would be necessary if you
write/read directly into/from it.
There is still some additional latency, though.
And admittedly I never got it to work with anything besides RGB32 data...
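
For reference, roughly what I have in mind, as a sketch only (names and
sizes invented, error handling and GL function loading omitted): bind a
pixel buffer object as the pack buffer, let glReadPixels transfer into it
(ideally via DMA), then map the buffer and read the data where the driver
placed it, without an extra memcpy on our side.

/* Hypothetical sketch, not tested code. */
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

#define FRAME_W 720
#define FRAME_H 576
#define FRAME_BYTES (FRAME_W * FRAME_H * 4)   /* 4 bytes/pixel, RGB32-style */

static GLuint pbo;

static void init_readback(void)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    /* Let the driver choose the storage; with luck it is video memory
     * or at least directly DMA-capable. */
    glBufferData(GL_PIXEL_PACK_BUFFER, FRAME_BYTES, NULL, GL_STREAM_READ);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

static void read_frame(void (*consume)(const void *data, int size))
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    /* With a pack buffer bound, the last argument is an offset into the
     * PBO, and the transfer can happen asynchronously via DMA. */
    glReadPixels(0, 0, FRAME_W, FRAME_H, GL_BGRA, GL_UNSIGNED_BYTE, (void *)0);

    /* Mapping waits for the transfer to finish, but we read the data in
     * place instead of copying it out. */
    void *p = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (p) {
        consume(p, FRAME_BYTES);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}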

Greetings,
Reimar Döffinger



