[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Mon Jan 14 21:33:59 CET 2008

On Mon, 14 Jan 2008, Ivan Kalvachev wrote:

> - Why you choose to transpose at all. Just to save time and effort?
> It is usual to have separate version of SIMD depending if they work on
> row or columns. The row and column stages are different and you pass
> the differences as parameters.

Who says it's usual? A transposed scantable plus a column/transpose/column
transform is faster than a row/column transform for iDCT and iHCT, and I
have no reason to doubt that the same applies to VC-1's transform.
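
To make the equivalence concrete, here is a minimal scalar sketch (not the
MMX code from the patch). It assumes a 4x4 block, an arbitrary illustrative
coefficient matrix A that is not VC-1's, and no intermediate rounding or
shifts. All function names are made up for the example. If the scantable
has already stored the block transposed, then column pass, transpose,
column pass gives the same result as row pass followed by column pass:

```c
#include <assert.h>
#include <string.h>

#define N 4

/* Illustrative 1-D transform matrix (an assumption, not VC-1's). */
const int A[N * N] = {
    1,  1,  1,  1,
    2,  1, -1, -2,
    1, -1, -1,  1,
    1, -2,  2, -1,
};

/* out = A * in: apply the 1-D transform to every column. */
void column_pass(const int *in, int *out)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            int s = 0;
            for (int k = 0; k < N; k++)
                s += A[r * N + k] * in[k * N + c];
            out[r * N + c] = s;
        }
}

/* out = in * A^T: apply the 1-D transform to every row. */
void row_pass(const int *in, int *out)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            int s = 0;
            for (int k = 0; k < N; k++)
                s += in[r * N + k] * A[c * N + k];
            out[r * N + c] = s;
        }
}

void transpose(const int *in, int *out)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            out[c * N + r] = in[r * N + c];
}

/* Conventional order: rows first, then columns. Input is x. */
void row_column(const int *x, int *y)
{
    int t[N * N];
    row_pass(x, t);
    column_pass(t, y);
}

/* Input is x already transposed (as a transposed scantable would store it). */
void column_transpose_column(const int *xt, int *y)
{
    int t[N * N], u[N * N];
    column_pass(xt, t);   /* A * x^T            */
    transpose(t, u);      /* x * A^T            */
    column_pass(u, y);    /* A * x * A^T        */
}

/* Checks that both orders agree on one sample block. */
void self_check(void)
{
    const int x[N * N] = { 5, -3, 0, 2,  1, 0, 0, -4,  7, 1, 0, 0,  0, 0, 3, 0 };
    int xt[N * N], y1[N * N], y2[N * N];

    transpose(x, xt);
    row_column(x, y1);
    column_transpose_column(xt, y2);
    assert(memcmp(y1, y2, sizeof(y1)) == 0);
}
```

In a SIMD version only the column pass needs to be written, since a column
pass maps directly onto packed adds and multiplies across four columns at
once. With real rounding and shifts between the passes, the results match
only if the rounding is applied at the same stages in both orders.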

The only benefit of row/column is that pmaddwd, which sums its products at
32 bits, adds a little bit of precision compared to a pure 16-bit column
transform. But that matters only for an integer approximation of a real
DCT, not when the standard itself has already made the 16-bit
approximation.
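
A rough scalar model of that precision argument, for illustration only. It
assumes that a pure 16-bit path with fractional coefficients would use
pmulhw (keep the high 16 bits of each product) followed by a 16-bit add,
and that small integer coefficients would use pmullw plus paddw. The
function names and the sample values are made up; this is not code from
the patch:

```c
#include <stdint.h>

/* One 32-bit lane of pmaddwd: two signed 16x16 products, summed at
 * 32-bit width. The single wraparound case (all inputs -32768) is
 * ignored here. */
int32_t pmaddwd_lane(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* One lane of pmulhw: high 16 bits of a signed 16x16 product.
 * Relies on arithmetic right shift for negative values. */
int16_t pmulhw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 16);
}

/* Fractional (Q16) coefficients, row style: sum first, drop bits once. */
int32_t frac_row_style(int16_t x0, int16_t x1, int16_t c0, int16_t c1)
{
    return pmaddwd_lane(x0, c0, x1, c1) >> 16;
}

/* Fractional (Q16) coefficients, 16-bit column style: each product is
 * truncated before the add, so low bits are lost twice. */
int16_t frac_column_style(int16_t x0, int16_t x1, int16_t c0, int16_t c1)
{
    return (int16_t)(pmulhw_lane(x0, c0) + pmulhw_lane(x1, c1));
}

/* Small integer coefficients (pmullw + paddw model): exact as long as
 * the result fits in 16 bits, so 32-bit sums buy nothing. */
int16_t small_coeff_column_style(int16_t x0, int16_t x1, int16_t c0, int16_t c1)
{
    return (int16_t)(x0 * c0 + x1 * c1);
}
```

With x0 = x1 = 3 and c0 = c1 = 16384 (0.25 in Q16), the row style yields 1
and the column style yields 0. With small integer coefficients such as 17
and 22, both styles give the same value whenever the sum fits in 16 bits.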

--Loren Merritt