[Ffmpeg-devel] Native H.264 encoder (was: I'm giving up)

Michael Niedermayer michaelni
Mon Dec 11 15:29:23 CET 2006


Hi

On Mon, Dec 11, 2006 at 01:58:24PM +0100, Panagiotis Issaris wrote:
> Hi,
> 
> On Mon, 2006-12-11 at 13:24 +0100, Panagiotis Issaris wrote:
> > [...]
> > I reran the tests on a Pentium 4 CPU 3.20GHz and on that machine it
> > appears to make a consistent difference of about 200 dezicycles (tenths
> > of a clock cycle, so roughly 20 cycles per timed run).
> > 
> > With the for loops:
> > ...
> > 1983 dezicycles in DCTFOR, 16775281 runs, 1935 skips
> > frame=  101 q=-1.0 Lsize=    1652kB time=4.0 bitrate=3350.1kbits/s    
> > video:1652kB audio:0kB global headers:0kB muxing overhead 0.000000%
> > 
> > Repeated runs gave: 1991, 1986, 1994, 1995, 1997, 2061
> > 
> > Without the for loops:
> > ...
> > 1809 dezicycles in DCT, 16776700 runs, 516 skips
> > frame=  101 q=-1.0 Lsize=    1652kB time=4.0 bitrate=3350.1kbits/s    
> > video:1652kB audio:0kB global headers:0kB muxing overhead 0.000000%
> > 
> > Repeated runs gave: 1806, 1790, 1805, 1814, 1826, 1835
> > 
> > So, on Athlon64 it appears to make no real difference, but on P4 it does.
> > I'll try and rewrite it a bit shorter using a macro.
> Attached is a patch that uses two macros to shorten the DCT implementation.
> 
> Any preference towards names such as TEMP|INTERMEDIATE and FINAL instead
> of PART1 and PART2?

I am fine with all of them ...


[...]

> +#define  H264_DCT_PART1(X) \
> +         a = block[0][X]+block[3][X]; \
> +         c = block[0][X]-block[3][X]; \
> +         b = block[1][X]+block[2][X]; \
> +         d = block[1][X]-block[2][X]; \
> +         pieces[0][X] = a+b; \
> +         pieces[2][X] = a-b; \
> +         pieces[1][X] = (c<<1)+d; \
> +         pieces[3][X] = c-(d<<1);
> +
> +#define  H264_DCT_PART2(X) \
> +         a = pieces[X][0]+pieces[X][3]; \
> +         c = pieces[X][0]-pieces[X][3]; \
> +         b = pieces[X][1]+pieces[X][2]; \
> +         d = pieces[X][1]-pieces[X][2]; \
> +         block[0][X] = a+b; \
> +         block[2][X] = a-b; \
> +         block[1][X] = (c<<1)+d; \
> +         block[3][X] = c-(d<<1);

Actually, the pieces array seems unneeded; block itself could be used
instead, if that's not slower ...

And what about int a, b, c, d instead of DCTELEM? (benchmark ...)

Apart from that, the patch is OK; feel free to commit.


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch



