[FFmpeg-devel] [PATCHv2] lavc/cbrt_tablegen: speed up tablegen

Daniel Serpell dserpell at gmail.com
Tue Jan 5 16:44:06 CET 2016


Hi!

On Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde wrote:
> This exploits an approach based on the sieve of Eratosthenes, a popular
> method for generating prime numbers.
> 
> Tables are identical to previous ones.
> 
> Tested with FATE with/without --enable-hardcoded-tables.
> 
> Sample benchmark (Haswell, GNU/Linux+gcc):
> prev:
> 7860100 decicycles in cbrt_tableinit,       1 runs,      0 skips
> 7777490 decicycles in cbrt_tableinit,       2 runs,      0 skips
> [...]
> 7582339 decicycles in cbrt_tableinit,     256 runs,      0 skips
> 7563556 decicycles in cbrt_tableinit,     512 runs,      0 skips
> 
> new:
> 2099480 decicycles in cbrt_tableinit,       1 runs,      0 skips
> 2044470 decicycles in cbrt_tableinit,       2 runs,      0 skips
> [...]
> 1796544 decicycles in cbrt_tableinit,     256 runs,      0 skips
> 1791631 decicycles in cbrt_tableinit,     512 runs,      0 skips
>

See attached code, function "test1", based on an approximation of:

  (i+1)^(1/3) ~= i^(1/3) * ( 1 + 1/(3i) - 1/(9i^2) + 5/(81i^3) - ... )

The generated values are identical to the original floats (the maximum
error in double precision is < 4*10^-10), and it is faster (and, I think,
simpler) than your version.

Perhaps it could be made faster still by altering the constants, but the
run time is currently dominated by the division in the main loop.

    Daniel.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cubert.c
Type: text/x-csrc
Size: 2320 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20160105/d4215b82/attachment.c>

