[FFmpeg-devel] Benchmarking

Tue Dec 16 00:24:02 CET 2008

Hi

I've run some tests to determine how reliable benchmarks are and which
method is most reliable. The results are not really a surprise, as they
match what I've observed subjectively over the past years of
optimization work.

What I tested is a simple loop that executes for "len" iterations vs.
"len*1.001" iterations; that is, the test is only concerned with the
detection of a 0.1% speed difference. The methods tested are
the minimum ("min"), the average, that is the arithmetic mean ("avg"),
the average without the largest ("without max"), the average without the
largest and without the smallest ("without outliers"), and the average
without the 2 largest ("without 2max").

The numbers of benchmark samples tried were 3, 5 and 15. Each of these
tests was run with a wide range of len values, 100 times each; a score
of 100 thus means perfect detection of the faster case, while ~50 would
mean as good as random.

The system this was run on had X, web browsers and some KDE applications
running at the same time, though they were not used during the test and
the system was otherwise idle.
The code was compiled with gcc 4.3 at -O1; the system was a
Pentium Dual 1.73 GHz notebook.
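
For readers without the attachment, here is a minimal sketch of how the
five estimators and the 0-100 score could be computed. It is not taken
from benchcmp.c; the names (trimmed_mean, est_*, score_percent,
MAX_SAMPLES) are hypothetical, and counting a tie as a miss is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 64  /* hypothetical upper bound for this sketch */

/* Mean of the samples that remain after dropping the drop_lo smallest
 * and the drop_hi largest values. */
double trimmed_mean(const double *s, size_t n, size_t drop_lo, size_t drop_hi)
{
    double sorted[MAX_SAMPLES];
    double sum = 0.0;
    size_t i, j;

    assert(n <= MAX_SAMPLES && drop_lo + drop_hi < n);

    /* insertion sort into a local copy; n is tiny (3, 5 or 15) */
    for (i = 0; i < n; i++) {
        double v = s[i];
        for (j = i; j > 0 && sorted[j - 1] > v; j--)
            sorted[j] = sorted[j - 1];
        sorted[j] = v;
    }
    for (i = drop_lo; i < n - drop_hi; i++)
        sum += sorted[i];
    return sum / (double)(n - drop_lo - drop_hi);
}

/* The five methods, expressed as special cases of the trimmed mean. */
double est_avg(const double *s, size_t n)              { return trimmed_mean(s, n, 0, 0); }
double est_min(const double *s, size_t n)              { return trimmed_mean(s, n, 0, n - 1); }
double est_without_max(const double *s, size_t n)      { return trimmed_mean(s, n, 0, 1); }
double est_without_outliers(const double *s, size_t n) { return trimmed_mean(s, n, 1, 1); }
double est_without_2max(const double *s, size_t n)     { return trimmed_mean(s, n, 0, 2); }

typedef double (*estimator_fn)(const double *s, size_t n);

/* Percentage of trials in which the estimator reports a smaller time for
 * the "len" loop than for the "len*1.001" loop. fast and slow each hold
 * trials*n timings, n consecutive samples per trial.
 * Assumption: a tie counts as a miss. */
int score_percent(estimator_fn est, const double *fast, const double *slow,
                  size_t trials, size_t n)
{
    size_t t, correct = 0;

    assert(trials > 0);
    for (t = 0; t < trials; t++)
        if (est(fast + t * n, n) < est(slow + t * n, n))
            correct++;
    return (int)(100 * correct / trials);
}
```

With 3 samples, est_without_2max reduces to est_min and
est_without_outliers to the median, which is consistent with the
identical "min" and "without 2max" columns in the 3-sample table.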

3 samples:
len:       1250 correct: avg:  94 min:  94 without outliers:  99 without max:  94 without 2max:  94 
len:       2500 correct: avg:  99 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:       5000 correct: avg:  97 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      10000 correct: avg:  98 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      20000 correct: avg:  96 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      40000 correct: avg:  91 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      80000 correct: avg:  75 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:     160000 correct: avg:  64 min: 100 without outliers:  97 without max:  97 without 2max: 100 
len:     320000 correct: avg:  56 min: 100 without outliers:  82 without max:  82 without 2max: 100 
len:     640000 correct: avg:  74 min:  95 without outliers:  55 without max:  75 without 2max:  95 
len:    1280000 correct: avg:  78 min:  61 without outliers:  78 without max:  69 without 2max:  61 
len:    2560000 correct: avg:  59 min:  77 without outliers:  61 without max:  66 without 2max:  77 
len:    5120000 correct: avg:  96 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   10240000 correct: avg:  97 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   20480000 correct: avg:  83 min: 100 without outliers:  97 without max:  99 without 2max: 100 
len:   40960000 correct: avg:  86 min: 100 without outliers:  98 without max: 100 without 2max: 100 
len:   81920000 correct: avg:  68 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:  163840000 correct: avg:  55 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:  327680000 correct: avg:  86 min: 100 without outliers:  74 without max:  74 without 2max: 100 
len:  655360000 correct: avg:  99 min:  53 without outliers:  79 without max:  79 without 2max:  53 
len: 1310720000 correct: avg:  99 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len: 2621440000 correct: avg: 100 min: 100 without outliers: 100 without max: 100 without 2max: 100 

5 samples:
len:       1250 correct: avg:  91 min:  97 without outliers:  91 without max:  91 without 2max:  91 
len:       2500 correct: avg:  99 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:       5000 correct: avg:  98 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      10000 correct: avg:  96 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      20000 correct: avg:  94 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      40000 correct: avg:  84 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      80000 correct: avg:  67 min: 100 without outliers:  96 without max:  96 without 2max: 100 
len:     160000 correct: avg:  53 min: 100 without outliers:  88 without max:  88 without 2max: 100 
len:     320000 correct: avg:  76 min: 100 without outliers:  87 without max:  87 without 2max:  99 
len:     640000 correct: avg:  70 min: 100 without outliers:  67 without max:  71 without 2max:  76 
len:    1280000 correct: avg:  90 min:  61 without outliers:  91 without max:  78 without 2max:  62 
len:    2560000 correct: avg:  82 min:  83 without outliers:  80 without max:  83 without 2max:  83 
len:    5120000 correct: avg:  93 min: 100 without outliers:  98 without max:  98 without 2max: 100 
len:   10240000 correct: avg:  93 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   20480000 correct: avg:  87 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   40960000 correct: avg:  75 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   81920000 correct: avg:  52 min: 100 without outliers: 100 without max: 100 without 2max: 100 
len:  163840000 correct: avg:  84 min: 100 without outliers:  88 without max:  88 without 2max: 100 
len:  327680000 correct: avg:  81 min: 100 without outliers:  83 without max:  83 without 2max: 100 
len:  655360000 correct: avg:  89 min:  61 without outliers:  90 without max:  85 without 2max:  77 
len: 1310720000 correct: avg:  98 min: 100 without outliers: 100 without max: 100 without 2max: 100

15 samples:
len:       1250 correct: avg:  96 min: 100 min2: 100 without outliers: 100 without max: 100 without 2max: 100 
len:       2500 correct: avg:  95 min: 100 min2: 100 without outliers: 100 without max: 100 without 2max: 100 
len:       5000 correct: avg:  96 min: 100 min2: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      10000 correct: avg:  85 min: 100 min2: 100 without outliers: 100 without max: 100 without 2max: 100 
len:      20000 correct: avg:  63 min: 100 min2: 100 without outliers:  98 without max:  98 without 2max: 100 
len:      40000 correct: avg:  59 min: 100 min2: 100 without outliers:  98 without max:  98 without 2max: 100 
len:      80000 correct: avg:  60 min: 100 min2: 100 without outliers:  91 without max:  91 without 2max: 100 
len:     160000 correct: avg:  69 min: 100 min2: 100 without outliers:  73 without max:  73 without 2max: 100 
len:     320000 correct: avg:  75 min: 100 min2: 100 without outliers:  82 without max:  83 without 2max:  83 
len:     640000 correct: avg:  86 min: 100 min2: 100 without outliers:  87 without max:  93 without 2max:  93 
len:    1280000 correct: avg:  87 min:  93 min2:  80 without outliers:  98 without max:  98 without 2max:  97 
len:    2560000 correct: avg:  83 min:  97 min2:  95 without outliers:  90 without max:  94 without 2max:  98 
len:    5120000 correct: avg:  91 min: 100 min2: 100 without outliers:  97 without max:  97 without 2max: 100 
len:   10240000 correct: avg:  81 min: 100 min2: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   20480000 correct: avg:  64 min: 100 min2: 100 without outliers: 100 without max: 100 without 2max: 100 
len:   40960000 correct: avg:  77 min: 100 min2: 100 without outliers:  99 without max:  99 without 2max: 100 
len:   81920000 correct: avg:  95 min: 100 min2: 100 without outliers:  95 without max:  97 without 2max: 100 
len:  163840000 correct: avg:  97 min: 100 min2: 100 without outliers:  95 without max:  96 without 2max: 100 
len:  327680000 correct: avg:  99 min: 100 min2: 100 without outliers:  99 without max: 100 without 2max:  99 
(Please forgive the slightly changed version; I did not want to
 rerun the many-hour test because of this.)

In conclusion, we can see that the minimum is the most reliable way and
the average is the least reliable way to detect the faster case.
We can also see that 3 samples are enough for almost all cases.
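
In practice, this conclusion suggests a "best of 3" timing helper along
the lines of the sketch below. It is only an illustration: bench_min and
spin are hypothetical names, and the standard clock() call is an
assumption chosen for portability. It measures CPU time at a fairly
coarse resolution, and the attached benchcmp.c may well use a different
timer.

```c
#include <assert.h>
#include <time.h>

/* Run fn(arg) `samples` times and return the smallest elapsed CPU time
 * in seconds. */
double bench_min(void (*fn)(void *), void *arg, int samples)
{
    double best = 0.0;
    int i;

    assert(samples >= 1);
    for (i = 0; i < samples; i++) {
        clock_t t0 = clock();
        fn(arg);
        clock_t t1 = clock();
        double dt = (double)(t1 - t0) / CLOCKS_PER_SEC;

        /* keep the minimum; the first sample initializes it */
        if (i == 0 || dt < best)
            best = dt;
    }
    return best;
}

/* Hypothetical workload: a counting loop of *arg iterations. The
 * volatile accumulator keeps the compiler from removing the loop. */
void spin(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long len = *(unsigned long *)arg;
    unsigned long i;

    for (i = 0; i < len; i++)
        sink += i;
}
```

Two candidates would then be compared by calling bench_min(spin, &len, 3)
for each and taking the smaller result as the faster one.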

The source code is attached (benchcmp.c).

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchcmp.c
Type: text/x-csrc
Size: 2775 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081216/c3198e71/attachment.c>