[MPlayer-dev-eng] Re: Compile options
Trent Piepho
xyzzy at speakeasy.org
Mon Sep 25 16:07:14 CEST 2006
On Sat, 23 Sep 2006, Diego Biurrun wrote:
> On Sat, Sep 23, 2006 at 05:50:01PM +0300, Uoti Urpala wrote:
> > On Sat, 2006-09-23 at 07:17 -0700, Trent Piepho wrote:
> > > Mean and sample standard deviation for each optimization level, 10 samples per test.
> >
> > I think minimum would be a more appropriate value than mean, as the
> > decoding should be mostly deterministic and larger times represent
> > interference from other processes.
>
> I was about to say something similar.
>
> What's wrong with taking - say - the best of x runs? The process is
> supposed to be deterministic ...
If that were a good idea, there would be popular statistical tests based on
it. If there is any merit to the concept of comparing extrema, it is not
something I was taught. Quite the opposite, really: I was taught how to keep
extrema from producing a result that isn't justified by the data.
As an example of why best of x runs is bad, consider the actual data from
the mpeg4 test of -O4 with -fno-inline-functions from dsputil_mmx.c vs
-fno-inline-functions for everything.
Using established statistical techniques, I concluded that there is no
significant difference that can be detected with the available data.
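As a sanity check on that conclusion, here is a minimal sketch of the same kind of test (Welch's t statistic, computed by hand in Python) over the ten timings quoted in this message. The list names and the script are mine, not the original benchmark harness. With roughly 18 degrees of freedom, |t| would need to exceed about 2.10 to be significant at the 5% level; it doesn't come close.

```python
# Welch's t statistic for the two sets of ten timings quoted in this thread.
# Pure-Python illustration; not the script used to produce the original data.
from statistics import mean, variance
from math import sqrt

noif_dsp = [12.814, 12.750, 12.703, 12.755, 12.661, 12.764,
            12.631, 12.628, 12.685, 12.796]
noif_all = [12.821, 12.786, 12.764, 12.768, 12.633, 12.635,
            12.782, 12.633, 12.785, 12.798]

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

t = welch_t(noif_dsp, noif_all)
print("t = %.3f" % t)  # |t| well below ~2.10: no significant difference
```

Note that this only computes the test statistic; turning it into a p-value needs the t distribution's CDF (e.g. from a statistics library), but comparing |t| against the tabulated critical value already settles the question here.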
If we look at the first 3 data points:
O4.noif_dsp 12.814 12.750 12.703
O4.noif_all 12.821 12.786 12.764
It looks like noif_dsp is the best, with a minimum of 12.703 vs 12.764. Of
course, we have no idea what the confidence level is for that value.
That's one of the reasons why "best of x" isn't a valid method: it doesn't
have the mathematical basis that real methods have. The p-value and
confidence interval from Student's t test aren't just something William
Gosset made up; they are mathematical truths, like the value of pi.
Anyway, we've already decided that noif_dsp is the best, but why not do
some more runs?
O4.noif_dsp 12.755 12.661 12.764
O4.noif_all 12.768 12.633 12.635
Look at that, now noif_all has the best run with 12.633! Before we
concluded noif_dsp was the best, now we conclude the opposite. How about
looking at the rest of the 10 runs?
O4.noif_dsp 12.631 12.628 12.685 12.796
O4.noif_all 12.782 12.633 12.785 12.798
Now noif_dsp is best again with 12.628. First we get one answer, then the
other, and then the original again! If I did another 10 runs, which would
be best then? How does one qualify a result that is "best" after X runs but
may be completely different after X+1? It seems that the conclusion from
Student's t test, that there is no significant difference, is the right one.
Looking at the minimum is trying to find an answer that isn't there.
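The flip-flopping above is easy to reproduce: track which configuration holds the overall minimum as each new run arrives. This uses the ten timings quoted in this message; the variable names are mine.

```python
# Illustration of why "best of x runs" is unstable: the configuration that
# holds the overall minimum changes as more runs accumulate.
noif_dsp = [12.814, 12.750, 12.703, 12.755, 12.661, 12.764,
            12.631, 12.628, 12.685, 12.796]
noif_all = [12.821, 12.786, 12.764, 12.768, 12.633, 12.635,
            12.782, 12.633, 12.785, 12.798]

winners = []
for n in range(1, 11):
    # Which configuration has the lower minimum over the first n runs?
    winners.append("dsp" if min(noif_dsp[:n]) < min(noif_all[:n]) else "all")

print(winners)
# "dsp" leads after 3 runs, "all" after 6, "dsp" again after all 10 --
# exactly the sequence of contradictory conclusions described above.
```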