# [MPlayer-dev-eng] Re: Compile options

Trent Piepho xyzzy at speakeasy.org
Thu Sep 21 07:56:53 CEST 2006

On Thu, 21 Sep 2006, Andrew Savchenko wrote:
> On 21 Sep 2006, 03:29 Trent Piepho wrote:
> > On Wed, 20 Sep 2006, Andrew Savchenko wrote:
> > > I've made some tests for 3 movie types: H264 HD
> > > (http://images.apple.com/movies/wb/the_fountain/the_fountain-tsr_h480p.mov),
> > > x264 and mpeg4; and for 4 sets of compile options: general
> > > -O4 option,
> >
> > Is there a sample to download for the mpeg4 test?
>
> I can upload a file which I use for mpeg4 testing, it is about 68 Mb and
> 3 minutes length. But only developers have read access on mplayerhq's
> ftp, so I don't know what to do.

Maybe there is a good sample mpeg4 file that can be downloaded from
somewhere?  To be sure, there is plenty of mpeg4 content available via
bittorrent, but something freely available would be better.

> > I'm not sure I follow the syntax here.  Does "\pm 0.009" mean that
> > the sample standard deviation is .009 seconds, or the sample
> > variance?  Or is it the variance/standard deviation of the sample
> > mean?
>
> It is variance/standard deviation of the sample mean.

Which is it?  Variance of the sample mean or standard deviation of the
sample mean?  From the sqrt in your formula, it looks like you are using
the standard deviation.

> > variance. You need to also know the number of samples to know how
> > many degrees of freedom there are in the t distribution.
>
> Yes, of course. I used 10 samples: each file was tested 10 times, then I
> calculate error using t-distribution (also known as Student's
> coefficients) for reliability 0.95: t(0.95,10)=2.26.

Since you are comparing the difference of the means, the degrees of freedom
would not be 10.  If you want to assume that the variance of the two
populations is the same, then the degrees of freedom would be 18.  If you
don't assume that the variance of the two populations is the same, then you
would want to use the Welch-Satterthwaite approximation to get the degrees
of freedom:
http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm
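
To make the two cases concrete, here is a minimal Python sketch.  The
function names are made up for illustration, and the second formula is the
usual textbook form of the Welch-Satterthwaite approximation, so check it
against the NIST page before relying on it.

```python
def pooled_df(n1, n2):
    """Degrees of freedom when both populations are assumed to share one variance."""
    return n1 + n2 - 2


def welch_satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom when the variances may differ.

    var1 and var2 are the sample variances; n1 and n2 are the sample sizes.
    """
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Two sets of 10 timing runs, as in the tests discussed above.
print(pooled_df(10, 10))
# Hypothetical sample variances, chosen only to show unequal spreads.
print(welch_satterthwaite_df(1.0, 10, 25.0, 10))
```

With equal sample variances and equal sample sizes the approximation
gives the same value as the pooled case; the more the variances differ,
the closer it gets to the degrees of freedom of a single sample.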

Of course, if you install R (there are packages for most Linux
distributions), you can do all this with just one command.

> > If you are trying to compare multiple groups at once, then you might
> > want to try using ANOVA instead of multiple t tests.  In that case,
> > you also need to know the number of samples in each group.
>
> But why ANOVA is better than multiple t-tests? Imho t-tests are enough
> for reliable result.

It has been a long time since I learned all this, so it's hard to describe
it well.  If you do a series of pair-wise t tests, then your chance of
finding a difference between any pair is greater than it should be, because
you are making many tests.
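
A rough way to see the size of the effect, sketched in Python.  The helper
names are made up, and the calculation assumes the tests are independent,
which pairwise tests that share groups are not, so treat the result as a
ballpark figure rather than the exact error rate.

```python
import math


def num_pairwise_tests(num_groups):
    """Number of distinct pairs among num_groups groups."""
    return math.comb(num_groups, 2)


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive among num_tests tests,
    ASSUMING the tests are independent and each uses level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


tests = num_pairwise_tests(10)  # e.g. ten groups compared pair by pair
print(tests, round(familywise_error_rate(tests), 2))
```

For ten groups that is 45 tests and, under the independence assumption,
roughly a 90% chance of at least one spurious "significant" pair at the
5% level.  The real figure for dependent tests is likely lower, but it is
still far above 5%.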

For example, I created 10 groups of 100 samples each from the unit normal
distribution.  They should all have mean = 0, but of course the mean of a
random sample will not be exactly 0.

Sample  1  mean = -0.160903922  s.d. of mean = 0.008731491
Sample  2  mean =  0.02336369   s.d. of mean = 0.01061570
Sample  3  mean =  0.027597939  s.d. of mean = 0.009059666
Sample  4  mean =  0.005441310  s.d. of mean = 0.009438608
Sample  5  mean =  0.038062735  s.d. of mean = 0.009868223
Sample  6  mean =  0.096434492  s.d. of mean = 0.009169973
Sample  7  mean =  0.08606997   s.d. of mean = 0.01130869
Sample  8  mean = -0.037867002  s.d. of mean = 0.009346874
Sample  9  mean =  0.05774512   s.d. of mean = 0.01051323
Sample 10  mean = -0.15319057   s.d. of mean = 0.01004807

You can see a boxplot here:
http://www.speakeasy.org/~xyzzy/pictures/10norm.png
All the groups look very similar, and they should, since they are all
samples from the unit normal distribution.

If one uses ANOVA to compare all the means for equality, the p value for
the F statistic is .5752, far above the usual 0.05 threshold, so we do not
reject the null hypothesis.  ANOVA gives us the right answer: all the means
are the same.

But if you do all 45 pairwise t tests, you find that for group 1 and group
6, the p value is 0.04346.  So you would conclude that group 1 and group 6
have different means.  Doing pairwise t tests gives us the wrong answer:
that the means are not all the same.
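
The same experiment can be repeated many times to see how often pairwise
testing raises a false alarm.  This is only a sketch in Python, not the R
session used above: the function names, the seed and the number of trials
are arbitrary, and the critical value 1.972 is an assumed two-sided 5%
cutoff for a t distribution with 198 degrees of freedom (two groups of
100 with pooled variance).

```python
import itertools
import math
import random
import statistics

# Assumed two-sided 5% critical value for t with 198 degrees of freedom.
T_CRIT = 1.972


def summarize(group):
    """Return (size, mean, sample variance) of one group."""
    return len(group), statistics.fmean(group), statistics.variance(group)


def pooled_t(stats_a, stats_b):
    """Equal-variance two-sample t statistic from two group summaries."""
    na, mean_a, var_a = stats_a
    nb, mean_b, var_b = stats_b
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def false_alarm_rate(trials=200, num_groups=10, group_size=100, seed=0):
    """Fraction of trials in which at least one pair looks 'different',
    although every group is drawn from the same unit normal distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        summaries = [
            summarize([rng.gauss(0.0, 1.0) for _ in range(group_size)])
            for _ in range(num_groups)
        ]
        if any(abs(pooled_t(a, b)) > T_CRIT
               for a, b in itertools.combinations(summaries, 2)):
            hits += 1
    return hits / trials


print(false_alarm_rate())
```

Every group here has the same true mean, so any "significant" pair is a
false positive.  The printed fraction should come out well above the
nominal 5%, which is the problem with running many uncorrected pairwise
t tests.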