Statistical Analysis

Lingling Yang
Jul 1, 2020

1. Statistical Hypothesis Test

1.1 Parametric

1.1.1 Welch’s t-test

Welch’s t-test, or the unequal variances t-test, is a two-sample statistical hypothesis test used to test whether two populations have equal means. It is an adaptation of Student’s t-test and is more reliable when the two samples have unequal variances and/or unequal sample sizes.

Assumption: normality.
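
A minimal MATLAB sketch (x and y are placeholder sample vectors; setting 'Vartype' to 'unequal' makes ttest2 run the unequal-variances form):

% matlab
% Welch's t-test: two independent samples, variances not assumed equal
[h, p] = ttest2(x, y, 'Vartype', 'unequal')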

1.2 Non-Parametric

1.2.1 Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a nonparametric test for two populations when the observations are paired. In this case, the test statistic, W, is the sum of the ranks of the positive differences between the paired observations (i.e. x − y). When used for one sample, W is the sum of the ranks of the positive differences between the observations and the hypothesized median value M.

% matlab
% paired samples
signrank(x, y)
% one sample (median 0)
signrank(x)
% one sample, hypothesized median M
signrank(x, M)

Assumptions: (a) Data are paired. (b) Each pair is chosen randomly and independently.

1.2.2 Wilcoxon rank sum test

The Wilcoxon rank sum test (also called the Mann-Whitney U test, Mann-Whitney-Wilcoxon (MWW) test, or Wilcoxon-Mann-Whitney test) is a nonparametric test for two populations when the samples are independent.

(from the MATLAB manual) It can be used for two samples with different sample sizes. In this case, the test statistic that ranksum returns is the rank sum of the first sample.

% matlab
% two independent samples
ranksum(x, y)

2. Check of Normality and Variances

2.1 Shapiro-Wilk test

The Shapiro-Wilk test is a test of normality in frequentist statistics.
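
MATLAB’s Statistics Toolbox has no built-in Shapiro-Wilk function; the sketch below assumes a File Exchange implementation named swtest is on the path (the name and signature are assumptions, so check whichever implementation you use):

% matlab
% assumes a File Exchange swtest.m is on the path (not a built-in)
[h, p] = swtest(x, 0.05)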

2.2 Kolmogorov–Smirnov test

In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous) one-dimensional probability distributions. It can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test). (https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)
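
In MATLAB, kstest covers the one-sample case (against a standard normal reference by default) and kstest2 the two-sample case; x and y are placeholder vectors:

% matlab
% one-sample K-S test against a standard normal reference
[h, p] = kstest(x)
% two-sample K-S test
[h, p] = kstest2(x, y)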

2.3 Levene’s test

Levene’s test is used to check whether the variances are equal across samples (homogeneity of variance).
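
A minimal MATLAB sketch, assuming the observations are stacked in a vector x with a matching grouping variable group (the default TestType of vartestn is Bartlett’s test):

% matlab
% Levene's test on absolute deviations from the group means
p = vartestn(x, group, 'TestType', 'LeveneAbsolute')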

3. Multiple comparisons

3.1 Bonferroni correction

The critical value (alpha) for an individual test is obtained by dividing the familywise error rate (usually 0.05) by the number of tests.

The Bonferroni correction is appropriate when a single false positive in a set of tests would be a problem. It is mainly useful when there are a fairly small number of multiple comparisons and you’re looking for one or two that might be significant.

However, if you have a large number of multiple comparisons and you’re looking for many that might be significant, the Bonferroni correction may lead to a very high rate of false negatives.
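
A minimal sketch of the correction in MATLAB, assuming p is a vector of raw P values from the individual tests:

% matlab
alpha = 0.05;                    % familywise error rate
m = numel(p);                    % number of tests
alpha_per_test = alpha / m;      % Bonferroni-corrected critical value
significant = p < alpha_per_test % which individual tests are significant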

3.2 Controlling the false discovery rate: Benjamini–Hochberg procedure

One good technique for controlling the false discovery rate was briefly mentioned by Simes (1986) and developed in detail by Benjamini and Hochberg (1995). Put the individual P values in order, from smallest to largest. The smallest P value has a rank of i=1, the next smallest has i=2, etc. Compare each individual P value to its Benjamini-Hochberg critical value, (i/m)Q, where i is the rank, m is the total number of tests, and Q is the false discovery rate you choose. The largest P value that has P<(i/m)Q is significant, and all of the P values smaller than it are also significant, even the ones that aren’t less than their own Benjamini-Hochberg critical value.
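
A short MATLAB sketch of the step-up procedure described above, assuming p is a vector of raw P values and Q is the chosen false discovery rate:

% matlab
Q = 0.05;                               % chosen false discovery rate
p = p(:);                               % work with a column vector
m = numel(p);
[p_sorted, order] = sort(p);            % smallest P value gets rank i = 1
crit = (1:m)' ./ m .* Q;                % Benjamini-Hochberg critical values (i/m)Q
k = find(p_sorted <= crit, 1, 'last');  % largest rank with P <= (i/m)Q
significant = false(m, 1);
if ~isempty(k)
    significant(order(1:k)) = true;     % that P value and all smaller ones are significant
end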
