Continuing from WSPR for A/B tests – a discussion – part 2.
Other tests for normality
Above is a frequency histogram of the experiment log.
I used the Shapiro-Wilk test for normality earlier. It is one of many such tests, and each has its own strengths and weaknesses or, if you like, its own sensitivity to particular kinds of non-normality.
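Rounding matters because whole-dB values collapse thousands of observations onto a handful of distinct values, so the data are full of ties. Tests such as Shapiro-Wilk are derived for continuous data. Below is a minimal sketch on simulated data. The normal shape, the 2 dB standard deviation and the seed are assumptions, and none of it is the experiment log.

```python
import random
from collections import Counter

rng = random.Random(42)
# Hypothetical stand-in for the log: normal, sd 2 dB, rounded to whole dB like the reports.
rounded = [round(rng.gauss(0.0, 2.0)) for _ in range(4508)]
counts = Counter(rounded)  # how many observations share each whole-dB value

print(len(counts), "distinct values among", len(rounded), "observations")
print("largest tie group (value, count):", counts.most_common(1)[0])
```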
Chi-squared test for normality
We could shop for a normality test that is less bothered by the rounded data. Pearson’s Chi-squared test is an obvious choice, as it compares the frequency histogram over chosen classes with the counts expected if the data were normally distributed. Because the data are rounded to whole dB, making each class 1dB wide and centred on a whole-dB value means rounding never moves an observation into a different class. So with 1dB classes we might have a test that is not sensitive to the rounded data.
Above is a plot of the count in each 1dB bin against the expected count if the data were normally distributed.
It is not visible on this chart, but there is one observation at -10dB, just one. That single observation is enough to make the Chi-squared test reject the hypothesis that the data are normally distributed. So far into the tail the expected count E for that class is tiny, and its contribution to the statistic, (O - E)²/E with O = 1, is roughly 1/E. One observation in 4508. The Chi-squared test is tolerant of rounding (when the classes are chosen to match it), but it is very sensitive to outliers.
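Here is a minimal sketch of the test as described above. It uses 1dB classes centred on whole-dB values and a normal fitted by the sample mean and standard deviation, and it takes two degrees of freedom off for those fitted parameters. It runs on simulated data (normal, 2 dB standard deviation, fixed seed: all assumptions, not the experiment log), first as drawn and then with one extra observation at -10dB. The function names are hypothetical. The p-value uses the Wilson-Hilferty approximation to the chi-squared distribution rather than an exact CDF, which keeps it within the Python standard library.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, fmean, stdev


def chi2_normality_1db(values):
    """Pearson chi-squared goodness of fit of whole-dB integer data to a fitted normal.

    Each class is 1 dB wide and centred on an integer k, i.e. [k - 0.5, k + 0.5),
    so rounding to whole dB never moves an observation into another class.
    The lowest and highest classes are open-ended so expected counts sum to n.
    Returns (statistic, degrees of freedom).
    """
    n = len(values)
    dist = NormalDist(fmean(values), stdev(values))  # mean and sd estimated from the data
    counts = Counter(values)
    lo, hi = min(values), max(values)
    stat = 0.0
    for k in range(lo, hi + 1):
        left = 0.0 if k == lo else dist.cdf(k - 0.5)
        right = 1.0 if k == hi else dist.cdf(k + 0.5)
        expected = n * (right - left)
        stat += (counts[k] - expected) ** 2 / expected  # this class's (O - E)^2 / E
    dof = (hi - lo + 1) - 1 - 2  # classes - 1 - two estimated parameters
    return stat, dof


def chi2_pvalue_approx(x, dof):
    """Upper-tail p-value of a chi-squared statistic (Wilson-Hilferty approximation)."""
    z = ((x / dof) ** (1 / 3) - (1 - 2 / (9 * dof))) / math.sqrt(2 / (9 * dof))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical stand-in for the log: 4507 normal values (sd 2 dB), rounded to whole dB.
rng = random.Random(1)
clean = [round(rng.gauss(0.0, 2.0)) for _ in range(4507)]
with_outlier = clean + [-10]  # one extra observation at -10dB, 4508 in all

for label, data in (("without outlier", clean), ("with outlier", with_outlier)):
    stat, dof = chi2_normality_1db(data)
    p = chi2_pvalue_approx(stat, dof)
    print(f"{label}: chi2 = {stat:.1f}, dof = {dof}, p ~ {p:.3g}")
```

A chi-squared statistic has an expected value equal to its degrees of freedom. A single class contributing roughly 1/E, where E is a small fraction of one observation, therefore swamps every other class. If SciPy is available, `scipy.stats.chisquare` with `ddof=2` should give an exact p-value from the same observed and expected counts.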
Excising outliers
Experimental data occasionally contain outliers: data points that lie far from the rest.
If it can be shown that they are erroneous, then discarding them is ethical.
Discarding them for convenience is ethically doubtful.
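Deciding which points are "far from the rest" needs a rule. One common and simple choice, not necessarily what was used here, is Tukey's fences: flag anything more than k interquartile ranges beyond the quartiles. The sketch below only flags candidates. Whether to discard one still depends on showing that it is erroneous. The helper name, the choice k = 3 and the small example log are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=3.0):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: flagged points are candidates to investigate,
    not points to delete. k = 3 is Tukey's "far out" multiplier (1.5 is also common).
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up illustrative log: tightly clustered whole-dB values plus one stray reading.
log = [-1] * 50 + [0] * 50 + [1] * 50 + [-10]
print(flag_outliers(log))  # -> [-10]
```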
The dataset above has just one outlier, yet it swings a Chi-squared normality test from very weak to very strong support for normality, while other, more robust normality tests choke on the rounded data. So by shopping for a friendlier test and excising the outlier, it could be argued that this data is normal, and that the strong parametric conclusions given in the last article hold.
This experiment was one of four, at differences of 0, -3, 3 and 6dB, and none of the other three can reasonably be argued to be normally distributed.
Continued at WSPR for A/B tests – a discussion – part 4.