Does RBN give a reliable metric for comparing antennas?

I see that lots of hams depend on the HF Reverse Beacon Network (RBN) to compare two antennas, or to compare performance before and after a change.

Experience says that A/B comparisons on HF are subject to variation in ionospheric propagation paths, and that this variation can be both wide in range and rapid.

An example

Above is a plot of the signal strength of an 80m A1 Morse (CW) beacon, measured in a 20Hz bandwidth over a 15min snapshot (a terrestrial path of 105km).

The signal strength ranges over 35dB during that period, and can change by 15dB in 15s.

Note that the variation is likely to be frequency-, path- and time-dependent.

Clearly, simple spot A/B comparisons, or even the average of 10 A/B comparisons, are unlikely to give a meaningful result when the underlying propagation channel behaves like this.
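To get a feel for the problem, here is a minimal Python sketch that simulates spot A/B comparisons between two identical antennas. It assumes, purely for illustration, a Rayleigh fading channel (exponentially distributed received power) and that each reading is taken far enough apart in time to see an independent fade. Neither assumption comes from the beacon measurement above, and the function names are hypothetical.

```python
import math
import random
import statistics


def fade_db(rng: random.Random) -> float:
    """Instantaneous received power relative to the mean, in dB.

    Assumed model: Rayleigh fading, i.e. exponentially distributed power.
    The floor avoids log10(0) in the vanishingly rare case of a zero draw.
    """
    return 10 * math.log10(max(rng.expovariate(1.0), 1e-12))


def apparent_gain_db(rng: random.Random, true_gain_db: float, n_pairs: int) -> float:
    """Mean of n_pairs spot A/B differences (B minus A), each reading independently faded."""
    diffs = [(true_gain_db + fade_db(rng)) - fade_db(rng) for _ in range(n_pairs)]
    return statistics.fmean(diffs)


def comparison_spread_db(n_pairs: int, trials: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the apparent gain when both antennas are in fact identical."""
    rng = random.Random(seed)
    results = [apparent_gain_db(rng, 0.0, n_pairs) for _ in range(trials)]
    return statistics.pstdev(results)


for n in (1, 10):
    print(f"{n:2d} A/B pair(s): spread of apparent gain = {comparison_spread_db(n):.1f} dB")
```

Under this model, theory gives a spread of about 7.9dB for a single A/B pair and about 2.5dB for the average of 10 pairs, so two identical antennas would often appear to differ by several dB. A real channel does not fade independently from reading to reading, so treat this only as indicative.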

Parametric statistics

Be aware that data from this source is not usually normally distributed, and applying parametric statistics that assume normality is questionable. The frequency distribution above is not quite bell-shaped; others are often much worse and usually fail standard statistical tests for normality.

The data above fails the Shapiro-Wilk test for normality.
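For anyone who wants to repeat the test on their own data, the sketch below uses SciPy's `scipy.stats.shapiro`. The input here is synthetic (900 readings, one per second over 15min, drawn from an assumed Rayleigh fading model rather than real beacon data), and the 0.05 significance level is a conventional choice, not something taken from the measurement above.

```python
import numpy as np
from scipy import stats


def shapiro_wilk(samples_db, alpha=0.05):
    """Return (W, p, passes) for the Shapiro-Wilk normality test.

    passes is True when p >= alpha, i.e. normality is not rejected at that level.
    """
    result = stats.shapiro(samples_db)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue >= alpha)


# Synthetic stand-in for 15min of 1s readings: Rayleigh-faded power in dB (assumed model).
rng = np.random.default_rng(42)
readings_db = 10 * np.log10(rng.exponential(1.0, size=900))

w, p, passes = shapiro_wilk(readings_db)
print(f"W = {w:.4f}, p = {p:.2e}, normal at 5% level: {passes}")
```

A small p (below the chosen alpha) means normality is rejected; a large p does not prove the data are normal, only that the test found no strong evidence against it.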

What does this mean? It means that taking means, standard deviations, differences of means etc. is statistically unsound, though in this case it is not very bad; you might think of this data as weakly normally distributed.
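A small illustration of why the mean is a fragile summary for faded signals: the ten readings below are hypothetical values (not from the beacon data), mostly around 12dB with two deep fades.

```python
import statistics

# Hypothetical SNR readings in dB: mostly ~12dB, with two deep fades.
readings_db = [12, 13, 11, 12, 14, 13, -8, 12, 11, -5]

mean_db = statistics.mean(readings_db)
median_db = statistics.median(readings_db)
stdev_db = statistics.stdev(readings_db)

print(f"mean = {mean_db:.1f} dB, median = {median_db:.1f} dB, stdev = {stdev_db:.1f} dB")
```

This reports a mean of 8.5dB against a median of 12.0dB: the two fades drag the mean well below the level most readings sit at, and inflate the standard deviation, while the median is barely affected.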

BTW, rounding measurements to 1dB (as many tools such as RBN and WSPR do) degrades the result of the Shapiro-Wilk test for normality.
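The effect of rounding can be explored with a sketch like the one below. It draws a synthetic, genuinely normal sample (an assumption made so that the only change is the rounding), rounds a copy to 1dB, and runs `scipy.stats.shapiro` on both. The sample size, mean and spread are arbitrary choices; the degradation is likely to be larger when the spread is only a few dB, because rounding then produces many tied values.

```python
import numpy as np
from scipy import stats

# Synthetic normal readings (assumed): mean 10dB, standard deviation 2dB, 200 samples.
rng = np.random.default_rng(7)
raw_db = rng.normal(loc=10.0, scale=2.0, size=200)
rounded_db = np.round(raw_db)  # quantise to 1dB, as many spotting tools report

raw = stats.shapiro(raw_db)
rounded = stats.shapiro(rounded_db)
print(f"raw:     W = {raw.statistic:.4f}, p = {raw.pvalue:.3f}")
print(f"rounded: W = {rounded.statistic:.4f}, p = {rounded.pvalue:.3f}")
```

Results vary with the random seed; trying several seeds shows how often the rounded copy fares worse.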

A few random selections of 10 consecutive observations yielded very poor Shapiro-Wilk test results.

Above is a typical small sample (10 observations) rounded to 1dB. It would be quite wrong to assume a normal distribution here, e.g. to calculate the difference in means to compare two antennas; that is junk science.
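A sketch of the small-sample check: it takes a few randomly placed windows of 10 consecutive readings from a longer series, rounds each to 1dB, and applies the Shapiro-Wilk test. The series is synthetic (an assumed Rayleigh fading model, not the beacon data), and the window count and seeds are arbitrary; `window_tests` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def window_tests(series_db, window=10, n_windows=5, seed=3):
    """Shapiro-Wilk p-values for randomly placed windows of consecutive readings, rounded to 1dB."""
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(series_db) - window + 1, size=n_windows)
    pvalues = []
    for start in starts:
        sample = np.round(series_db[start:start + window])
        pvalues.append(float(stats.shapiro(sample).pvalue))
    return pvalues


# Synthetic 15min series of 1s readings (assumed Rayleigh fading model, in dB).
series_db = 10 * np.log10(np.random.default_rng(0).exponential(1.0, size=900))
for p in window_tests(series_db):
    print(f"p = {p:.3f}")
```

With only 10 points the test has little power, so even a "pass" is weak evidence of normality, and a single window says little about the channel as a whole.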