Erratum (10/17/13): The paper was published in Scientific Reports, an OA journal from the publishers of Nature, and not in the Journal Nature as originally reported.
Clarification (10/17/13): The paper discussed here is quite good overall and very interesting. I do not believe that anything in this post calls into question any of its main findings. This post is more of an exercise in pedantry about p-values and NHST.
Follow up (10/20/13): See my follow up post.
In the latest issue of Scientific Reports, a paper titled Abrupt rise of new machine ecology beyond human response time by Neil Johnson and colleagues argues that society’s techno-social systems are creating a new behavioural regime that is qualitatively different from any that have come before. Their analysis of subsecond extreme events in the financial markets is quite fascinating and highlights the need for more research on the ecology of systems made up of competing machines as these become more prevalent parts of the real economy.
The authors demonstrate that subsecond events follow a pattern distinct from the one observed at time scales longer than one second. In making this case, however, they commit a common misinterpretation of p-values that I have written about here and here. Specifically, they interpret high p-values as strong evidence for the null hypothesis. When conducting a goodness-of-fit test to a power-law model of what they call Ultra-fast Extreme Events (UEEs), they state that “… p = 0.91 and hence there is strong support for a power-law distribution…”. The claim is repeated in the figure caption:
Figure 4: From:
Abrupt rise of new machine ecology beyond human response time
Neil Johnson, Guannan Zhao, Eric Hunsader, Hong Qi, Nicholas Johnson, Jing Meng & Brian Tivnan
Scientific Reports 3, Article number: 2627 doi:10.1038/srep02627
Since the p-value is the probability of observing data at least as far from the null (in this case, a power-law) as the data actually observed, given that the null is true (there really is a power-law relation), it cannot simultaneously be a statement about the probability that the null is true given the observation. In other words, P(A|B) ≠ P(B|A). Just because the data are not unlikely under the assumption of the null hypothesis doesn’t mean that we have evidence that the null is true.
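To make the P(A|B) ≠ P(B|A) point concrete, here is a small sketch using Bayes' rule for two competing hypotheses. Every number in it is hypothetical and chosen only for illustration; none comes from the paper. The one non-arbitrary input is that, for a test with a continuous statistic, p-values are uniformly distributed under the null, so a p-value of 0.9 or higher turns up about 10% of the time when the null is true. The function name `posterior_null` is my own.

```python
def posterior_null(prior_null, p_data_given_null, p_data_given_alt):
    """Bayes' rule for two hypotheses: returns P(null | data)."""
    joint_null = prior_null * p_data_given_null
    joint_alt = (1.0 - prior_null) * p_data_given_alt
    return joint_null / (joint_null + joint_alt)


# Hypothetical inputs: a 50/50 prior, and P(p >= 0.9 | null) = 0.10
# because p-values are uniform under the null.

# Low-power test: a high p-value is almost as likely when the null is false
# (0.08 is an assumed value).
low_power = posterior_null(0.5, 0.10, 0.08)

# High-power test: a high p-value is very unlikely when the null is false
# (0.001 is an assumed value).
high_power = posterior_null(0.5, 0.10, 0.001)

print(f"P(null | high p-value), low-power test:  {low_power:.3f}")
print(f"P(null | high p-value), high-power test: {high_power:.3f}")
```

With these made-up inputs the same high p-value leaves the null barely more probable than the prior in the first case and makes it very probable in the second. The p-value alone cannot tell the two situations apart.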
The problem here is that failing to find evidence against the null hypothesis is not the same as finding evidence for it. The authors should instead have performed a power analysis. ‘Power’ here has no relationship to power-laws; it refers to the statistical concept of power, that is, the probability of correctly rejecting the null hypothesis when it is false. Only if we knew that the goodness-of-fit test in this scenario had high statistical power ought we to interpret a high p-value as evidence that the real relationship is not very different from a power-law.
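A power analysis of this kind can be approximated by simulation. The sketch below is not the authors' procedure and does not use their data. It assumes a Kolmogorov-Smirnov test against a fully specified power-law (known exponent `alpha` and cutoff `xmin`, both hypothetical values), and it assumes one particular non-power-law alternative in which log(x / xmin) is half-normal. A real analysis would need to estimate the parameters from the data and pick alternatives that are plausible for the problem at hand.

```python
import math
import random


def pareto_cdf(x, alpha, xmin):
    """CDF of a continuous power-law (Pareto) with tail exponent alpha."""
    return 1.0 - (xmin / x) ** alpha if x >= xmin else 0.0


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def draw_power_law(rng, n, alpha, xmin):
    """Inverse-CDF sampling from the null (power-law) model."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def draw_alternative(rng, n, xmin, sigma):
    """Assumed alternative: log(x / xmin) is half-normal, so not a power-law."""
    return [xmin * math.exp(abs(rng.gauss(0.0, sigma))) for _ in range(n)]


def estimate_power(n, alpha=1.5, xmin=1.0, sigma=1.0,
                   level=0.05, n_sims=500, seed=0):
    """Monte Carlo estimate of the test's power at sample size n."""
    rng = random.Random(seed)

    def cdf(x):
        return pareto_cdf(x, alpha, xmin)

    # Critical value: the (1 - level) quantile of the KS statistic
    # when the null really is true.
    null_stats = sorted(
        ks_statistic(draw_power_law(rng, n, alpha, xmin), cdf)
        for _ in range(n_sims)
    )
    critical = null_stats[round((1.0 - level) * n_sims) - 1]

    # Power: how often the test rejects when the data come from the alternative.
    rejections = sum(
        ks_statistic(draw_alternative(rng, n, xmin, sigma), cdf) > critical
        for _ in range(n_sims)
    )
    return rejections / n_sims


if __name__ == "__main__":
    for n in (20, 100, 500):
        print(f"n = {n:4d}  estimated power = {estimate_power(n):.2f}")
```

Under these assumptions the test rarely rejects at small sample sizes even though the data are not power-law distributed, so a high p-value from a small sample says very little. Only where the estimated power is high does failing to reject start to count as support for the null.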
This example shows once again that the p-value – that noble workhorse of modern science – continues to be misinterpreted in even the top tiers of the scientific literature.