One can argue that one of the main roles of the subject of statistics is to characterize what the evidence in the collected data says about questions of scientific interest. There are two broad questions that we will refer to as the estimation question and the hypothesis assessment question. For estimation, the evidence in the data should determine a particular value of an object of interest together with a measure of the accuracy of the estimate, while for hypothesis assessment, the evidence in the data should indicate whether it is in favor of or against some hypothesized value of the object of interest, together with a measure of the strength of the evidence. This will be referred to as the evidential approach to statistical reasoning, which can be contrasted with the behavioristic or decision-theoretic approach, where the notion of loss is introduced and the goal is to minimize expected losses. While the two approaches often lead to similar outcomes, this is not always the case, and it is commonly argued that the evidential approach is more suited to scientific applications. This paper traces the history of the evidential approach and summarizes current developments.
Most statistical analyses refer to the concept of statistical evidence, as in phrases like “the evidence in the data suggests” or “based on the evidence we conclude”, etc. It has long been recognized, however, that the concept itself has never been satisfactorily defined or, at least, no definition has been offered that has met with general approval. This article is about the concept of statistical evidence, outlining some of the key historical aspects of the discussion and the current state of affairs.
One issue that needs to be dealt with right away is whether or not it is even necessary to settle on a clear definition. After all, statistics has been functioning as an intellectual discipline for many years without a resolution. There are at least two reasons why resolving this is important.
First, to be vague about the concept leaves open the possibility of misinterpretation and ambiguity, as in “if we don’t know what statistical evidence is, how can we make any claims about what the evidence is saying?” This leads to a degree of adhocracy in the subject, where different analysts measure and interpret the concept in different ways. For example, consider the concerns about the replicability of research findings in many fields where statistical methodology is employed. One cannot claim that being more precise about statistical evidence will fix such problems, but it is reasonable to suppose that establishing a sound system of statistical reasoning, based on a clear prescription of what we mean by this concept, can only help.
Second, the subject of statistics cannot claim that it speaks with one voice on what constitutes a correct statistical analysis. For example, there is the Bayesian versus frequentist divide as well as the split between the evidential versus the decision-theoretic or behavioristic approaches to determining inferences. This diversity of opinion, while interesting in and of itself, does not enhance general confidence in the soundness of the statistical reasoning process.
As such, it is reasonable to argue that settling the issue of what statistical evidence is and how it is to be used to establish inferences is of paramount importance. Statistical reasoning is a substantial aspect of how many disciplines, from anthropology to particle and quantum physics and on to zoology, determine truth in their respective subjects. One can claim that the role of the subject of statistics is to provide these scientists with a system of reasoning that is logical, consistent and sound in the sense that it produces satisfactory results in practical contexts free of paradoxes. Recent controversies over the use of p-values, which have arisen in a number of scientific contexts, suggest that, at the very least, this need is not being met; see, for example, [1,2].
It is to be emphasized that this paper is about the evidential approach to inference and does not discuss decision theory. In particular, the concern is with the attempts, within the context of evidential inference, to characterize statistical evidence. In some attempts to accomplish this, concepts from decision theory have been used, and so there is a degree of confounding between the two approaches. For example, it is common when using p-values to invoke an error probability, namely, the size of a test as given by α ∈ (0, 1), to determine when there is evidence against a hypothesis; see Section 3.1. Furthermore, the discussion of which approach leads to more satisfactory outcomes is not part of our aim, and no such comparisons are made. There is indeed a degree of tension between evidential inference and decision theory, and this dates back to debates between Fisher and Neyman concerning which is more appropriate. Some background on this can be found in the references [3,4,5]. A modern treatment of decision theory can be found in [6].
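For concreteness, the following minimal sketch illustrates the practice just described: a p-value is computed and then compared with a pre-chosen size α, and “evidence against” the hypothesis is declared when the p-value does not exceed α. The one-sample z-test with known standard deviation, the toy data, and the conventional choice α = 0.05 are illustrative assumptions only; none of these choices is prescribed by the text.

```python
# Minimal sketch (not from the paper) of declaring "evidence against H0"
# by comparing a p-value with a pre-chosen size alpha.
# The z-test, sigma, data, and alpha = 0.05 are illustrative assumptions.
import math
from statistics import NormalDist

def z_test_p_value(data, mu0, sigma):
    """Two-sided p-value for H0: mu = mu0, assuming known sigma."""
    n = len(data)
    z = (sum(data) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05                      # size of the test (an error probability)
data = [5.1, 4.8, 5.6, 5.3, 4.9]  # toy data
p = z_test_p_value(data, mu0=5.0, sigma=0.5)
print(f"p = {p:.3f}: "
      + ("evidence against H0" if p <= alpha else "no evidence against H0"))
```

The point of the sketch is only to show where the confounding arises: the p-value is offered as a measure of evidence, while the threshold α is an error probability borrowed from the decision-theoretic framework.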
In Section 2, we provide some background. Section 3 discusses various attempts at measuring statistical evidence and why these are not satisfactory. Section 4 discusses what we call the principle of evidence and how it can resolve many of the difficulties associated with defining and using the concept of statistical evidence.