
The tests for sampling differences and contingency
Author(s) - Harold Jeffreys
Publication year - 1937
Publication title - Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.814
H-Index - 135
eISSN - 2053-9169
pISSN - 0080-4630
DOI - 10.1098/rspa.1937.0196
Subject(s) - mathematics, contingency table, surprise, sampling (signal processing), class (philosophy), combinatorics, proposition, statistics, contingency, sample (material), computer science, social psychology, psychology, physics, philosophy, artificial intelligence, epistemology, filter (signal processing), computer vision, thermodynamics
1. The question discussed in these tests (Jeffreys 1935, 1936a) has the form: we have a sample or a pair of samples, in which the numbers of individuals with the pairs of properties ϕ.ψ, ϕ.~ψ, ~ϕ.ψ, ~ϕ.~ψ are respectively x, y, x', y': do the numbers afford evidence for or against the proposition that ϕ and ψ are associated? The conditions contemplated in the experiments, however, are somewhat different. In what I have called the sampling problem (1935, p. 203), large classes of ϕ's and ~ϕ's are already supposed separated, and we select from them at random arbitrarily assigned numbers of members x + y, x' + y'. In the contingency problem (1936a, pp. 426-29), the ϕ's and ~ϕ's are not already separated. We have only a single large class containing all four types, and in the sampling we choose arbitrarily a total number x + y + x' + y'. It would therefore be expected that the significance tests found in the two cases would differ slightly. In the sampling problem, the sampling errors of x and y are constrained to be equal and opposite, which is not so in the contingency problem. The tests actually found were different, and no surprise was felt; it is only recently that I have noticed that the difference is liable to be so large that in some cases, with the same x, y, x', y', an association could not be asserted in the sampling one. Again, if x' and y are very different, interchanging them makes a great difference to the result, so that in the sampling problem it apparently makes a large difference whether we sample first with regard to ϕ or ψ if one of them is a rare property. It turns out that the discrepancy is removed when we take into account an inequality that was overlooked. We denote the proposition that there is no association by q, that there is one by ~q. K is the ratio expressing the support for q given by the data θ, namely K = [P(q | θh) / P(~q | θh)] / [P(q | h) / P(~q | h)].
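
As a minimal illustration of the definition of K given above, the ratio can be computed as the posterior odds of q divided by the prior odds of q. The sketch below is not taken from the paper: the function name `support_ratio` and the numerical values are hypothetical, chosen only to show the arithmetic, and the probabilities are assumed to lie strictly between 0 and 1.

```python
from fractions import Fraction


def support_ratio(prior_q, posterior_q):
    """Return K = [P(q|θh)/P(~q|θh)] / [P(q|h)/P(~q|h)].

    prior_q     -- P(q | h), probability of "no association" before the data
    posterior_q -- P(q | θh), probability of "no association" after the data
    Both are hypothetical inputs; P(~q | .) is taken as 1 - P(q | .).
    """
    prior_q = Fraction(prior_q)
    posterior_q = Fraction(posterior_q)
    if not (0 < prior_q < 1 and 0 < posterior_q < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    prior_odds = prior_q / (1 - prior_q)              # P(q|h) / P(~q|h)
    posterior_odds = posterior_q / (1 - posterior_q)  # P(q|θh) / P(~q|θh)
    return posterior_odds / prior_odds


# Illustrative values only: equal prior probabilities for q and ~q,
# and a posterior probability of 3/4 for q.
k_example = support_ratio(Fraction(1, 2), Fraction(3, 4))
```

With equal prior probabilities for q and ~q the prior odds are 1, so K reduces to the posterior odds. A value of K above 1 means the data support q (no association), and a value below 1 means they support ~q.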