Forensic Footwear Reliability: Part II—Range of Conclusions, Accuracy, and Consensus*
Author(s) - Nicole Richetelli, Lesley Hammer, Jacqueline A. Speir
Publication year - 2020
Publication title - Journal of Forensic Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.715
H-Index - 96
eISSN - 1556-4029
pISSN - 0022-1198
DOI - 10.1111/1556-4029.14551
Subject(s) - confidence interval, medicine, reliability (semiconductor), statistics, reproducibility, categorical variable, mathematics, power (physics), physics, quantum mechanics
Between February 2017 and August 2018, West Virginia University conducted a reliability study to determine expert performance among forensic footwear examiners in the United States. Over the course of the study, 70 examiners each performed 12 comparisons, reporting a total of 840 conclusions. To assess the accuracy of conclusions, the similarities and differences between mated and nonmated pairs were evaluated according to three criteria: (i) inherent agreement/disagreement in class, wear, and randomly acquired features, (ii) limitations as a function of questioned impression quality, clarity, and totality, and (iii) adherence to the Scientific Working Group for Shoeprint and Tire Tread Evidence (SWGTREAD) 2013 conclusion standard. Using these criteria, acceptable/expected categorical conclusions were defined. Preliminary results from this study are divided into a series of three summaries. This manuscript (Part II) reports accuracy and reproducibility. For mated pairs, accuracy equals 76.3% ± 13.0% (median of 78.6% and a 90% confidence interval between 72.2% and 80.0%). For nonmated pairs, accuracy equals 87.4% ± 9.24% (median of 91.4% and a 90% confidence interval between 84.7% and 89.8%). In addition, the community-assessed agreement (denoted by IQR) of reported results equals the research team's accepted/expected conclusions for 10 out of 12 comparisons. In terms of reproducibility, the 90% confidence interval for consensus was computed and found to equal 0.71–0.86 (median of 0.77) for the combined dataset. Although based on a limited sample size, these results provide a baseline estimate of accuracy and consensus/reproducibility as a function of the existing seven-point SWGTREAD 2013 conclusion standard.
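The abstract reports accuracy as a mean ± standard deviation, a median, and a 90% confidence interval, but it does not say how the interval was computed. As a rough illustration only, the sketch below shows one common way such summaries could be produced, assuming accuracy is tabulated per examiner and using a percentile bootstrap of the mean. The function name `summarize_accuracy`, its parameters, and the input values are hypothetical and are not taken from the study.

```python
import random
import statistics


def summarize_accuracy(per_examiner_accuracy, n_boot=2000, level=0.90, seed=0):
    """Summarize a list of per-examiner accuracy fractions.

    Returns the mean, sample standard deviation, median, and a
    percentile-bootstrap confidence interval for the mean.
    The bootstrap is an assumed method, not the one confirmed by the source.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(per_examiner_accuracy)

    # Resample examiners with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(per_examiner_accuracy, k=n))
        for _ in range(n_boot)
    )

    # Take the central `level` fraction of the sorted bootstrap means.
    alpha = (1.0 - level) / 2.0
    lower = boot_means[int(alpha * n_boot)]
    upper = boot_means[int((1.0 - alpha) * n_boot) - 1]

    return {
        "mean": statistics.fmean(per_examiner_accuracy),
        "sd": statistics.stdev(per_examiner_accuracy),
        "median": statistics.median(per_examiner_accuracy),
        "ci": (lower, upper),
    }


# Hypothetical placeholder values, NOT data from the study.
example = [0.50, 0.67, 0.67, 0.83, 0.83, 0.83, 1.00, 1.00]
summary = summarize_accuracy(example)
print(summary)
```

With the study's design of 70 examiners and 12 comparisons each, the input would hold 70 fractions per pair type. The percentile bootstrap is chosen here because it makes no normality assumption, which suits accuracy values bounded between 0 and 1.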