Bayesian modelling of tuberculosis clustering from DNA fingerprint data
Author(s) -
Scott Allison N.,
Joseph Lawrence,
Bélisle Patrick,
Behr Marcel A.,
Schwartzman Kevin
Publication year - 2007
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.2899
Subject(s) - genotyping , bayesian probability , cluster analysis , categorical variable , bayes' theorem , computer science , statistics , data mining , computational biology , genetics , biology , artificial intelligence , genotype , mathematics , machine learning , gene
Abstract - A combination of continuous and categorical tests, none of which is a gold standard, is often available for classification of subject status in epidemiologic studies. For example, tuberculosis (TB) molecular epidemiology uses select mycobacterial DNA sequences to provide clues about which cases of active TB are likely clustered, implying recent transmission between these cases, versus reactivation of previously acquired infection. The proportion of recently transmitted cases is important to public health, as different control methods are implemented as transmission rates increase. Standard typing methods include IS6110 restriction fragment length polymorphism (IS6110 RFLP), but recently developed polymerase chain reaction based genotyping modalities, including mycobacterial interspersed repetitive unit‐variable‐number tandem repeat and spoligotyping, provide quicker results. In addition, it has recently been suggested that results from IS6110 RFLP can be used to create a continuous measure of genetic relatedness, called the nearest genetic distance. Whichever method is used, estimation of cluster rates is rendered difficult by the lack of a gold standard method for classifying cases as clustered or not. Since many of these methods are relatively new, their properties have not been extensively investigated. Misclassification errors subsequently lead to sub‐optimal estimation of risk factors for clustering. Here we show how Bayesian latent class models can be used in such situations, for example to simultaneously analyse Mycobacterium tuberculosis DNA data from all three of the above methods. Using the data collected at the Public Health Unit in Montreal, we estimate the proportion of clustered cases and the operating characteristics of each method using information from all three methods combined, including both continuous and dichotomous measures from IS6110 RFLP. A misclassification‐adjusted regression model provides estimates of the effects of risk factors on the clustering probabilities. We also discuss how one must carefully interpret any inferences that arise from a combination of continuous and dichotomous tests. Copyright © 2007 John Wiley & Sons, Ltd.
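To illustrate the latent class idea in the abstract, the following minimal Python sketch computes the posterior probability that a single case is truly clustered, given dichotomous results from three imperfect typing methods. It is not the authors' model: it assumes the tests are conditionally independent given the true status, treats the proportion clustered and each method's sensitivity and specificity as known, and ignores the continuous nearest genetic distance. A full Bayesian analysis would place priors on these quantities and estimate them from the data. The function name and all numeric values are hypothetical.

```python
from math import prod


def posterior_clustered(results, prevalence, sens, spec):
    """Posterior probability that a case is truly clustered.

    results    -- one boolean per typing method (True = method calls it clustered)
    prevalence -- assumed proportion of truly clustered cases
    sens, spec -- assumed sensitivity and specificity of each method
    Assumes the methods are conditionally independent given the true status.
    """
    # Likelihood of the observed results if the case is truly clustered.
    like_clustered = prod(s if r else 1 - s for r, s in zip(results, sens))
    # Likelihood of the observed results if the case is truly not clustered.
    like_unclustered = prod(1 - c if r else c for r, c in zip(results, spec))
    numerator = prevalence * like_clustered
    return numerator / (numerator + (1 - prevalence) * like_unclustered)


# Hypothetical operating characteristics for three methods (not from the paper).
sens = [0.90, 0.90, 0.90]
spec = [0.90, 0.90, 0.90]
p_all_positive = posterior_clustered([True, True, True], 0.5, sens, spec)
p_all_negative = posterior_clustered([False, False, False], 0.5, sens, spec)
print(round(p_all_positive, 4), round(p_all_negative, 4))
```

With these illustrative inputs, agreement among all three methods pushes the posterior close to 1 or 0, while discordant results leave it nearer the assumed prevalence. This is why combining methods can help when no single one is a gold standard.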
