Genome Scanning Tests for Comparing Amino Acid Sequences Between Groups
Author(s) - Gilbert Peter B., Wu Chunyuan, Jobes David V.
Publication year - 2008
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2007.00845.x
Subject(s) - mahalanobis distance, nonparametric statistics, permutation (music), statistics, mathematics, divergence (linguistics), sequence (biology), combinatorics, biology, genetics, linguistics, physics, philosophy, acoustics
Summary - Consider a placebo‐controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify positions at which the amino acids in infected vaccine recipient sequences either (A) are more divergent from the reference amino acid than the amino acids in infected placebo recipient sequences or (B) have a different frequency distribution than the placebo sequences, irrespective of a reference amino acid. We consider t‐test‐type statistics for problem A and Euclidean, Mahalanobis, and Kullback–Leibler‐type statistics for problem B. The test statistics incorporate weights to reflect biological information contained in different amino acid positions and mismatches. Position‐specific p‐values are obtained by approximating the null distribution of the statistics either by a permutation procedure or by a nonparametric estimation. A permutation method is used to estimate a cut‐off p‐value to control the per comparison error rate at a prespecified level. The methods are examined in simulations and are applied to two HIV examples. The methods for problem B address the general problem of comparing discrete frequency distributions between groups in a high‐dimensional data setting.
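As a rough illustration of problem B at a single alignment position, the sketch below computes an unweighted squared Euclidean distance between the amino acid frequency distributions of the two groups and approximates its null distribution by permuting group labels. It is not the authors' statistic: the weights described in the summary are omitted, and the function names, the 20-letter alphabet, the number of permutations, and the add-one p-value correction are all assumptions made for this example.

```python
import random
from collections import Counter

# Assumption: the standard 20 amino acids, gaps and ambiguity codes ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(residues):
    """Relative frequency of each amino acid among the given residues."""
    counts = Counter(residues)
    n = len(residues)
    return [counts[a] / n for a in AMINO_ACIDS]


def euclidean_stat(vaccine, placebo):
    """Unweighted squared Euclidean distance between the two frequency vectors."""
    fv = frequencies(vaccine)
    fp = frequencies(placebo)
    return sum((x - y) ** 2 for x, y in zip(fv, fp))


def permutation_p_value(vaccine, placebo, n_perm=2000, seed=0):
    """Permutation p-value for one position (hypothetical helper).

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose statistic is at least as large as the observed one,
    with an add-one correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = euclidean_stat(vaccine, placebo)
    pooled = list(vaccine) + list(placebo)
    n_v = len(vaccine)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if euclidean_stat(pooled[:n_v], pooled[n_v:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy residues at one position, invented for illustration only.
    vaccine_residues = "KKKKKRRRKKKKRKKK"
    placebo_residues = "AAAAKAAAKAAAAAKA"
    print(permutation_p_value(vaccine_residues, placebo_residues))
```

Scanning a genome would repeat this calculation at every aligned position; choosing the cut-off p-value across positions, which the summary handles with a further permutation method, is not covered by this sketch.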