Subset Clustering of Binary Sequences, with an Application to Genomic Abnormality Data
Author(s) -
Hoff, Peter D.
Publication year - 2005
Publication title -
Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2005.00381.x
Subject(s) - cluster analysis, multivariate statistics, computer science, data mining, population, cluster (spacecraft), binary number, mathematics, pattern recognition (psychology), artificial intelligence, machine learning, demography, arithmetic, sociology, programming language
Summary - This article develops a model-based approach to clustering multivariate binary data, in which the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The clustering approach is based on a multivariate Dirichlet process mixture model, which allows for the estimation of the number of clusters, the cluster memberships, and the cluster-specific parameters in a unified way. Such a clustering approach has applications in the analysis of genomic abnormality data, in which the development of different types of tumors may depend on the presence of certain abnormalities at subsets of locations along the genome. Additionally, such a mixture model provides a nonparametric estimation scheme for dependent sequences of binary data.
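
The summary notes that a Dirichlet process mixture lets the number of clusters be estimated rather than fixed in advance. One way to see why is the Chinese restaurant process, the distribution over partitions that a Dirichlet process induces. The sketch below is a minimal illustration of that prior only, not the article's model or sampler; the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here.

```python
import random


def sample_crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Illustrative sketch only: `alpha` (> 0) is a hypothetical concentration
    parameter; larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to item i
    counts = []   # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = sample_crp_partition(n=20, alpha=1.0, seed=1)
    print("labels:", labels)
    print("number of clusters:", len(set(labels)))
```

In a full mixture model, each cluster drawn this way would also carry its own parameters (for binary data, for example, attribute-wise success probabilities), and the labels would be inferred from the data rather than sampled from the prior alone.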
