An Adaptive Strategy for Single‐ and Multi‐Cluster Gene Assignment
Author(s) - Garg Sanjeev, Hansen Marc F., Rowe David W., Achenie Luke E. K.
Publication year - 2008
Publication title - Biotechnology Progress
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.572
H-Index - 129
eISSN - 1520-6033
pISSN - 8756-7938
DOI - 10.1021/bp025648p
Subject(s) - cluster analysis , centroid , mathematics , a priori and a posteriori , residual , analysis of variance , euclidean distance , expression (computer science) , set (abstract data type) , pearson product moment correlation coefficient , statistics , dimensionality reduction , cluster (spacecraft) , computer science , algorithm , artificial intelligence , philosophy , epistemology , programming language
Abstract - Current clustering algorithms have several inadequacies: strict assignment of genes to one class, dimensionality reduction, a priori specification of the number of classes, the need for a training set, nonunique solutions, and complex learning mechanisms. Existing algorithms cluster genes on the basis of high positive correlations between their expression patterns. However, genes with strong negative correlations can also have similar functions and are most likely to have a role in the same pathways. To address some of these issues, we propose the adaptive centroid algorithm (ACA), which employs an analysis of variance (ANOVA)‐based performance criterion. The ACA also uses Euclidean distances, the center‐of‐mass principle for heterogeneously distributed mass elements, and the given data set to give unique solutions. The proposed approach involves three stages. In the first stage, a two‐way ANOVA of the gene expression matrix is performed; the two factors in the ANOVA are gene expression and experimental condition. In the second stage, the residual mean squared error (MSE) from the ANOVA is used as a performance criterion in the ACA. Finally, correlated clusters are found based on the Pearson correlation coefficients. To validate the proposed approach, a two‐way ANOVA is again performed on the discovered clusters. The results from this last step indicate that the MSEs of the clusters are significantly lower than that of the fibroblast‐serum gene expression matrix. The ACA is employed in this study for single‐ as well as multi‐cluster gene assignments.
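The two quantities the abstract relies on, the residual MSE of a two‐way ANOVA and the Pearson correlation coefficient, can be illustrated with a minimal sketch. The abstract gives no formulas, so this assumes the textbook two‐way ANOVA without replication, with rows as genes and columns as experimental conditions. It is not the ACA itself; the function names `residual_mse` and `pearson` and the toy expression profiles are hypothetical.

```python
from math import sqrt


def residual_mse(matrix):
    """Residual MSE of a two-way ANOVA without replication.

    Assumed layout: each row is a gene, each column an experimental
    condition. Requires at least two rows and two columns.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    # Residual = observation minus row effect minus column effect plus grand mean.
    ss_res = sum(
        (matrix[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return ss_res / ((n_rows - 1) * (n_cols - 1))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy profiles (made up for illustration, not data from the study).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 3.0, 4.0, 5.0]   # same shape as gene_a, shifted upward
gene_c = [4.0, 3.0, 2.0, 1.0]   # mirror image of gene_a

mse_similar = residual_mse([gene_a, gene_b])
mse_opposed = residual_mse([gene_a, gene_c])
r_positive = pearson(gene_a, gene_b)
r_negative = pearson(gene_a, gene_c)
```

In this toy case the two parallel profiles leave no residual variance, while the mirrored pair has a large residual MSE despite a strong negative correlation. That contrast is one way to read why the abstract treats the correlation step separately from the MSE criterion.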