Open Access
Effective Methodologies for Statistical Inference on Microarray Studies
Author(s) - Makoto Aoshima, Kazuyoshi Yata
Publication year - 2011
Publication title - InTech eBooks
Language(s) - English
Resource type - Book series
DOI - 10.5772/25607
Subject(s) - inference, statistical inference, computer science, computational biology, artificial intelligence, mathematics, statistics, biology
A common feature of high-dimensional data such as genetic microarrays is that the data dimension is extremely high while the sample size is relatively small. Such data are called high-dimension, low-sample-size (HDLSS) data. HDLSS data pose substantial challenges that force a reconsideration of existing methods in multivariate statistical analysis. Unfortunately, most conventional methods are known to break down in HDLSS situations, and alternative methods are often highly sensitive to the curse of dimensionality. In this chapter, we present modern statistical methodologies that are very effective for drawing statistical inference from HDLSS data. We focus on a series of effective HDLSS methodologies developed by Aoshima and Yata (2011) and Yata and Aoshima (2009, 2010a,b, 2011a,b). We demonstrate how well those methodologies perform and how they bring new insight into research on prostate cancer. In Section 2, we first consider principal component analysis (PCA) for microarray data to visualize a data structure having tens of thousands of dimensions by projecting it onto a low-dimensional PC space. We note that classical PCA cannot sufficiently visualize the latent structure of microarray data because of the curse of dimensionality. We overcome this difficulty with the cross-data-matrix (CDM) methodology developed by Yata and Aoshima (2010a,b). Next, in Section 3, we consider effective clustering for microarray data. We apply the CDM methodology to estimating the principal component (PC) scores. We show that a clustering method based on the CDM-based first PC score effectively classifies individuals into two groups. We demonstrate accurate clustering using the prostate cancer data of Singh et al. (2002). Further, in Section 4, we consider effective classification for microarray data. We pay special attention to the quadratic-type classification methodology developed by Aoshima and Yata (2011).
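The contrast between classical PCA and the CDM idea from Sections 2 and 3 can be illustrated with a minimal Python sketch on simulated data (not the prostate cancer data). It assumes a split-half construction of the cross data matrix, S_D = (X_1 - mean_1)^T (X_2 - mean_2) / sqrt((n_1 - 1)(n_2 - 1)), whose largest singular value is taken as an estimate of the first eigenvalue. The function names, the simulated two-group design, and the use of the signs of the first singular vectors as a two-group clustering rule are illustrative assumptions, not the authors' exact estimators.

```python
import numpy as np

def naive_first_eigenvalue(X):
    """Largest eigenvalue of the sample covariance of X (p x n, columns = samples),
    computed through the n x n dual matrix, which has the same nonzero eigenvalues."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    return np.linalg.eigvalsh(Xc.T @ Xc / (n - 1))[-1]  # eigvalsh sorts ascending

def cdm_first_component(X):
    """Assumed split-half CDM sketch (not the authors' reference implementation).
    Returns the largest singular value of the cross data matrix and, for every
    sample, an entry of the corresponding first singular vector."""
    n = X.shape[1]
    n1 = (n + 1) // 2
    X1, X2 = X[:, :n1], X[:, n1:]
    n2 = X2.shape[1]
    # Each half is centered by its own mean, so the two halves stay independent
    X1c = X1 - X1.mean(axis=1, keepdims=True)
    X2c = X2 - X2.mean(axis=1, keepdims=True)
    SD = X1c.T @ X2c / np.sqrt((n1 - 1) * (n2 - 1))  # n1 x n2 cross data matrix
    U, s, Vt = np.linalg.svd(SD)
    # SD v = s u, so u (first half) and v (second half) share one sign convention
    scores = np.concatenate([U[:, 0], Vt[0, :]])
    return s[0], scores

# Hypothetical simulation: two groups at -mu and +mu with N(0, I) noise, p >> n
rng = np.random.default_rng(0)
p, n = 20_000, 40
labels = np.arange(n) % 2                  # alternate so each half holds both groups
mu = np.full(p, 0.1)                       # ||mu||^2 = 200
signs = np.where(labels == 1, 1.0, -1.0)
X = np.outer(mu, signs) + rng.standard_normal((p, n))

naive = naive_first_eigenvalue(X)
cdm, scores = cdm_first_component(X)
clusters = (scores > 0).astype(int)        # two clusters from the sign of the first score
accuracy = max(np.mean(clusters == labels), np.mean(clusters != labels))  # labels may flip
print(f"naive: {naive:.1f}  CDM: {cdm:.1f}  reference 1 + ||mu||^2: {1 + mu @ mu:.1f}  accuracy: {accuracy:.2f}")
```

With p much larger than n, the classical estimate absorbs roughly tr(noise covariance)/(n - 1) of pure noise into the first eigenvalue, whereas the noise in two independent halves tends to cancel in their cross product; the same first singular vectors then separate the two simulated groups by sign.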
We give a sample size determination for the classification so that the misclassification rates are bounded above by a prespecified value. We examine how well the classification methodology performs on several microarray data sets. Finally, in Section 5, we consider a variable selection procedure to select a set of significant variables from microarray data. In most gene expression studies, it is important to select genes relevant to classification so that researchers can identify the smallest possible set of genes that still achieves good predictive performance. We implement the two-stage …
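For the classification problem of Section 4, the sketch below shows why bias correction matters for distance-based rules when p is large and the training sample sizes differ. It uses a bias-corrected Euclidean distance rule, W = ||x - m1||^2 - tr(S1)/n1 - ||x - m2||^2 + tr(S2)/n2, assigning x to class 1 when W < 0, as a stand-in for a quadratic-type classifier. This rule is an assumption for illustration, not necessarily the exact classifier of Aoshima and Yata (2011), and the sketch does not implement their sample size determination. The names `fit` and `classify` and the simulation settings are hypothetical.

```python
import numpy as np

# Illustrative assumption: a bias-corrected distance rule, not necessarily
# the authors' exact quadratic-type classifier.

def fit(X1, X2):
    """Sample means and bias terms tr(S_i)/n_i from p x n_i training matrices."""
    n1, n2 = X1.shape[1], X2.shape[1]
    # tr(S_i) is the sum of per-variable sample variances; the p x p matrix is never formed
    b1 = X1.var(axis=1, ddof=1).sum() / n1
    b2 = X2.var(axis=1, ddof=1).sum() / n2
    return X1.mean(axis=1), X2.mean(axis=1), b1, b2

def classify(X, params, bias_correct=True):
    """Label each column of X as 1 or 2 using
    W = ||x - m1||^2 - b1 - ||x - m2||^2 + b2 (class 1 if W < 0)."""
    m1, m2, b1, b2 = params
    w = ((X - m1[:, None]) ** 2).sum(axis=0) - ((X - m2[:, None]) ** 2).sum(axis=0)
    if bias_correct:
        w += b2 - b1
    return np.where(w < 0, 1, 2)

# Hypothetical simulation: unequal training sizes, p >> n
rng = np.random.default_rng(1)
p, n1, n2, n_test = 10_000, 5, 20, 100
mu2 = np.full(p, 0.3)                      # squared mean difference = 900
X1 = rng.standard_normal((p, n1))
X2 = mu2[:, None] + rng.standard_normal((p, n2))
T1 = rng.standard_normal((p, n_test))
T2 = mu2[:, None] + rng.standard_normal((p, n_test))

params = fit(X1, X2)
for correct in (False, True):
    err1 = np.mean(classify(T1, params, correct) != 1)
    err2 = np.mean(classify(T2, params, correct) != 2)
    print(f"bias_correct={correct}: error class 1 = {err1:.2f}, class 2 = {err2:.2f}")
```

Without the correction, the mean of the smaller training sample is the noisier one: its expected squared distance to a new point is inflated by about tr(Σ1)/n1, so test points from class 1 tend to be assigned to class 2. Subtracting the estimated term tr(S_i)/n_i from each distance removes that bias in expectation.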
