Identification and correction of spurious spatial correlations in microarray data
Author(s) -
Jiang Qian,
Yuval Kluger,
Haiyuan Yu,
Mark Gerstein
Publication year - 2003
Publication title -
BioTechniques
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.617
H-Index - 131
eISSN - 1940-9818
pISSN - 0736-6205
DOI - 10.2144/03351bm03
Subject(s) - biology, haven, computational biology, genetics, mathematics, combinatorics
Microarray experiments are providing a huge amount of genome-wide data on gene expression. Many prior expression analyses have focused on inferring functional relationships (1–7). However, the quality control and normalization of the raw data produced by microarrays have received less attention. Here we address a systematic error that arises from microarrays and discuss current methods to resolve the problem.

It is well known that data from high-throughput experiments contain a significant component of measurement error, which must be removed before any analysis can be applied. An intuitive idea is to repeat the experiments and reduce the noise by averaging the measurements from replicates (8). Unfortunately, microarray experiments are still difficult to repeat, and in most cases researchers do not have many replicates for analysis. A Bayesian probabilistic approach has been proposed to address the small number of replicates in microarray experiments (9). While random error can be canceled by replicate experiments, systematic error does not diminish when replicates are averaged. For example, a notorious systematic error in microarray experiments is that the expression ratio of a particular gene under different conditions is a function of its absolute expression level. If one uses a simple fold-change cutoff, genes with low expression levels tend to meet the cutoff numerically even though they are not truly differentially expressed. Different methods have been proposed to deal with this problem (10–15).

In this review, we want to direct attention toward a type of systematic error that is manifested by the strong interaction between neighboring spots on the array.
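The neighbor interaction described here can be illustrated with a small simulation. The following is a minimal pure-Python sketch, not the authors' analysis. It assumes a hypothetical square chip with one gene per spot and log ratios that contain no true co-expression. It also assumes a smooth per-array spatial bias, modeled as randomly placed Gaussian bumps. The bump model, the Euclidean spot-grid distance, and every parameter value (`width`, `bump_sd`, `noise_sd`, grid size, number of arrays) are illustrative assumptions. The sketch computes the average correlation coefficient of gene pairs as a function of chip distance, which is the same kind of summary the case study reports in Figure 2.

```python
import math
import random


def standardize(values):
    """Center a profile and scale it to unit length, so that the dot product
    of two standardized profiles equals their Pearson correlation."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def simulate_arrays(rows, cols, n_arrays, n_bumps=3, width=3.0,
                    bump_sd=3.0, noise_sd=1.0, seed=0):
    """Hypothetical model (assumption, not real data): gene i sits at spot (r, c).
    On every array its log ratio is independent Gaussian noise (no true
    co-expression) plus a smooth spatial bias built from Gaussian bumps whose
    centers and signed amplitudes are redrawn for each array."""
    rng = random.Random(seed)
    spots = [(r, c) for r in range(rows) for c in range(cols)]
    profiles = [[] for _ in spots]
    for _ in range(n_arrays):
        bumps = [(rng.uniform(0, rows - 1), rng.uniform(0, cols - 1),
                  rng.gauss(0.0, bump_sd)) for _ in range(n_bumps)]
        for i, (r, c) in enumerate(spots):
            # Spatial bias shared by nearby spots on this array.
            bias = sum(
                amp * math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * width ** 2))
                for br, bc, amp in bumps
            )
            profiles[i].append(bias + rng.gauss(0.0, noise_sd))
    return spots, profiles


def correlation_by_distance(spots, profiles):
    """Average pairwise correlation, binned by the integer part of the
    Euclidean chip distance (in spot units) between the two genes.
    Returns {distance_bin: (mean correlation, number of pairs)}."""
    z = [standardize(p) for p in profiles]
    sums, counts = {}, {}
    for i in range(len(spots)):
        for j in range(i + 1, len(spots)):
            key = int(math.dist(spots[i], spots[j]))
            r = sum(x * y for x, y in zip(z[i], z[j]))
            sums[key] = sums.get(key, 0.0) + r
            counts[key] = counts.get(key, 0) + 1
    return {k: (sums[k] / counts[k], counts[k]) for k in sorted(sums)}


if __name__ == "__main__":
    spots, profiles = simulate_arrays(rows=15, cols=15, n_arrays=20)
    for k, (mean_r, n_pairs) in correlation_by_distance(spots, profiles).items():
        print(f"distance {k:2d}-{k + 1:2d}: mean r = {mean_r:+.2f} ({n_pairs} pairs)")
```

The simulated profiles contain no biological co-expression. Any rise in the average correlation at short distances therefore comes entirely from the shared spatial bias. Without that bias, the binned averages would stay near zero at every distance.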
If the replicate experiments are performed on arrays with the same chip geometry, these interactions will not be canceled by the replicates. We will first demonstrate this noise with a case study and then discuss the possible source of these artifacts. Finally, we will discuss current methods to solve the problem, in particular a local averaging approach called standardization and normalization of microarray data (SNOMAD) (16). We examined several different yeast microarray data sets: diauxic shift, α-factor-arrested cell cycle, cdc15-arrested cell cycle, and cdc28-arrested cell cycle (17–19).

To demonstrate the artifact in the microarray data, we offer the following evidence. The relationship between gene expression and physical chip distance can be revealed by comparing the chip distance map (Figure 1A) with an expression correlation coefficient map (Figure 1B). The horizontal and vertical axes of both maps represent the positions of the genes along a chromosome. The colors on the distance and correlation maps represent, respectively, the chip distance and the expression correlation coefficient between gene pairs. Interestingly, the highly correlated gene expression regions (Figure 1B, red blocks) always correspond to the short chip distance regions (Figure 1A, red blocks). This suggests that the major reason two genes are detected as co-expressed is that they are located near each other on the chip.

We also calculated the average correlation coefficient of gene expression profiles as a function of the physical chip distance between two genes. Figure 2 shows the result for a microarray data set of the yeast α-factor-arrested cell cycle. Without an artifact, the average correlation coefficient should be independent of chip distance. However, Figure 2 shows that the closer two genes are on the chip, the higher their average correlation coefficient. This indicates that this data set contains a large proportion of artifacts. Actually,