How many events is enough? Are you positive?
Author(s) -
Roederer Mario
Publication year - 2008
Publication title -
Cytometry Part A
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.316
H-Index - 90
eISSN - 1552-4930
pISSN - 1552-4922
DOI - 10.1002/cyto.a.20549
Subject(s) - computer science , data science , computational biology , biology
One of the more perplexing problems in flow cytometry is the question “Is it real?” If a sample has one event in a particular gate, is that event real? Is it significant? Should it be believed? Is the sample “positive”? If the answer to these is “No”, then what is the threshold number of events above which the answer becomes “Yes”?
Flow cytometry is unique among biological technologies in providing an enormous number of measurements on which to base conclusions. Not only can more than a dozen measurements be made on each cell, but millions of cells can be analyzed in a single sample (tube), and dozens of samples may be analyzed from a single biological specimen. Hence, there is the potential for enormous precision on measurements; distributions arising from these measurements can have exceedingly small standard errors of the mean.
The precision of a subset frequency is easily defined. For relatively rare populations, the standard deviation of the event count is simply √n, where n is the number of events comprising the subset. For a gate with a single event, the relative precision on its frequency is ±100%; for a gate with 1,000 events, it is ±3%. However, assay variation (biology, experimental, operator, etc.) is typically greater than 30%. Thus, once the number of events in a gate exceeds 10 (where √n/n is about 32%), the precision of the frequency measurement is dominated by assay errors, not the paucity of events analyzed.
The overwhelming amount of data available from flow cytometry has led to some confusion about the statistical significance (or lack thereof) of subset frequency measurements. Indeed, it might be tempting to conclude that when one collects a million events, finding a single event in a gate is not “meaningful”. This question is not solely the domain of extremely rare event detection; it has become common as we characterize small subsets with additional measurements.
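The counting precision quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's method: the function names are hypothetical, the counts are assumed to follow Poisson statistics (standard deviation √n), and combining counting and assay error in quadrature is an added assumption that holds only if the two error sources are independent.

```python
import math

# Assay variation quoted in the text ("typically greater than 30%"),
# used here as a lower bound for illustration.
ASSAY_CV = 0.30


def counting_precision(n_events: int) -> float:
    """Relative precision of a count, assuming Poisson statistics: sqrt(n) / n."""
    if n_events < 1:
        raise ValueError("n_events must be at least 1")
    return math.sqrt(n_events) / n_events


def combined_precision(n_events: int, assay_cv: float = ASSAY_CV) -> float:
    """Counting and assay errors added in quadrature (assumes independence)."""
    return math.hypot(counting_precision(n_events), assay_cv)


for n in (1, 10, 100, 1000):
    print(f"{n:>5} events: counting ±{counting_precision(n):.0%}, "
          f"combined ±{combined_precision(n):.0%}")
```

Under these assumptions the counting term falls below the 30% assay term somewhere just above 10 events, which is why collecting more events beyond that point improves the overall precision very little.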
For example, when we assess the quality of a T cell response, we often measure five different functions simultaneously (1). The quality of the response is defined by the pattern of co-expression of the five functions, a pattern with 32 possible combinations (2). Vaccine-induced T cell responses are often as low as 0.1%. With a sample of one million stimulated peripheral blood mononuclear cells, after applying gates to define singlets, viable lymphocytes, and T cells, the total number of cells in the cytokine gate can be only a few hundred. Dividing that into 32 fractions means that many of these functionally defined subfractions will have very few events (2). Are they “real”?
The question of whether events are “real” or not is fundamentally inappropriate. Of course they are “real”. The appropriate question is: do the events represent what the researcher claims they are (in this case, a set of antigen-specific cells with a given functional response)? To answer that, we must first determine the alternative explanations for any given event: 1) it is “noise” of some sort, e.g., a dead cell or cell fragment with unusual fluorescent properties that put it in the gates; 2) it is “experimental background”, e.g., a real cell with the appropriate fluorescent markers, but not a cell the assay is meant to quantify (in this example, a T cell that is not specific for the tested antigen, perhaps having been preactivated in vivo); 3) it is an antigen-specific cell with the appropriate properties. Only in the last case do we want to report the event in our results; unfortunately, for any given event there is no way to distinguish between these possibilities.
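The arithmetic behind the 32 functional combinations mentioned above can be sketched as follows. This is only an illustration: the function labels F1 to F5 are placeholders (the text does not name the five functions), and the figure of 300 events in the cytokine gate is a hypothetical stand-in for "a few hundred".

```python
from itertools import product

# Placeholder labels; the five measured functions are not named in the text.
FUNCTIONS = ("F1", "F2", "F3", "F4", "F5")


def expression_patterns(functions):
    """Every positive/negative co-expression pattern: 2 ** len(functions) in total."""
    return [dict(zip(functions, flags))
            for flags in product((False, True), repeat=len(functions))]


patterns = expression_patterns(FUNCTIONS)

# Hypothetical count standing in for "a few hundred" events in the cytokine gate.
cytokine_events = 300
mean_events_per_pattern = cytokine_events / len(patterns)

print(f"{len(patterns)} patterns, about {mean_events_per_pattern:.1f} events each on average")
```

An even split is the best case. Real responses are unevenly distributed across patterns, so many subfractions would hold far fewer events than this average.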
This has led to some discomfort with very low event numbers, and to the temptation to use an arbitrary minimum number of events below which a frequency measurement is deemed zero: for example, requiring at least 10 events in a gate to define the sample as positive (irrespective of the frequency). Conversely, it is tempting to conclude that, upon seeing a cloud of a thousand events that are closely distributed in the desired gate, a sample is clearly positive. Both of these temptations must be avoided, as neither is based on sound principles.
Consider a study in which the T cell response to a vaccine is measured with a typical intracellular cytokine staining (ICS) assay. In assessing two vaccinees, the gating strategy reveals in one subject a single event (out of one million collected) in the cytokine-positive gate. In the second vaccinee, there are one thousand positive events. Is either sample “positive”? This question cannot be answered. The reason is that “positive” has a contextual meaning that is far deeper than