
Comparison of Profile Similarity Measures for Genetic Interaction Networks
Author(s) -
Raamesh Deshpande,
Benjamin VanderSluis,
Chad L. Myers
Publication year - 2013
Publication title -
PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0068664
Subject(s) - jaccard index, similarity (geometry), dot product, cosine similarity, pearson product moment correlation coefficient, correlation, computer science, context (archaeology), normalization (sociology), data mining, set (abstract data type), genetic similarity, artificial intelligence, pattern recognition (psychology), mathematics, statistics, biology, population, genetic diversity, paleontology, geometry, demography, sociology, anthropology, image (mathematics), programming language
Analysis of genetic interaction networks often involves identifying genes with similar profiles, which is typically indicative of a common function. While several profile similarity measures have been applied in this context, they have never been systematically benchmarked. We compared a diverse set of correlation measures, including measures commonly used by the genetic interaction community as well as several other candidate measures, by assessing their utility in extracting functional information from genetic interaction data. We find that the dot product, one of the simplest vector operations, outperforms most other measures over a large range of gene pairs. More generally, linear similarity measures such as the dot product, Pearson correlation, or cosine similarity perform better than set overlap measures such as the Jaccard coefficient. Similarity measures that involve L2-normalization of the profiles tend to perform better for the top-most similar pairs but perform less favorably when a larger set of gene pairs is considered or when the genetic interaction data are thresholded. Such measures are also less robust to the presence of noise and batch effects in the genetic interaction data. Overall, the dot product performs consistently among the best measures under a variety of different conditions and genetic interaction datasets.
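
To make the four measures named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the toy profiles `gene_a` and `gene_b`, and the Jaccard cutoff `threshold=0.1` are all assumptions chosen for illustration; the abstract does not specify how profiles are binarized for set overlap measures.

```python
import math

def dot(x, y):
    """Dot product of two interaction profiles (no normalization)."""
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Dot product after L2-normalizing both profiles."""
    norm = math.sqrt(dot(x, x) * dot(y, y))
    return dot(x, y) / norm if norm else 0.0

def pearson(x, y):
    """Cosine similarity of the mean-centered profiles."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

def jaccard(x, y, threshold=0.1):
    """Set overlap of partners whose |score| exceeds a cutoff.

    The cutoff is an illustrative assumption, not a value from the paper.
    """
    sx = {i for i, a in enumerate(x) if abs(a) > threshold}
    sy = {i for i, b in enumerate(y) if abs(b) > threshold}
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0

# Hypothetical profiles: gene_b has the same shape as gene_a
# but one tenth of the magnitude.
gene_a = [-0.5, 0.0, 0.2, -0.3]
gene_b = [-0.05, 0.0, 0.02, -0.03]

for name, measure in [("dot", dot), ("cosine", cosine),
                      ("pearson", pearson), ("jaccard", jaccard)]:
    print(name, measure(gene_a, gene_b))
```

In this toy case the normalized measures (cosine, Pearson) score the pair as essentially identical because they discard profile magnitude, the dot product stays small because it retains magnitude, and the thresholded Jaccard coefficient drops to zero because none of the weak scores in `gene_b` pass the cutoff. This only illustrates how the measures differ mechanically; it says nothing about which performs better on real data.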