
Quantifying literature citations, index terms, and Gene Ontology annotations in the Saccharomyces Genome Database to assess results‐set clustering utility
Author(s) - MacMullen W. John
Publication year - 2006
Publication title - Proceedings of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1550-8390
pISSN - 0044-7870
DOI - 10.1002/meet.14504301191
Subject(s) - set (abstract data type), cluster analysis, computer science, information retrieval, index (typography), ontology, search engine indexing, term (time), rank (graph theory), genome, database, data mining, gene, world wide web, biology, artificial intelligence, mathematics, genetics, philosophy, physics, epistemology, quantum mechanics, combinatorics, programming language
A set of 37,325 unique literature citations was identified from 120,078 literature‐based annotations in the Saccharomyces Genome Database (SGD). The citations, gene products, and related Gene Ontology (GO) annotations were analyzed to quantify unique articles, journals, and genes, and to rank them by publication year, language, and GO term frequency. GO terms, MeSH indexing terms, MeSH Journal Descriptors, and SGD Literature Topics were quantified and analyzed to assess their potential utility for results‐set clustering. Results: Bradford's Law of Scattering was shown to hold for the citations, journals, gene products, and GO annotations. Only the MeSH terms and article title/abstract pairs had significant numbers of term co‐occurrences. Multiple term types may be useful for faceted searching and clustered results‐set browsing if the strengths of each are leveraged.
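As an illustration of the Bradford's Law check mentioned in the abstract, the following minimal Python sketch ranks journals by citation count and splits them into zones that each hold roughly the same share of citations. Bradford's Law predicts that the number of journals per zone grows approximately as 1 : n : n². The function name `bradford_zones`, its parameters, and the toy data are hypothetical and are not taken from the study, which does not describe its implementation here.

```python
from collections import Counter


def bradford_zones(journal_per_citation, zones=3):
    """Group journals into zones holding roughly equal shares of citations.

    journal_per_citation: iterable with one journal name per citation
    (hypothetical input format, assumed for this sketch).
    Returns a list of `zones` lists of journal names, most productive first.
    """
    counts = Counter(journal_per_citation)
    total = sum(counts.values())
    result = [[] for _ in range(zones)]
    cumulative = 0
    # most_common() yields journals in descending order of citation count
    for journal, n in counts.most_common():
        # Assign the journal to a zone based on the share of citations
        # already covered by more productive journals.
        zone = min(cumulative * zones // total, zones - 1)
        result[zone].append(journal)
        cumulative += n
    return result


# Toy data (not from the SGD analysis): 12 citations spread over 7 journals.
toy = ["A"] * 4 + ["B"] * 2 + ["C"] * 2 + ["D", "E", "F", "G"]
zone_lists = bradford_zones(toy)
zone_sizes = [len(z) for z in zone_lists]
print(zone_sizes)
```

On this toy input each zone covers four citations, and the journal counts per zone follow the 1 : n : n² pattern with n = 2. Real citation data will only approximate such a ratio.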