A Perspective on Cluster Analysis
Author(s) - Jon R. Kettenring
Publication year - 2008
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10001
Subject(s) - library science, perspective (graphical), citation, computer science, sociology, information retrieval, artificial intelligence
In 2004, I began a comprehensive investigation of how cluster analysis is applied across the sciences. It involved an extensive online search of the Web of Science databases [1], concentrating on the years from 1995 to 2003. The results were reported in Ref. [2]. More recently, I have been studying how cluster analysis arises in US patents by doing systematic searches of the US patent database [3] for 2006 and early 2007. I used patents as an imperfect proxy for sizing up the degree of interest in cluster analysis in the commercial world. Altogether I reviewed in some detail several hundred papers and patents and many more in cursory fashion. If nothing else, these exercises confirmed my belief that cluster analysis is among the most needed and widely used of the multivariate statistical methodologies and also perhaps the one with the most malpractice. I was surprised to learn not only the extent of the applications of clustering techniques but also their diversity, ranging from archeology to zoology and including most of the sciences in between. Overall, it is the life sciences that dominate. They appear to be largely responsible for the dramatic increase in published applications, which grew at roughly a 9% compound growth rate over the nine years of my study. I also estimated that the number of patents that at least mention cluster analysis grew nearly fivefold during this same period. Another striking feature of the papers and patents is the tremendous variety in the level of sophistication involved. Many of the most recent patents use very clever ways of analyzing text, images, or multimedia documents to help categorize them and/or to assist retrieval. For example, Patent 7,139,695 describes a methodology for improving the ‘semantic affinity’ of clusters of documents by grouping them sequentially, first using nouns, then verbs, then adjectives, and so on.
If successful, it may improve performance over well-established and widely tested methods such as latent semantic indexing [4]. Moreover, one can
