The normalization of occurrence and Co‐occurrence matrices in bibliometrics using Cosine similarities and Ochiai coefficients
Author(s) -
Zhou Qiuju,
Leydesdorff Loet
Publication year - 2016
Publication title -
Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23603
Subject(s) - normalization (sociology) , cosine similarity , trigonometric functions , scaling , similarity (geometry) , mathematics , multidimensional scaling , matrix (chemical analysis) , combinatorics , computer science , algorithm , statistics , artificial intelligence , materials science , geometry , composite material , sociology , anthropology , image (mathematics) , cluster analysis
We prove that the Ochiai similarity of the co‐occurrence matrix is equal to the cosine similarity in the underlying occurrence matrix. Neither the cosine nor the Pearson correlation should be used for the normalization of co‐occurrence matrices because the similarity is then normalized twice, and therefore overestimated; the Ochiai coefficient can be used instead. Results are shown using a small matrix (5 cases, 4 variables) for didactic reasons, and also Ahlgren et al.'s (2003) co‐occurrence matrix of 24 authors in library and information sciences. The overestimation is shown numerically and illustrated using multidimensional scaling and cluster dendrograms. If the occurrence matrix is not available (such as in internet research or author cocitation analysis), using Ochiai for the normalization is preferable to using the cosine.
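The equality stated in the abstract can be checked on a toy example. The sketch below is an illustration only: the 5 x 4 binary occurrence matrix `A` is hypothetical (it is not the matrix used in the article), and the helper names `cosine`, `cooccurrence`, and `ochiai` are assumptions introduced here. It takes the co-occurrence matrix as C = AᵀA, so that for binary data the diagonal of C holds each variable's total number of occurrences, and computes the Ochiai coefficient as c_ij / sqrt(c_ii · c_jj). It then compares three values per pair of variables: the cosine between columns of A, the Ochiai coefficient from C, and the cosine between columns of C (the "normalized twice" case).

```python
from math import sqrt

# Hypothetical binary occurrence matrix: 5 cases (rows) x 4 variables (columns).
# Illustrative data only; this is not the matrix used in the article.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]


def columns(m):
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*m)]


def dot(x, y):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return dot(x, y) / sqrt(dot(x, x) * dot(y, y))


def cooccurrence(m):
    """Co-occurrence matrix C = A^T A (variables x variables)."""
    cols = columns(m)
    return [[dot(ci, cj) for cj in cols] for ci in cols]


def ochiai(c, i, j):
    """Ochiai coefficient from co-occurrence counts: c_ij / sqrt(c_ii * c_jj)."""
    return c[i][j] / sqrt(c[i][i] * c[j][j])


cols_A = columns(A)
C = cooccurrence(A)
cols_C = columns(C)
n = len(C)

for i in range(n):
    for j in range(i + 1, n):
        cos_occ = cosine(cols_A[i], cols_A[j])    # cosine on the occurrence matrix
        och = ochiai(C, i, j)                     # Ochiai on the co-occurrence matrix
        cos_cooc = cosine(cols_C[i], cols_C[j])   # cosine applied to C: normalized twice
        print(f"({i + 1},{j + 1}) cosine(A)={cos_occ:.3f} "
              f"Ochiai(C)={och:.3f} cosine(C)={cos_cooc:.3f}")
```

In this toy matrix the first two columns of the printout agree for every pair, while the cosine computed on C is larger for every pair. That is consistent with the overestimation described in the abstract, although a single example is of course not a proof.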