Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity
Author(s) - Shanfeng Zhu, Jia Zeng, Hiroshi Mamitsuka
Publication year - 2009
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btp338
Subject(s) - information retrieval, computer science, semantic similarity, cluster analysis, similarity (geometry), thesaurus, vector space model, document clustering, search engine indexing, medline, data mining, natural language processing, artificial intelligence
Clustering of MEDLINE documents is usually performed with the vector space model, which computes the content similarity between two documents essentially as the inner product of their word vectors. Recently, the semantic information in the MeSH (Medical Subject Headings) thesaurus has been applied to clustering MEDLINE documents by mapping the documents into MeSH concept vectors, which are then clustered. However, current approaches that use the MeSH thesaurus have two serious limitations: first, important semantic information may be lost when the MeSH concept vectors are generated; and second, the content information of the original text is discarded.
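The vector space model's content similarity can be illustrated with a short sketch. It assumes raw term-frequency word vectors built from lowercased, whitespace-separated tokens, and it adds the common cosine normalization of the inner product. These are illustrative choices, not the weighting or preprocessing used in the paper, which the abstract does not specify. Practical MEDLINE pipelines typically add tf-idf weighting, stop-word removal and stemming. The helper names `word_vector`, `inner_product` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def word_vector(text: str) -> Counter:
    """Hypothetical helper: map a document to a raw term-frequency vector.

    Assumption: lowercased whitespace tokenization, no stop words or stemming.
    """
    return Counter(text.lower().split())


def inner_product(u: Counter, v: Counter) -> int:
    """Inner product of two sparse word vectors (sum over shared terms)."""
    return sum(count * v[term] for term, count in u.items() if term in v)


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Inner product divided by both vector lengths; 0.0 if either is empty."""
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return inner_product(u, v) / (norm_u * norm_v)


if __name__ == "__main__":
    d1 = word_vector("gene expression in yeast")
    d2 = word_vector("yeast gene regulation")
    print(inner_product(d1, d2))                 # 2 (shared terms: gene, yeast)
    print(round(cosine_similarity(d1, d2), 3))   # 0.577
```

Note that two abstracts describing the same concept with different words (for example, a synonym that MeSH would map to the same heading) get an inner product of zero here. This is the gap that MeSH-based semantic similarity is meant to address.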