A quartet method based on variable neighborhood search for biomedical literature extraction and clustering
Author(s) - Consoli Sergio, Stilianakis Nikolaos I.
Publication year - 2017
Publication title - International Transactions in Operational Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.032
H-Index - 52
eISSN - 1475-3995
pISSN - 0969-6016
DOI - 10.1111/itor.12240
Subject(s) - computer science , information retrieval , cluster analysis , ranking (information retrieval) , hierarchical clustering , data extraction , graph , data mining , xml , medline , theoretical computer science , artificial intelligence , world wide web , political science , law
Medline/PubMed is the largest reference database collecting, organizing, and analyzing biomedical literature. We propose an automated methodology that searches the Medline/PubMed database for references relevant to systematic reviews and meta-analyses, and then visualizes the retrieved bibliography through an intuitive graph layout. In particular, document relationships are represented via the quartet method of hierarchical clustering. Because this novel approach is based on an NP-hard combinatorial problem, a reduced variable neighborhood search is used to produce the graph of document clusters from the input distance matrix, with the number of clusters not known in advance. The distance matrix is derived from the link-ranking XML data returned by PubMed with the search results. We demonstrate how the method makes it possible to retrieve biomedical bibliography, to uncover the structure of the literature collection examined, and to detect linked works within thematic areas of interest. With this methodology, scientists are assisted in the analysis of complex citation networks from the biomedical literature.
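To make the idea in the abstract concrete, the following is a minimal Python sketch of a quartet-cost objective minimized by a reduced variable neighborhood search. It is not the authors' implementation. It assumes the usual quartet formulation, in which a tree where every internal node has degree 3 induces one pairing ab|cd for each set of four documents and pays d(a,b) + d(c,d) for it. As a loud simplification, the tree shape is generated once and kept fixed, and the only neighborhood move is a swap of two leaf labels. All function names and parameters (random_ternary_tree, leaf_path_lengths, tree_cost, reduced_vns, k_max, iters, seed) are hypothetical.

```python
import itertools
import random
from collections import deque


def random_ternary_tree(n, rng):
    """Unrooted tree with leaf slots 0..n-1 and degree-3 internal nodes n..2n-3."""
    adj = {0: [n], 1: [n], 2: [n], n: [0, 1, 2]}
    nxt = n + 1
    for leaf in range(3, n):
        # Split a random edge with a new internal node and hang the leaf on it.
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        u, v = rng.choice(edges)
        adj[u].remove(v)
        adj[v].remove(u)
        adj[u].append(nxt)
        adj[v].append(nxt)
        adj[nxt] = [u, v, leaf]
        adj[leaf] = [nxt]
        nxt += 1
    return adj


def leaf_path_lengths(adj, n):
    """hops[s][t] = number of edges between leaf slots s and t (BFS per leaf)."""
    hops = []
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        hops.append([dist[t] for t in range(n)])
    return hops


def tree_cost(hops, perm, D):
    """Sum, over all quartets of slots, of the cost of the pairing the tree induces.

    perm[slot] is the document placed at that leaf slot; D is the distance matrix.
    The induced pairing is the one with the smallest total path length in the tree.
    """
    total = 0.0
    for a, b, c, d in itertools.combinations(range(len(perm)), 4):
        pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        (p, q), (r, s) = min(
            pairings, key=lambda pr: hops[pr[0][0]][pr[0][1]] + hops[pr[1][0]][pr[1][1]]
        )
        total += D[perm[p]][perm[q]] + D[perm[r]][perm[s]]
    return total


def reduced_vns(D, k_max=3, iters=2000, seed=0):
    """Reduced VNS: shake with k random leaf swaps, no local search step."""
    rng = random.Random(seed)
    n = len(D)
    hops = leaf_path_lengths(random_ternary_tree(n, rng), n)  # fixed shape (assumption)
    best = list(range(n))
    rng.shuffle(best)
    best_cost = tree_cost(hops, best, D)
    k = 1
    for _ in range(iters):
        cand = best[:]
        for _ in range(k):
            i, j = rng.sample(range(n), 2)
            cand[i], cand[j] = cand[j], cand[i]
        cost = tree_cost(hops, cand, D)
        if cost < best_cost:
            best, best_cost, k = cand, cost, 1  # improvement: restart at k = 1
        else:
            k = k % k_max + 1  # no improvement: move to the next neighborhood
    return best, best_cost
```

Calling reduced_vns(D) on a symmetric distance matrix D (a list of lists with at least four rows) returns the best leaf assignment found and its cost. A fuller treatment would also vary the tree shape during the search, which this sketch deliberately leaves out.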