Combining metadata and co-citations for recommending related papers
Author(s) - Shahbaz Ahmad, Muhammad Tanvir Afzal
Publication year - 2019
Publication title - Turkish Journal of Electrical Engineering and Computer Sciences
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.225
H-Index - 30
eISSN - 1303-6203
pISSN - 1300-0632
DOI - 10.3906/elk-1908-19
Subject(s) - metadata, computer science, information retrieval, learning to rank, ranking (information retrieval), benchmark (surveying), citation, scope (computer science), rank (graph theory), co citation, set (abstract data type), data mining, world wide web, mathematics, geography, geodesy, combinatorics, programming language
Researchers rely on research paper recommender systems to identify relevant documents and keep track of state-of-the-art methods. The approaches proposed for these systems can be classified as content-based, collaborative filtering-based, and bibliographic information-based. Content-based approaches exploit the full text of articles and provide more promising results than the other approaches. However, most content is not freely available because of subscription requirements, so the scope of content-based approaches is limited. In such scenarios, the best possible alternative could be the exploitation of other openly available resources. Therefore, this research explores the possible use of metadata and bibliographic information to find related articles. The approach incorporates metadata with co-citations to find and rank related articles against a query paper. The similarity score of metadata fields is calculated and combined with co-citations. The proposed approach is evaluated on a newly constructed dataset of 5116 articles. The benchmark ranking against each co-cited document set is established by applying Jensen-Shannon divergence (JSD), and the results are compared with the state-of-the-art content-based approach in terms of normalized discounted cumulative gain (NDCG). The state-of-the-art content-based approach achieved an NDCG score of 0.86, while the traditional co-citation-based approach scored 0.72. The presented method achieved NDCG scores of 0.73, 0.77, and 0.78 by incorporating the title, co-citation and title, and abstract, respectively, whereas the highest NDCG score of 0.77 was achieved by combining co-citations with metadata. However, better results are achieved by incorporating the title and abstract, with an NDCG score of 0.81. Therefore, it can be concluded that the proposed approach could be a better alternative in cases where content is unavailable.
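The two measures named in the abstract, JSD for building the benchmark ranking and NDCG for scoring a ranked list, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify tokenization, smoothing, or the logarithm base, so the unigram term distributions, the base-2 logarithm, and the function names `jsd` and `ndcg` are assumptions made here for illustration only.

```python
import math
from collections import Counter


def jsd(p_tokens, q_tokens):
    """Jensen-Shannon divergence (base 2) between the unigram
    distributions of two non-empty token lists.

    Assumption: plain relative term frequencies, no smoothing.
    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    divergence = 0.0
    for term in set(p_counts) | set(q_counts):
        p = p_counts[term] / p_total  # Counter yields 0 for unseen terms
        q = q_counts[term] / q_total
        m = (p + q) / 2               # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def ndcg(ranked_relevance):
    """NDCG of a list of graded relevance values given in ranked order.

    Uses the common rel / log2(rank + 1) discount with 1-based ranks.
    """
    def dcg(values):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(values))

    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0
```

Under these assumptions, a lower `jsd` value between a query paper and a co-cited paper means the two texts are more similar, so sorting co-cited papers by ascending divergence gives a benchmark order. `ndcg` then measures how closely a candidate ranking (for example, one produced from metadata and co-citation scores) follows that benchmark.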