
A Hybrid Approach to Recommending Universal Decimal Classification Codes for Cataloguing in Slovenian Digital Libraries
Author(s) -
Mladen Borovic,
Milan Ojstersek,
Damjan Strnad
Publication year - 2022
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2022.3198706
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
In this article we present a hybrid approach to recommending Universal Decimal Classification (UDC) codes for unclassified documents. Recommending UDC codes to librarians provides decision support as part of a semi-automatic cataloguing process. Because existing approaches to recommending UDC codes are scarce and limited to certain fields of expertise, our motivation was to create a hybrid approach that covers all fields of expertise within the UDC hierarchy. The cascade hybrid approach combines the BM25 ranking function with a multi-label classifier based on a BERT (Bidirectional Encoder Representations from Transformers) deep neural network architecture. Additionally, lists of the most commonly used UDC codes within a document’s origin organization are used as a final content-based filtering method. The BM25 ranking function creates an initial list of recommendations. The first cascade step re-ranks this initial list using the list of recommendations produced by the multi-label BERT-based classifier. The second cascade step re-ranks the output of the first step using a list of the most commonly used UDC codes within the document’s organization. Finally, post-processing steps are applied to obtain the final list of recommended UDC codes. We present in detail the UDC structure, the text corpus used and the functioning of our hybrid recommendation approach. We evaluate the generated recommendation lists on the corpus from the Slovenian Open-Access Infrastructure using four metrics: hit ratio, normalized discounted cumulative gain, mean reciprocal rank and mean average precision. On these metrics, our hybrid recommendation approach improves on the reference scores of the individual methods.
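The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the re-ranking is computed, so the additive score boost, the weights `w_clf` and `w_org`, the assumption that all scores are already scaled to [0, 1], and every function name below are hypothetical choices made for illustration. The post-processing steps are omitted. The per-document metric functions use binary relevance and common textbook definitions; averaging them over all documents would give the hit ratio, MRR, nDCG and MAP named in the abstract.

```python
from math import log2


def rerank(ranked, boost_scores, weight):
    """Re-rank (code, score) pairs by adding weight * boost to each score.

    Codes missing from boost_scores receive no boost, and no new codes are
    introduced: each cascade step only refines the previous list.
    (Additive boosting is an assumption, not taken from the paper.)
    """
    rescored = [(code, score + weight * boost_scores.get(code, 0.0))
                for code, score in ranked]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def cascade_recommend(bm25_scores, classifier_probs, org_frequencies,
                      w_clf=1.0, w_org=0.5, k=5):
    """Return the top-k UDC codes after the two cascade steps."""
    # Initial list: candidates ordered by (pre-computed) BM25 score.
    initial = sorted(bm25_scores.items(), key=lambda p: p[1], reverse=True)
    # Cascade step 1: re-rank with the multi-label classifier's outputs.
    step1 = rerank(initial, classifier_probs, w_clf)
    # Cascade step 2: re-rank with the organization's most-used codes.
    step2 = rerank(step1, org_frequencies, w_org)
    return [code for code, _ in step2[:k]]


def hit_ratio(recommended, relevant):
    """1.0 if at least one relevant code appears in the list, else 0.0."""
    return 1.0 if any(code in relevant for code in recommended) else 0.0


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant code, or 0.0 if there is none."""
    for rank, code in enumerate(recommended, start=1):
        if code in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(recommended, relevant):
    """Mean of precision@rank over the ranks holding a relevant code."""
    hits, total = 0, 0.0
    for rank, code in enumerate(recommended, start=1):
        if code in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg(recommended, relevant):
    """Binary-relevance nDCG over the whole recommended list."""
    dcg = sum(1.0 / log2(rank + 1)
              for rank, code in enumerate(recommended, start=1)
              if code in relevant)
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Toy scores for one document (invented values, for illustration only).
bm25 = {"004": 0.9, "621.3": 0.7, "51": 0.6, "37": 0.2}
clf = {"621.3": 0.8, "004": 0.2, "51": 0.1}
org = {"51": 0.6, "004": 0.1}

top = cascade_recommend(bm25, clf, org, k=3)
print(top, reciprocal_rank(top, {"004", "51"}))
```

In this toy run the classifier boost moves code 621.3 ahead of 004, which BM25 alone ranked first; the organization frequencies then raise the score of 51 without changing the order. The weights control how strongly each later step can override the earlier ranking and would have to be tuned on held-out data.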