
Keyphrase Extraction for Technical Language Processing
Author(s) - Alden A. Dima, Aaron K. Massey
Publication year - 2022
Publication title - Journal of Research of the National Institute of Standards and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.202
H-Index - 59
eISSN - 2165-7254
pISSN - 1044-677X
DOI - 10.6028/jres.126.053
Subject(s) - computer science, semeval, classifier (uml), natural language processing, metadata, information retrieval, artificial intelligence, measure (data warehouse), task (project management), world wide web, database, management, economics
Abstract - Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases; the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC) and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 % while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui's F-measure of 18.8 %.
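
The abstract describes keyphrase extraction as classifying candidate phrases with distributional features and a Naïve Bayes classifier, evaluated against gold keyphrases via the F-measure (the harmonic mean of precision and recall). The sketch below is an illustrative reconstruction under those assumptions only; it is not the authors' toolkit or the Maui indexer, and the candidate generation, the specific features (term frequency, relative first-occurrence position, phrase length), and the use of scikit-learn's GaussianNB are all assumptions made for the example.

# Illustrative sketch only: keyphrase extraction framed as binary classification
# of candidate phrases using simple distributional features and a Naive Bayes
# classifier. Helper names and feature choices are assumptions for illustration.
import re
from sklearn.naive_bayes import GaussianNB  # assumed stand-in for the paper's Naive Bayes classifier

def candidate_phrases(text, max_len=3):
    """Tokenize and return all unigram-to-trigram candidate phrases."""
    tokens = re.findall(r"[a-z][a-z0-9-]+", text.lower())
    cands = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            cands.add(" ".join(tokens[i:i + n]))
    return tokens, sorted(cands)

def features(phrase, tokens):
    """Distributional features: term frequency, relative first position, phrase length."""
    token_str = " " + " ".join(tokens) + " "
    padded = " " + phrase + " "
    tf = token_str.count(padded)
    first = token_str.find(padded) / len(token_str)
    return [tf, first, len(phrase.split())]

def build_training_data(documents, gold_keyphrases):
    """Label each candidate 1 if it matches a gold (e.g., author-provided) keyphrase, else 0."""
    X, y = [], []
    for text, gold in zip(documents, gold_keyphrases):
        gold = {g.lower() for g in gold}
        tokens, cands = candidate_phrases(text)
        for c in cands:
            X.append(features(c, tokens))
            y.append(1 if c in gold else 0)
    return X, y

def extract_keyphrases(model, text, top_k=10):
    """Rank an unseen document's candidates by the posterior probability of the keyphrase class."""
    tokens, cands = candidate_phrases(text)
    scores = model.predict_proba([features(c, tokens) for c in cands])[:, 1]
    ranked = sorted(zip(cands, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]

if __name__ == "__main__":
    docs = ["Excess molar enthalpies of binary mixtures were measured by flow calorimetry."]
    gold = [["binary mixtures", "flow calorimetry"]]
    model = GaussianNB().fit(*build_training_data(docs, gold))
    print(extract_keyphrases(model, docs[0], top_k=5))

In this framing, the reported F-measures would come from comparing the top-ranked candidates against the author- or reader-provided keyphrases for each article.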