Identifying ISI‐indexed articles by their lexical usage: A text analysis approach
Author(s) -
Moohebat Mohammadreza,
Raj Ram Gopal,
Kareem Sameem Binti Abdul,
Thorleuchter Dirk
Publication year - 2015
Publication title -
Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23194
Subject(s) - computer science, natural language processing, artificial intelligence, lexical analysis, Bayesian probability, information retrieval, support vector machine, machine learning
This research creates an architecture for investigating whether lexical differences exist between articles indexed by the Institute for Scientific Information (ISI) and non‐ISI articles and, if such a difference is found, for proposing the best available classification method. Three classification models are trained on a collection of ISI‐ and non‐ISI‐indexed articles in business and computer science. A sensitivity analysis then shows how words in different syntactic forms affect the classification decision. The results demonstrate that machine learning techniques can distinguish the lexical domains of ISI and non‐ISI articles. In both disciplines, the support vector machine identifies ISI‐indexed articles with higher precision than the Naïve Bayesian and K‐Nearest Neighbors techniques.
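The abstract names three classifiers (support vector machine, Naïve Bayesian, K‐Nearest Neighbors) compared by precision on ISI‐indexed articles, but it does not state the feature representation, solvers, or parameters. The following is a minimal, standard‐library‐only Python sketch of such a comparison under stated assumptions: documents become bag‐of‐words term counts from a plain whitespace tokenizer; the SVM is a linear model trained with Pegasos‐style subgradient descent (a swapped‐in solver, not necessarily the authors'); Naïve Bayes is multinomial with Laplace smoothing; KNN uses cosine similarity with k = 3. All class and function names, hyperparameters, and the six training sentences are hypothetical illustrations, not the paper's data, and the sensitivity analysis over syntactic forms is not reproduced.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed preprocessing: lower-case whitespace split; no stemming or POS tagging.
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        v = len(self.vocab)
        def log_posterior(c):  # words unseen in training are skipped
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in tokenize(doc) if w in self.vocab)
        return max(self.classes, key=log_posterior)

def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm = math.sqrt(sum(n * n for n in a.values()) * sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

class KNN:
    """k-nearest neighbours: cosine similarity on term counts, majority vote."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        self.train = [(Counter(tokenize(d)), y) for d, y in zip(docs, labels)]
        return self

    def predict(self, doc):
        q = Counter(tokenize(doc))
        ranked = sorted(self.train, key=lambda t: cosine(q, t[0]), reverse=True)
        return Counter(y for _, y in ranked[: self.k]).most_common(1)[0][0]

class LinearSVM:
    """Binary linear SVM trained by Pegasos-style subgradient descent (no intercept)."""
    def __init__(self, positive, lam=0.01, epochs=20):
        self.positive, self.lam, self.epochs = positive, lam, epochs

    def fit(self, docs, labels):
        self.negative = next(y for y in labels if y != self.positive)
        data = [(Counter(tokenize(d)), 1 if y == self.positive else -1)
                for d, y in zip(docs, labels)]
        self.w, t = {}, 0
        for _ in range(self.epochs):
            for x, y in data:  # fixed pass order instead of random sampling
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = y * sum(self.w.get(f, 0.0) * n for f, n in x.items())
                for f in self.w:  # shrinkage from the L2 regulariser
                    self.w[f] *= 1 - eta * self.lam
                if margin < 1:  # hinge loss is active: step towards y * x
                    for f, n in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * y * n
        return self

    def predict(self, doc):
        score = sum(self.w.get(f, 0.0) * n for f, n in Counter(tokenize(doc)).items())
        return self.positive if score >= 0 else self.negative

def precision(y_true, y_pred, positive):
    # Share of documents predicted `positive` that truly are `positive`.
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0

# Invented toy "abstracts"; NOT the paper's business/computer-science corpus.
train_docs = [
    "empirical evidence robust methodology significant findings",
    "rigorous empirical analysis significant evidence",
    "robust statistical methodology empirical validation",
    "we simply show some nice results",
    "this paper simply presents nice ideas",
    "some simple nice results are shown",
]
train_labels = ["ISI"] * 3 + ["non-ISI"] * 3
test_docs = ["robust empirical evidence", "simply nice results"]
test_labels = ["ISI", "non-ISI"]

models = {"SVM": LinearSVM(positive="ISI"), "NB": NaiveBayes(), "KNN": KNN(k=3)}
for name, model in models.items():
    model.fit(train_docs, train_labels)
    preds = [model.predict(d) for d in test_docs]
    print(name, preds, "precision(ISI) =", precision(test_labels, preds, "ISI"))
```

Precision is computed for the ISI class, mirroring the comparison in the abstract. The toy vocabularies of the two classes are deliberately disjoint, so all three models separate them trivially; this illustrates the workflow only and says nothing about which classifier wins on real articles, where each model would be evaluated on held‐out data per discipline.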