Good debt or bad debt: Detecting semantic orientations in economic texts
Author(s) - Malo Pekka, Sinha Ankur, Korhonen Pekka, Wallenius Jyrki, Takala Pyry
Publication year - 2014
Publication title - Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23062
Subject(s) - computer science , natural language processing , lexicon , phrase , sentiment analysis , artificial intelligence , sentence , feature (linguistics) , computational finance , finance , linguistics , economics , philosophy
The use of robo‐readers to analyze news texts is an emerging technology trend in computational finance. Recent research has developed sophisticated financial polarity lexicons for investigating how financial sentiments relate to future company performance. However, experience from fields that commonly analyze sentiment shows that the overall semantic orientation of a sentence may differ from that of its individual words. This article investigates how semantic orientations can be better detected in financial and economic news by taking into account phrase‐structure information and domain‐specific use of language. Our three main contributions are the following: (a) a human‐annotated finance phrase bank that can be used for training and evaluating alternative models; (b) a technique to enhance financial lexicons with attributes that help identify the expected direction of events that affect sentiment; and (c) a linearized phrase‐structure model for detecting contextual semantic orientations in economic texts. The relevance of the newly added lexicon features and the benefit of using the proposed learning algorithm are demonstrated in a comparative study against general sentiment models as well as the popular word‐frequency models used in recent financial studies. The proposed framework is parsimonious and avoids the explosion in feature space caused by the use of conventional n‐gram features.
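Why can the orientation of a sentence differ from that of its words, and how might a lexicon attribute for the "expected direction" of an event help? A toy sketch makes this concrete. It is not the authors' linearized phrase‐structure model or their lexicon. The word lists, the rule that pairs each financial entity with the next change word, and every function name below (`word_count_score`, `directional_score`, `label`) are hypothetical and were chosen only for illustration.

```python
import re

# Hypothetical toy lexicons (illustrative only, not the authors' resources).
# Word-level polarity, as a simple word-frequency model might use.
WORD_POLARITY = {
    "profit": 1, "sales": 1, "loss": -1, "debt": -1,
    "increased": 1, "rose": 1, "decreased": -1, "fell": -1,
}
# Hypothetical "expected direction" attribute: is an increase in this
# entity good news (+1) or bad news (-1)?
ENTITY_DIRECTION = {"profit": 1, "sales": 1, "loss": -1, "debt": -1}
# Direction of change expressed by a verb: up (+1) or down (-1).
CHANGE_DIRECTION = {"increased": 1, "rose": 1, "decreased": -1, "fell": -1}


def tokenize(sentence: str) -> list[str]:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_count_score(sentence: str) -> int:
    """Sum of individual word polarities, ignoring how words combine."""
    return sum(WORD_POLARITY.get(tok, 0) for tok in tokenize(sentence))


def directional_score(sentence: str) -> int:
    """Pair each entity with the next change word and multiply directions.

    A decrease (-1) in an undesirable entity (-1) yields +1 (good news).
    """
    score = 0
    pending = None  # direction attribute of the most recent unpaired entity
    for tok in tokenize(sentence):
        if tok in ENTITY_DIRECTION:
            pending = ENTITY_DIRECTION[tok]
        elif tok in CHANGE_DIRECTION and pending is not None:
            score += pending * CHANGE_DIRECTION[tok]
            pending = None
    return score


def label(score: int) -> str:
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    examples = [
        "Operating loss decreased to EUR 2 million",
        "Net sales increased by 5 percent",
        "Debt rose",
    ]
    for s in examples:
        print(f"{s!r}: word count -> {label(word_count_score(s))}, "
              f"directional -> {label(directional_score(s))}")
```

In this toy setup, "Operating loss decreased to EUR 2 million" is labeled negative by the word count (two negative words) but positive by the directional rule, because a decrease in an undesirable quantity is good news. The directional rule needs only one extra attribute per lexicon entry rather than a separate feature for every word pair, which loosely echoes the abstract's point about avoiding n‐gram feature growth.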