An auxiliary Part‐of‐Speech tagger for blog and microblog cyber‐slang

Golia Silvia; Zola Paola

Author(s) - Golia Silvia, Zola Paola

Publication year - 2023

Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.381

H-Index - 33

eISSN - 1932-1872

pISSN - 1932-1864

DOI - 10.1002/sam.11596

Subject(s) - computer science, slang, security token, natural language processing, part of speech, part of speech tagging, artificial intelligence, social media, information retrieval, world wide web, linguistics, philosophy, computer security

The increasing impact of Web 2.0 has brought a growing use of slang, abbreviations, and emphasized words, which limits the performance of traditional natural language processing models. State‐of‐the‐art Part‐of‐Speech (POS) taggers are often unable to assign a meaningful POS tag to every word in a Web 2.0 text. To address this limitation, we propose an auxiliary POS tagger that assigns a POS tag to a given token based on the sequence of POS tags that precede and follow it. The main advantage of the proposed auxiliary POS tagger is that it does not need any information about the token itself, since it relies only on the sequences of existing POS tags. The tagger is called auxiliary because it requires an initial POS tagging step, which can be performed using online dictionaries (e.g., Wiktionary) or other POS tagging algorithms. The auxiliary POS tagger relies on a Bayesian network that uses information about the preceding and following POS tags. It was evaluated on the Brown Corpus, a general linguistics corpus; on the modern ARK dataset, composed of Twitter messages; and on a corpus of manually labeled Web 2.0 data.
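To make the idea of tagging from context tags alone concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the structure of their Bayesian network, so this sketch assumes the simplest one, a naive Bayes network in which each neighbouring tag at offset d is conditionally independent of the others given the target tag. The names `AuxiliaryTagger`, `window`, `alpha`, `fit`, `fill`, and the `<PAD>` boundary symbol are all hypothetical. Tokens that the initial tagging step (dictionary lookup or another tagger) could not label are marked `None`, and the word itself is never consulted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

PAD = "<PAD>"  # hypothetical symbol for positions beyond the sentence edges


class AuxiliaryTagger:
    """Naive-Bayes sketch: P(tag | context) ∝ P(tag) * Π_d P(tag at offset d | tag).

    Assumption: neighbouring tags are conditionally independent given the
    target tag. The token string is never used, only surrounding POS tags.
    """

    def __init__(self, window: int = 2, alpha: float = 1.0):
        self.window = window  # hypothetical: how many tags to look at on each side
        self.alpha = alpha    # Laplace smoothing constant
        self.tag_counts: Counter = Counter()
        # ctx_counts[(offset, target_tag)][neighbour_tag] -> count
        self.ctx_counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
        self.vocab: set = {PAD}

    def _offsets(self) -> List[int]:
        return [d for d in range(-self.window, self.window + 1) if d != 0]

    @staticmethod
    def _at(tags: Sequence[Optional[str]], i: int) -> Optional[str]:
        return tags[i] if 0 <= i < len(tags) else PAD

    def fit(self, tagged_sentences: Sequence[Sequence[str]]) -> "AuxiliaryTagger":
        # Count each tag and, for every offset, which tag appears there.
        for tags in tagged_sentences:
            self.vocab.update(tags)
            for i, t in enumerate(tags):
                self.tag_counts[t] += 1
                for d in self._offsets():
                    self.ctx_counts[(d, t)][self._at(tags, i + d)] += 1
        return self

    def _score(self, tags: Sequence[Optional[str]], i: int, cand: str) -> float:
        # Log-posterior (up to a constant) of tag `cand` at position i.
        total = sum(self.tag_counts.values())
        score = math.log(self.tag_counts[cand] / total)
        v = len(self.vocab)
        for d in self._offsets():
            ctx = self._at(tags, i + d)
            if ctx is None:  # neighbour is itself untagged: skip this evidence
                continue
            counts = self.ctx_counts[(d, cand)]
            score += math.log((counts[ctx] + self.alpha) /
                              (self.tag_counts[cand] + self.alpha * v))
        return score

    def fill(self, tags: List[Optional[str]]) -> List[Optional[str]]:
        # Fill gaps left to right; tags filled earlier serve as context later.
        out = list(tags)
        for i, t in enumerate(out):
            if t is None:
                out[i] = max(self.tag_counts, key=lambda c: self._score(out, i, c))
        return out


if __name__ == "__main__":
    train = [
        ["DET", "NOUN", "VERB", "DET", "NOUN"],
        ["PRON", "VERB", "DET", "ADJ", "NOUN"],
        ["DET", "ADJ", "NOUN", "VERB", "ADV"],
    ]
    tagger = AuxiliaryTagger(window=1).fit(train)
    # e.g. "the <slang word> is the best": the slang token has no dictionary tag
    print(tagger.fill(["DET", None, "VERB", "DET", "NOUN"]))
    # → ['DET', 'NOUN', 'VERB', 'DET', 'NOUN']
```

With a window of one tag on each side, a gap between `DET` and `VERB` is resolved to `NOUN` purely from tag statistics, which is the property the abstract highlights: an unknown slang token can still receive a tag. A richer network (e.g., with dependencies between neighbouring tags) would drop the independence assumption made here.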
