Open Access

TOXIC COMMENTS DETECTION IN RUSSIAN

Author(s) - Sergey Smetanin

Publication year - 2020

Publication title - Kompʹûternaâ Lingvistika i Intellektualʹnye Tehnologii (Computational Linguistics and Intellectual Technologies)

Language(s) - English

Resource type - Conference proceedings

ISSN - 2075-7182

DOI - 10.28995/2075-7182-2020-19-1149-1159

Subject(s) - computer science, encoder, sentence, annotation, artificial intelligence, the internet, natural language processing, source code, data science, machine learning, information retrieval, world wide web, operating system

Social network sites are currently among the major communication platforms in both offline and online space. The freedom to express all kinds of views, including toxic, aggressive, and abusive comments, may have a long-term negative impact on people's opinions and on social cohesion. Consequently, automatically identifying and moderating toxic content on the Internet, so as to limit these negative consequences, is a necessary task for modern society. This paper addresses the automatic detection of toxic comments in the Russian language. As a data source, we used an anonymously published Kaggle dataset and additionally validated the quality of its annotation. To build a classification model, we fine-tuned two versions of the Multilingual Universal Sentence Encoder, Bidirectional Encoder Representations from Transformers (BERT), and RuBERT. The fine-tuned RuBERT achieved the best classification score, F1 = 92.20%. We have made the trained models and code samples publicly available to the research community.
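The abstract reports the best result as F1 = 92.20% but does not state which F1 variant was used (F1 on the toxic class alone, macro-averaged over both classes, or weighted). As a reminder of what the metric measures for a binary toxic/non-toxic classifier, the following is a minimal pure-Python sketch. The function names `binary_f1` and `macro_f1` and the toy labels are hypothetical and for illustration only; they are not taken from the paper or its released code.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for a single class: the harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives and false negatives for `positive`.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(binary_f1(y_true, y_pred, positive=c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy labels (1 = toxic, 0 = non-toxic); not data from the paper.
    gold = [1, 1, 1, 0, 0, 0, 0, 1]
    pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(f"toxic-class F1: {binary_f1(gold, pred):.4f}")  # prints 0.7500
    print(f"macro F1:       {macro_f1(gold, pred):.4f}")  # prints 0.7500
```

In practice the same quantities are available from scikit-learn's `sklearn.metrics.f1_score`, where the `average` parameter (`"binary"`, `"macro"`, `"weighted"`) selects the variant; on imbalanced toxicity data these variants can differ noticeably, which is why the averaging scheme matters when comparing reported scores.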
