Open Access

RUSSIAN LANGUAGE AND CORPUS DIVERSITY

Author(s) - Alexander Piperski

Publication year - 2020

Publication title - kompʹûternaâ lingvistika i intellektualʹnye tehnologii

Language(s) - English

Resource type - Conference proceedings

ISSN - 2075-7182

DOI - 10.28995/2075-7182-2020-19-615-627

Subject(s) - computer science, linguistics, syntax, natural language processing, selection (genetic algorithm), corpus linguistics, artificial intelligence, grammar, variation (astronomy), the internet, text corpus, world wide web, philosophy, physics, astrophysics

This paper discusses the use of the most widely known Russian corpora, namely the Russian National Corpus, ruTenTen, the General Internet Corpus of Russian, and Araneum Russicum Maximum, for the theoretical study of the Russian language. Based on a sample of papers from 2019, I demonstrate that scholars, especially theoretical linguists, tend to ignore the opportunities provided by a wide range of Web corpora, even though these resources are well known to the NLP community. I present a selection of case studies to show that data from “non-classical” corpora can be used for studying various linguistic phenomena, such as: 1) variation in morphology and syntax; 2) word formation and lexical change; 3) construction grammar. I also claim that the underuse of non-classical corpora is partly due to the fact that they are, or are perceived as, not quite user-friendly.
