Generating pseudo-random texts based on the frequency characteristics of texts in natural languages

Open Access

Author(s) - Yurii Kotov, Olga Sanina

Publication year - 2020

Publication title - Transaction of Scientific Papers of the Novosibirsk State Technical University

Language(s) - English

Resource type - Journals

ISSN - 2307-6879

DOI - 10.17212/2307-6879-2020-1-2-113-126

Subject(s) - bigram, computer science, natural language processing, artificial intelligence, natural language, natural (archaeology), word lists by frequency, frequency distribution, distribution (mathematics), speech recognition, mathematics, statistics, history, trigram, mathematical analysis, archaeology, sentence

The paper discusses the generation of pseudo-random texts based on the frequency characteristics of texts in natural languages. The following frequency characteristics, and their values for the Russian and English languages, are considered for generation: the distribution of unigrams and bigrams by frequency of occurrence in texts, and the distribution of words by length. Based on these characteristics, an algorithm for generating pseudo-random texts is proposed. Texts generated by the algorithm are then studied in language recognition experiments.
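
The abstract does not give the details of the authors' algorithm, so the following is only a minimal illustrative sketch of how the three characteristics it names (unigram frequencies, bigram frequencies, and word-length distribution) could drive a generator. All function names (`build_model`, `generate_word`, `generate_text`) and the sampling scheme are assumptions made for illustration, not the method from the paper.

```python
import random
import re
from collections import Counter, defaultdict


def build_model(sample):
    """Collect letter unigram, letter bigram, and word-length counts from a sample text.

    Hypothetical helper: the paper's actual statistics may be defined differently.
    """
    # [^\W\d_]+ matches runs of Unicode letters, so Russian and English both work.
    words = re.findall(r"[^\W\d_]+", sample.lower())
    unigrams = Counter(ch for w in words for ch in w)
    bigrams = defaultdict(Counter)  # bigrams[a][b] = number of times b follows a inside a word
    for w in words:
        for a, b in zip(w, w[1:]):
            bigrams[a][b] += 1
    lengths = Counter(len(w) for w in words)
    return unigrams, bigrams, lengths


def weighted_choice(counter, rng):
    """Draw one key with probability proportional to its count."""
    items = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(items, weights=weights, k=1)[0]


def generate_word(model, rng):
    """Assumed scheme: pick a length, pick a first letter by unigram frequency,
    then extend the word letter by letter using bigram frequencies."""
    unigrams, bigrams, lengths = model
    target_len = weighted_choice(lengths, rng)
    word = weighted_choice(unigrams, rng)
    while len(word) < target_len:
        followers = bigrams.get(word[-1])
        # Fall back to unigram frequencies if the last letter was never followed by anything.
        word += weighted_choice(followers if followers else unigrams, rng)
    return word


def generate_text(model, n_words, seed=None):
    """Generate n_words pseudo-random words separated by spaces."""
    rng = random.Random(seed)
    return " ".join(generate_word(model, rng) for _ in range(n_words))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(generate_text(build_model(sample), 10, seed=1))
```

Fixing the seed makes a run reproducible, which is convenient when generated texts are fed into a language recognition experiment; a real study would estimate the counts from a large corpus rather than a single sentence.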
