Open Access
Generating pseudo-random texts based on the frequency characteristics of texts in natural languages
Author(s) - Yurii Kotov, Olga Sanina
Publication year - 2020
Publication title - sbornik naučnyh trudov ngtu
Language(s) - English
Resource type - Journals
ISSN - 2307-6879
DOI - 10.17212/2307-6879-2020-1-2-113-126
Subject(s) - bigram , computer science , natural language processing , artificial intelligence , natural language , natural (archaeology) , word lists by frequency , frequency distribution , distribution (mathematics) , speech recognition , mathematics , statistics , history , trigram , mathematical analysis , archaeology , sentence
The paper discusses the generation of pseudo-random texts based on the frequency characteristics of texts in natural languages. The following frequency characteristics, and their values for Russian and English, are considered for generation: the distribution of unigrams and bigrams by frequency of occurrence in texts, and the distribution of words by length. Based on these characteristics, an algorithm for generating pseudo-random texts is proposed. Texts generated by the algorithm are then studied in language-recognition experiments.
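The abstract does not give the algorithm itself, so the following is only a minimal sketch of how the three named characteristics could be combined, not the authors' method. It assumes that unigrams and bigrams are counted over letters within words, that each word's length is drawn from the word-length distribution, that the first letter is drawn from the unigram distribution, and that each further letter is drawn from the bigram distribution conditioned on the previous letter. All function names (`build_stats`, `weighted_choice`, `generate_word`, `generate_text`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_stats(corpus):
    """Collect letter unigram, letter bigram, and word-length counts."""
    # Assumption: words are whitespace-separated; tokens with non-letters are skipped.
    words = [w for w in corpus.lower().split() if w.isalpha()]
    unigrams = Counter(ch for w in words for ch in w)
    bigrams = defaultdict(Counter)  # bigrams[a][b] = count of letter b right after a
    for w in words:
        for a, b in zip(w, w[1:]):
            bigrams[a][b] += 1
    lengths = Counter(len(w) for w in words)
    return unigrams, bigrams, lengths


def weighted_choice(rng, counter):
    """Draw one key with probability proportional to its count."""
    items = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(items, weights=weights, k=1)[0]


def generate_word(rng, unigrams, bigrams, lengths):
    """Generate one pseudo-random word from the collected statistics."""
    n = weighted_choice(rng, lengths)            # word length
    word = [weighted_choice(rng, unigrams)]      # first letter from unigrams
    while len(word) < n:
        followers = bigrams.get(word[-1])
        # Fall back to unigram frequencies if the letter was never followed by another.
        word.append(weighted_choice(rng, followers if followers else unigrams))
    return "".join(word)


def generate_text(corpus, n_words, seed=0):
    """Generate n_words pseudo-random words separated by single spaces."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    unigrams, bigrams, lengths = build_stats(corpus)
    return " ".join(
        generate_word(rng, unigrams, bigrams, lengths) for _ in range(n_words)
    )
```

Calling `generate_text(sample, 20)` on any reference text `sample` yields 20 words whose letters and lengths all occur in that text. In practice the statistics would be estimated from a large Russian or English corpus rather than a short sample.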