Open Access
Generating pseudo-random texts based on the frequency characteristics of texts in natural languages
Author(s) - Yurii Kotov, Olga Sanina
Publication year - 2020
Publication title - sbornik naučnyh trudov ngtu
Language(s) - English
Resource type - Journals
ISSN - 2307-6879
DOI - 10.17212/2307-6879-2020-1-2-113-126
Subject(s) - bigram, computer science, natural language processing, artificial intelligence, natural language, natural (archaeology), word lists by frequency, frequency distribution, distribution (mathematics), speech recognition, mathematics, statistics, history, trigram, mathematical analysis, archaeology, sentence
The paper discusses the generation of pseudo-random texts based on the frequency characteristics of natural-language texts. The following frequency characteristics, together with their values for Russian and English, are considered: the distribution of unigrams and bigrams by frequency of occurrence in texts, and the distribution of words by length. Based on these characteristics, an algorithm for generating pseudo-random texts is proposed. Texts generated by the algorithm are studied in language-recognition experiments.
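
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how such a generator might look, not the authors' method. It assumes that word lengths are sampled from the observed length distribution, the first letter from the unigram distribution, and each subsequent letter from the bigram distribution conditioned on the previous letter. All names (`build_model`, `generate_word`, `generate_text`) are hypothetical, and the frequencies are estimated from a sample corpus rather than taken from the paper.

```python
import random
from collections import Counter, defaultdict


def build_model(corpus: str):
    """Collect unigram, bigram, and word-length counts from a sample corpus.

    Assumption: words are whitespace-separated and non-alphabetic tokens
    are skipped; the paper may tokenize differently.
    """
    words = [w for w in corpus.lower().split() if w.isalpha()]
    unigrams = Counter(ch for w in words for ch in w)
    bigrams = defaultdict(Counter)
    for w in words:
        for a, b in zip(w, w[1:]):
            bigrams[a][b] += 1
    lengths = Counter(len(w) for w in words)
    return unigrams, bigrams, lengths


def weighted_choice(counter, rng):
    """Draw one key with probability proportional to its count."""
    items = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(items, weights=weights, k=1)[0]


def generate_word(unigrams, bigrams, lengths, rng):
    """Build one pseudo-random word of a sampled length, letter by letter."""
    n = weighted_choice(lengths, rng)
    word = weighted_choice(unigrams, rng)
    while len(word) < n:
        followers = bigrams.get(word[-1])
        # Fall back to unigram frequencies if the last letter was never
        # followed by another letter in the corpus.
        word += weighted_choice(followers if followers else unigrams, rng)
    return word


def generate_text(corpus: str, n_words: int, seed: int = 0) -> str:
    """Generate n_words pseudo-random words from corpus statistics."""
    rng = random.Random(seed)
    model = build_model(corpus)
    return " ".join(generate_word(*model, rng) for _ in range(n_words))
```

Calling `generate_text(sample, 20)` with a Russian or English sample would yield word-like strings whose letter and length statistics roughly follow the sample. A fixed seed keeps the output reproducible, which is convenient when the generated texts feed a language-recognition experiment.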