
Random forest based pseudorandom sequences classification algorithm
Author(s) - Alexander Kozachok, A. A. Spirin, Oksana Golembiovskaya
Publication year - 2020
Publication title - Doklady Tomskogo gosudarstvennogo universiteta sistem upravleniâ i radioèlektroniki
Language(s) - English
Resource type - Journals
ISSN - 1818-0442
DOI - 10.21293/1818-0442-2020-23-3-55-60
Subject(s) - pseudorandom number generator, encryption, byte, random forest, computer science, algorithm, pseudorandom binary sequence, binary number, random number generation, pattern recognition (psychology), data mining, mathematics, artificial intelligence, arithmetic, operating system
Recently, the number of confidential data leaks caused by insiders has increased. Because modern DLP systems cannot detect or prevent the leakage of information in encrypted or compressed form, an algorithm is proposed for classifying pseudorandom sequences produced by data encryption and compression algorithms. The classifier is built with a random forest algorithm. The feature space consists of an array of the occurrence frequencies of 9-bit binary subsequences together with statistical characteristics of the byte distribution of the sequences. The presented algorithm achieved an accuracy of 0.99 in classifying pseudorandom sequences. The proposed algorithm can improve existing DLP systems by increasing the accuracy with which encrypted and compressed data are classified.
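The feature space described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the 9-bit subsequences are sampled or which byte statistics are used, so the 1-bit sliding window and the choice of mean, standard deviation, and Shannon entropy are assumptions made here for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

NGRAM_BITS = 9  # subsequence length stated in the abstract


def ngram_frequencies(data: bytes, n: int = NGRAM_BITS) -> List[float]:
    """Relative frequency of each n-bit pattern in the bit stream.

    Assumption: overlapping windows advanced one bit at a time,
    bits read most-significant first within each byte.
    """
    counts = [0] * (1 << n)
    mask = (1 << n) - 1
    window = 0
    bits_seen = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen += 1
            if bits_seen >= n:  # window holds n valid bits
                counts[window] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * (1 << n)
    return [c / total for c in counts]


def byte_statistics(data: bytes) -> List[float]:
    """Mean, standard deviation, and Shannon entropy of byte values.

    Assumption: these three statistics stand in for the unspecified
    "statistical characteristics of the byte distribution".
    """
    if not data:
        return [0.0, 0.0, 0.0]
    size = len(data)
    mean = sum(data) / size
    variance = sum((b - mean) ** 2 for b in data) / size
    entropy = -sum(
        (c / size) * math.log2(c / size) for c in Counter(data).values()
    )
    return [mean, math.sqrt(variance), entropy]


def feature_vector(data: bytes) -> List[float]:
    """Concatenate the 512 pattern frequencies with the byte statistics."""
    return ngram_frequencies(data) + byte_statistics(data)
```

Under these assumptions each sequence maps to a vector of 512 + 3 = 515 values. Such vectors, labelled by the encryption or compression algorithm that produced the sequence, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; the hyperparameters used in the paper are not given in the abstract.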