
Selecting informative features of human gene exons
Author(s) -
Andrei V. Volkau,
Mikalai M. Yatskou,
Vasily V. Grinev
Publication year - 2019
Publication title -
žurnal belorusskogo gosudarstvennogo universiteta. matematika, informatika
Language(s) - English
Resource type - Journals
eISSN - 2617-3956
pISSN - 2520-6508
DOI - 10.33581/2520-6508-2019-1-77-89
Subject(s) - exon, computer science, feature (linguistics), feature selection, artificial intelligence, identification (biology), dimensionality reduction, gene, curse of dimensionality, feature vector, gene prediction, pattern recognition (psychology), computational biology, machine learning, genetics, biology, genome, linguistics, philosophy, botany
Dimensionality reduction of the human gene exon feature space is considered with the aim of gene identification. To evaluate the performance of various feature selection algorithms, computational experiments were carried out on the exons of 14 known human genes. It is shown that exons are clearly separable by gene affiliation. Feature selection algorithms are sensitive to noise features and make it possible to estimate their number. Reducing the number of features lowers CPU time and memory usage, reduces the complexity of a model, and makes the model easier to interpret. Our findings indicate that using features of the flanking intronic sequences leads to better prediction models than using exon features. The results of the research provide new opportunities for the study of human gene data using machine learning algorithms.
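
The idea that a feature selection method can separate informative features from noise features can be illustrated with a minimal sketch. It is not the authors' method or data: the abstract does not name the algorithms used, so a simple Fisher score (between-class variance divided by within-class variance) stands in here, and the three classes, four informative features and six noise features are synthetic assumptions. The names `fisher_score` and `make_synthetic` are hypothetical helpers introduced for this illustration only.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class variance of one feature divided by its within-class variance."""
    overall = mean(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n = len(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values()) / n
    within = sum(len(g) * pvariance(g) for g in groups.values()) / n
    return between / within if within > 0 else float("inf")


def make_synthetic(n_classes=3, n_per_class=50, n_informative=4, n_noise=6, seed=0):
    """Synthetic stand-in for exon data (assumption): class labels play the role of genes."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            # Informative features: the mean shifts with the class label.
            informative = [rng.gauss(3.0 * c, 1.0) for _ in range(n_informative)]
            # Noise features: same distribution for every class.
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(c)
    return X, y


X, y = make_synthetic()
scores = [fisher_score([row[j] for row in X], y) for j in range(len(X[0]))]

# Rank feature indices from most to least discriminative.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
top_features = ranking[:4]

for j in ranking:
    print(f"feature {j}: score = {scores[j]:.3f}")
```

In this toy setting the scores of the noise features stay close to zero, so the gap in the ranked list suggests how many features carry no class information. Real exon features would need a proper validation scheme before drawing such a conclusion.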