Predicting the results of molecular specific hybridization using boosted tree algorithm
Author(s) -
Zhu Weijun,
Han Yingjie,
Wu Huanmei,
Liu Yang,
Nan Xiaofei,
Zhou Qinglei
Publication year - 2018
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4982
Subject(s) - computer science, algorithm, coding (social sciences), set (abstract data type), tree (set theory), software, DNA–DNA hybridization, DNA, mathematics, biology, genetics, statistics, mathematical analysis, programming language
Summary In bioinformatics and DNA computing, simulated hybridization experiments can replace real molecular hybridization experiments to some extent, avoiding some disadvantages of actual experimental design. However, the core techniques employed by popular DNA simulation software are limited by the exponential computational complexity of the underlying combinatorial problems. As a result, whether a specific hybridization among complex DNA molecules is effective cannot be decided within an acceptable time. To address this common problem, we introduce a new method based on machine learning. First, a sample set is used to train the boosted tree algorithm, which yields a corresponding machine learning model. Second, this model is applied to predict the classification results of molecular hybridization for a given group of DNA molecular codings. The experimental results show that the new method has an average accuracy of 94.2% and an average efficiency 90 839 times higher than that of the existing representative approaches. In particular, for the case study in this paper, the efficiency of the new method is 235 000, 250 000, and 990 000 times higher than that of the three existing methods, respectively. These experimental results indicate that our new approach can quickly and accurately determine the biological effectiveness of molecular hybridization for a given DNA design.
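
The two steps in the summary (train a boosted tree model on a labelled sample set, then classify new DNA codings) can be illustrated with a small sketch. The summary does not state the paper's features, training data, or boosted tree implementation, so everything below is an assumption made for illustration: the two features (fraction of Watson-Crick matches and GC content), the six toy strand pairs and their labels, and the use of AdaBoost over one-level decision trees as a stand-in boosted tree learner. All function names are hypothetical.

```python
# Minimal sketch, NOT the paper's method: AdaBoost over decision stumps
# (one-level trees) as a stand-in boosted tree classifier.
# Features, toy strands, and labels are illustrative assumptions.
import math

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def features(strand_a, strand_b):
    """Hypothetical features for two equal-length strands written 5'->3'."""
    rev_b = strand_b[::-1]  # strands pair antiparallel
    matches = sum(COMPLEMENT[x] == y for x, y in zip(strand_a, rev_b))
    gc = sum(ch in "GC" for ch in strand_a) / len(strand_a)
    return [matches / len(strand_a), gc]

def stump_predict(stump, x):
    feat, thresh, sign = stump
    return sign if x[feat] >= thresh else -sign

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        for thresh in sorted({x[feat] for x in X}):
            for sign in (1, -1):
                stump = (feat, thresh, sign)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def train_adaboost(X, y, rounds=5):
    """Step 1: train the boosted model on a labelled sample set."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Step 2: classify a new pair; +1 = specific hybridization, -1 = not."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy sample set (assumed): the first three pairs are fully complementary.
pairs = [
    ("ACGTACGT", "ACGTACGT", 1),
    ("GGGCCCAA", "TTGGGCCC", 1),
    ("ATATATAT", "ATATATAT", 1),
    ("AAAAAAAA", "AAAAAAAA", -1),
    ("ACGTACGT", "GGGGGGGG", -1),
    ("GGGCCCAA", "TTGGGAAA", -1),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
model = train_adaboost(X, y)

print(predict(model, features("TTTTCCCC", "GGGGAAAA")))
print(predict(model, features("TTTTCCCC", "TTTTCCCC")))
```

Once trained, classifying a new pair costs only a handful of threshold comparisons, which is the kind of saving over combinatorial simulation that the summary reports. A realistic model would need far richer features and a much larger labelled sample set than this toy example.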