Allergenicity prediction by artificial neural networks
Author(s) -
Dimitrov Ivan,
Naneva Lyudmila,
Bangov Ivan,
Doytchinova Irini
Publication year - 2014
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2597
Subject(s) - covariance , artificial neural network , computer science , transformation (genetics) , binary number , set (abstract data type) , similarity (geometry) , protein structure prediction , algorithm , artificial intelligence , pattern recognition (psychology) , mathematics , protein structure , biology , statistics , image (mathematics) , biochemistry , arithmetic , gene , programming language
Two artificial neural network (ANN)-based algorithms for allergenicity prediction were developed and tested. The first algorithm consists of three steps. First, the protein sequences are described by amino acid principal properties such as hydrophobicity, size, relative abundance, and helix- and β-strand-forming propensities. Second, the resulting strings of different lengths are converted into vectors of equal length by auto-covariance and cross-covariance (ACC) transformation. Third, an ANN is applied to discriminate between allergens and non-allergens. The second algorithm consists of four steps: it adds one step before the final ANN modeling, in which the ACC vectors are transformed into binary fingerprints. The algorithms were applied to a set of 2427 known allergens and 2427 non-allergens and compared in terms of predictive ability. The three-step algorithm performed better than the four-step one, correctly identifying 82% versus 76% of the allergens and non-allergens.

The ANN algorithms presented here are universal and could be applied to any classification problem in computational biology. The amino acid descriptors capture the main structural and physicochemical properties of the amino acids that make up the proteins. The ACC transformation overcomes the main problem of alignment-based comparative studies, which arises from the different lengths of the aligned protein sequences. The resulting uniform-length vectors allow similarity search and classification by different computational methods. Optionally, the ACC vectors can be converted into binary descriptor fingerprints. A comparative study of several Web tools for allergenicity prediction showed that using more than one predictor is advisable: some tools are better at recognizing allergens and others at recognizing non-allergens, but none is better at both. Copyright © 2014 John Wiley & Sons, Ltd.
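
The ACC step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact formula, so the code uses a common ACC convention: for each lag, the products of descriptor j at position i and descriptor k at position i + lag are averaged over the sequence. The descriptor table holds random placeholder numbers rather than real amino acid property scales, and `max_lag=3`, `encode`, and `acc_transform` are names and values chosen for illustration only.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_DESCRIPTORS = 5  # the abstract names five principal properties

# ASSUMPTION: placeholder descriptor table filled with random numbers.
# These are NOT real property scales; substitute published values in practice.
_rng = np.random.default_rng(0)
DESCRIPTORS = {aa: _rng.normal(size=N_DESCRIPTORS) for aa in AMINO_ACIDS}


def encode(sequence):
    """Map a protein sequence to an (n, 5) matrix of per-residue descriptors."""
    return np.array([DESCRIPTORS[aa] for aa in sequence])


def acc_transform(sequence, max_lag=3):
    """Return a fixed-length ACC vector of size max_lag * 5 * 5.

    ASSUMPTION: each term is normalized by (n - lag), a common convention;
    max_lag=3 is an arbitrary illustrative choice.
    """
    e = encode(sequence)
    n = e.shape[0]
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    blocks = []
    for lag in range(1, max_lag + 1):
        # Entry [j, k] averages e[i, j] * e[i + lag, k] over positions i.
        # Diagonal entries are auto-covariances, off-diagonal ones
        # are cross-covariances.
        cov = e[:-lag].T @ e[lag:] / (n - lag)
        blocks.append(cov.ravel())
    return np.concatenate(blocks)


# Sequences of different length yield vectors of the same length.
short = acc_transform("MKTAYIAKQR")
longer = acc_transform("MKTAYIAKQRQISFVKSHFSRQ")
print(short.shape, longer.shape)  # (75,) (75,)
```

Because the output length depends only on the number of descriptors and the maximum lag, vectors like these could be passed directly to a classifier such as an ANN, or thresholded into a binary fingerprint. Neither of those later steps is shown here.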