
Fast and interpretable genomic data analysis using multiple approximate kernel learning
Author(s) - Ayyüce Begüm Bektaş, Çiğdem Ak, Mehmet Gönen
Publication year - 2022
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btac241
Subject(s) - multiple kernel learning, scalability, computer science, interpretability, kernel (algebra), replicate, lasso (programming language), machine learning, artificial intelligence, kernel method, sample size determination, feature selection, support vector machine, mathematics, statistics, combinatorics, database, world wide web
Dataset sizes in computational biology have increased drastically thanks to improved data collection tools and growing patient cohorts. Earlier kernel-based machine learning algorithms proposed for increased interpretability fail at large sample sizes because they do not scale. To overcome this problem, we propose a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating the distinct kernel matrices instead of calculating them.
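The idea of combining kernel approximation with a group Lasso penalty can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not say which approximation or solver is used. The sketch assumes random Fourier features for a Gaussian kernel, a squared-error loss, and a plain proximal gradient solver. The function names, the three feature groups (standing in for hypothetical gene sets), and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def random_fourier_features(X, n_components, gamma, rng):
    """Explicit feature map Z with Z @ Z.T ~= exp(-gamma * ||x - y||^2).

    Assumption: a Gaussian kernel approximated by random Fourier features.
    """
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)

def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2 (shrinks a whole group at once)."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w

def fit_group_lasso(Z_groups, y, lam, n_iter=500):
    """Minimize (1 / 2n) * ||y - sum_g Z_g w_g||^2 + lam * sum_g ||w_g||_2
    by proximal gradient descent. Returns one weight vector per group."""
    n = y.shape[0]
    Z = np.hstack(Z_groups)
    bounds = np.cumsum([0] + [Zg.shape[1] for Zg in Z_groups])
    step = n / (np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = w - step * (Z.T @ (Z @ w - y) / n)  # gradient step on the loss
        for start, stop in zip(bounds[:-1], bounds[1:]):
            w[start:stop] = group_soft_threshold(w[start:stop], step * lam)
    return [w[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# Synthetic data: 200 samples, 12 features split into 3 hypothetical gene sets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
groups = [slice(0, 4), slice(4, 8), slice(8, 12)]
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # only the first group carries signal
y = y - y.mean()

# One approximate kernel (feature map) per group; no n-by-n matrix is ever formed.
Z_groups = [random_fourier_features(X[:, g], 50, 0.1, rng) for g in groups]
weights = fit_group_lasso(Z_groups, y, lam=0.05)

# A group whose weight norm is zero has been dropped from the model.
group_norms = [float(np.linalg.norm(w)) for w in weights]
```

Each group contributes a feature matrix of size n by 50 rather than a kernel matrix of size n by n, so cost grows linearly with the sample size. The group-wise norms give one number per gene set, which is the interpretability the abstract refers to.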