Orders of magnitude speed increase in partial least squares feature selection with new simple indexing technique for very tall data sets
Author(s) -
Stefansson Petter,
Indahl Ulf G.,
Liland Kristian H.,
Burud Ingunn
Publication year - 2019
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3141
Subject(s) - feature selection , partial least squares regression , covariance matrix , kernel (algebra) , search engine indexing , regression , pattern recognition , algorithm , mathematics , statistics , computer science
Feature selection is a challenging combinatorial optimization problem that typically requires evaluating a large number of candidate feature subsets before a satisfactory solution is found. Because of the computational cost of estimating the regression coefficients for each subset, feature selection can be immensely time-consuming and is often left inadequately explored. Here, we propose a simple modification to the conventional sequence of calculations used when fitting a number of feature subsets to the same response data with partial least squares (PLS) model fitting. The modification consists of computing the covariance matrix for the full set of features once, and then deriving the covariance matrix of every subsequent feature subset solely by indexing into this precomputed matrix. This approach, which is primarily suited to tall design matrices with significantly more rows than columns, avoids redundant (identical) recalculations in the evaluation of different feature subsets. By benchmarking the time required to solve regression problems of various sizes, we demonstrate that the introduced technique outperforms traditional approaches by several orders of magnitude when used in conjunction with PLS modeling. In the supplementary material, we provide code implementing the concept with kernel PLS regression.
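The indexing idea in the abstract can be sketched compactly. The following is a minimal NumPy illustration, not the authors' supplementary code: the `kernel_pls` helper is an assumed PLS1 implementation in the style of the improved kernel algorithm of Dayal and MacGregor, which needs only the cross-products X'X and X'y; the toy data are zero-mean by construction, so mean-centering is omitted. The cross-products are formed once for all features, after which any candidate subset is fitted purely by indexing into them.

```python
import numpy as np

def kernel_pls(XtX, Xty, n_comp):
    """PLS1 regression coefficients from cross-product matrices only
    (assumed implementation, in the style of Dayal & MacGregor's
    improved kernel PLS; never touches the tall matrix X itself)."""
    p = XtX.shape[0]
    R = np.zeros((p, n_comp))   # weights expressed in the original X-space
    P = np.zeros((p, n_comp))   # X-loadings
    q = np.zeros(n_comp)        # y-loadings
    s = Xty.astype(float).copy()
    for a in range(n_comp):
        w = s / np.linalg.norm(s)       # weight from the (deflated) X'y
        r = w.copy()
        for j in range(a):              # map weight back to undeflated space
            r -= (P[:, j] @ w) * R[:, j]
        tt = r @ XtX @ r                # score norm t't, via X'X only
        P[:, a] = (XtX @ r) / tt
        q[a] = (s @ r) / tt
        s -= tt * q[a] * P[:, a]        # deflate X'y
        R[:, a] = r
    return R @ q                        # regression coefficient vector

rng = np.random.default_rng(0)
n, p = 10000, 30                        # tall design matrix: n >> p
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n)

# One-time O(n p^2) cost: cross-products for the FULL feature set.
XtX_full = X.T @ X
Xty_full = X.T @ y

# Any candidate subset is now fitted without touching X again:
subset = [0, 1, 2, 7, 12]
b_sub = kernel_pls(XtX_full[np.ix_(subset, subset)],
                   Xty_full[subset], n_comp=3)

# Reference: the conventional route, recomputing cross-products from X.
Xs = X[:, subset]
b_ref = kernel_pls(Xs.T @ Xs, Xs.T @ y, n_comp=3)
```

Forming X'X costs O(np^2) once, while each subsequent subset fit costs only operations in the (much smaller) subset dimension, so on a tall X the one-time cross-product pays off as soon as more than a handful of candidate subsets are evaluated.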
