An efficient and accurate numerical determination of the cluster resolution metric in two dimensions
Author(s) -
Armstrong Michael Sorochan,
Mata A. Paulina,
Harynuk James J.
Publication year - 2021
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3346
Cluster resolution (CR) is a useful metric for guiding automated feature selection for classification models. CR measures class separation in the linear subspace defined by a variable subset by determining the maximal, non‐intersecting confidence ellipses of the classes. Feature selection by cluster resolution (FS‐CR) is most commonly used to extract panels of useful, discriminating features from sparsely populated chromatographic peak tables, to optimize models built from raw signals, or to handle datasets with many more variables than samples. Because no numerical method for calculating CR has been available, its determination has required extensive dynamic programming and considerable algorithmic complexity. In this work, we present a numerical determination of the CR metric that reduces computation time approximately 65-fold compared with the dynamic programming approach and simplifies the operating principles of the FS‐CR algorithm.
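To make the definition of CR concrete, the following is a minimal brute-force sketch for two classes in two dimensions. It is neither the dynamic programming approach nor the numerical determination presented in the paper. It assumes that each class ellipse is built from the sample mean and covariance, that both ellipses share the same confidence level p (using the closed-form chi-square quantile with 2 degrees of freedom, q = −2 ln(1 − p)), and that CR is the largest p at which the ellipses do not overlap. Overlap is tested approximately by sampling one ellipse's boundary, and the largest non-overlapping p is found by bisection. All function names (`mean_cov`, `ellipses_overlap`, `cluster_resolution`, and so on) are hypothetical.

```python
import math

def mean_cov(points):
    """Sample mean and 2x2 sample covariance (n - 1 denominator) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def chi2_2dof_quantile(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p)

def mahalanobis_sq(pt, mu, cov):
    """Squared Mahalanobis distance of pt from mu under a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = pt[0] - mu[0], pt[1] - mu[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def boundary(mu, cov, q, n=720):
    """Points on the ellipse (x - mu)^T cov^-1 (x - mu) = q, via a Cholesky factor."""
    sxx, sxy, syy = cov
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 * l21)
    r = math.sqrt(q)
    pts = []
    for k in range(n):
        t = 2 * math.pi * k / n
        u, v = r * math.cos(t), r * math.sin(t)
        pts.append((mu[0] + l11 * u, mu[1] + l21 * u + l22 * v))
    return pts

def ellipses_overlap(a, b, q):
    """Approximate overlap test for two ellipses at the same chi-square level q."""
    (mu_a, cov_a), (mu_b, cov_b) = a, b
    # One ellipse containing the other's centre catches the nested cases.
    if mahalanobis_sq(mu_b, mu_a, cov_a) <= q or mahalanobis_sq(mu_a, mu_b, cov_b) <= q:
        return True
    # Otherwise they overlap only if part of A's boundary lies inside B.
    return any(mahalanobis_sq(pt, mu_b, cov_b) <= q for pt in boundary(mu_a, cov_a, q))

def cluster_resolution(class_a, class_b, tol=1e-6):
    """Largest confidence level p at which the two class ellipses do not overlap,
    found by bisection (an illustration of the definition, not the paper's method)."""
    a, b = mean_cov(class_a), mean_cov(class_b)
    lo, hi = 0.0, 1.0 - 1e-12
    if not ellipses_overlap(a, b, chi2_2dof_quantile(hi)):
        return hi
    # Ellipses grow monotonically with p, so overlap is monotone and bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ellipses_overlap(a, b, chi2_2dof_quantile(mid)):
            hi = mid
        else:
            lo = mid
    return lo
```

For example, with `a = [(1, 0), (-1, 0), (0, 1), (0, -1)]` and `b` equal to `a` shifted by 4 along the first axis, both classes have covariance (2/3)I, the circles touch at q = 6, and `cluster_resolution(a, b)` returns approximately 1 − e^(−3) ≈ 0.950. A search of this kind re-evaluates the overlap test many times per variable pair, which illustrates why a direct numerical determination is attractive when FS‐CR scores many candidate subsets.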