Unique Ion Filter: A Data Reduction Tool for GC/MS Data Preprocessing Prior to Chemometric Analysis

Open Access

Author(s) - Lawrence A. Adutwum, James J. Harynuk

Publication year - 2014

Publication title - Analytical Chemistry

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.117

H-Index - 332

eISSN - 1520-6882

pISSN - 0003-2700

DOI - 10.1021/ac501660a

Subject(s) - chemometrics, feature selection, chemistry, data reduction, pattern recognition (psychology), dimensionality reduction, data set, linear discriminant analysis, raw data, data pre-processing, reduction (mathematics), filter (signal processing), artificial intelligence, data mining, chromatography, computer science, statistics, mathematics, geometry, computer vision

Using raw GC/MS data as the X-block for chemometric modeling has the potential to provide better classification models for complex samples than using the total ion current (TIC), extracted ion chromatograms/profiles (EIC/EIP), or integrated peak tables. However, the sheer volume of raw GC/MS data necessitates some form of data reduction/feature selection to remove the variables that contain primarily noise. Several feature selection algorithms exist; however, because of the extreme number of variables (10^6 to 10^8 per chromatogram), feature selection can be slow and computationally expensive. Herein, we present a new prefilter for automated data reduction of GC/MS data prior to feature selection. This tool, termed the unique ion filter (UIF), is a module that can be added after chromatographic alignment and before any subsequent feature selection algorithm. The UIF objectively reduces the number of irrelevant or redundant variables in raw GC/MS data while preserving potentially relevant analytical information. In the m/z dimension, data are reduced from a full spectrum to a handful of unique ions for each chromatographic peak. In the time dimension, data are reduced to only a handful of scans around each peak apex. The UIF was applied to GC/MS data for a variety of gasoline samples that were classified according to octane rating using partial least-squares discriminant analysis (PLS-DA). It was also applied to a series of chromatograms from casework fire debris analysis that were classified on the basis of whether or not signatures of gasoline were detected. By reducing the overall population of candidate variables passed to subsequent variable selection, the UIF reduced the total feature selection time needed to achieve perfect classification of all validation data from 373 to 9 min (a 98% reduction in computing time). Additionally, the large reduction in included variables produced a concomitant reduction in noise, improving overall model quality. A minimum of two unique m/z values (um/z) per peak and a window of three scans about the peak apex provided enough information about each peak for successful PLS-DA modeling of the data, with 100% model prediction accuracy achieved. It is also shown that applying the UIF does not alter the underlying chemical information in the data.
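
The two-dimensional reduction described in the abstract (a few unique ions per peak, a few scans around each apex) can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say how peaks are detected or how unique ions are chosen, so the sketch makes its own assumptions: peak apexes are taken as local maxima of the TIC above a hypothetical `min_height`, and "uniqueness" is scored as the fraction of a channel's apex-scan signal that belongs to the current peak, restricted to channels above a hypothetical `min_rel` fraction of the base peak. The function names and the toy data are likewise illustrative only.

```python
def find_apexes(tic, min_height):
    """Indices of local maxima of the TIC at or above min_height (assumed peak picker)."""
    return [i for i in range(1, len(tic) - 1)
            if tic[i] >= min_height and tic[i] > tic[i - 1] and tic[i] >= tic[i + 1]]


def unique_ion_filter(scans, n_ions=2, scan_window=3, min_height=0.0, min_rel=0.05):
    """Return the set of (scan, channel) index pairs kept by this toy filter.

    scans is a list of scans; each scan is a list of intensities, one per m/z channel.
    """
    tic = [sum(scan) for scan in scans]
    apexes = find_apexes(tic, min_height)
    n_channels = len(scans[0])
    # Summed intensity of each channel over all apex scans.
    totals = [sum(scans[a][c] for a in apexes) for c in range(n_channels)]
    half = scan_window // 2
    kept = set()
    for a in apexes:
        spectrum = scans[a]
        floor = min_rel * max(spectrum)
        # Ignore channels that are absent or negligible at this apex.
        candidates = [c for c in range(n_channels)
                      if spectrum[c] > 0 and spectrum[c] >= floor]
        # Hypothetical uniqueness score: share of the channel's apex signal
        # that falls in this peak; ties are broken by intensity.
        candidates.sort(key=lambda c: (spectrum[c] / totals[c], spectrum[c]),
                        reverse=True)
        ions = candidates[:n_ions]
        # Keep only the scans in the window centred on the apex.
        for s in range(max(0, a - half), min(len(scans), a + half + 1)):
            for c in ions:
                kept.add((s, c))
    return kept


# Toy aligned chromatogram: 7 scans x 4 m/z channels, two peaks.
# Channel 0 is shared, channel 1 occurs only in the first peak,
# channel 2 only in the second, channel 3 is constant background.
scans = [
    [10, 5, 0, 1],
    [50, 30, 0, 1],
    [10, 5, 0, 1],
    [8, 0, 6, 1],
    [40, 0, 35, 1],
    [8, 0, 6, 1],
    [1, 0, 0, 1],
]

kept = unique_ion_filter(scans, n_ions=2, scan_window=3, min_height=20)
print(len(kept), "of", len(scans) * len(scans[0]), "variables kept")  # 12 of 28 variables kept
```

With two ions per peak and a three-scan window (the minimum settings reported in the abstract), the toy data shrinks from 28 candidate variables to 12, and the background channel is dropped entirely. The retained (scan, channel) pairs would then be passed to whatever feature selection step follows.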
