Open Access
Unique Ion Filter: A Data Reduction Tool for GC/MS Data Preprocessing Prior to Chemometric Analysis
Author(s) - Lawrence A. Adutwum, James J. Harynuk
Publication year - 2014
Publication title - Analytical Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.117
H-Index - 332
eISSN - 1520-6882
pISSN - 0003-2700
DOI - 10.1021/ac501660a
Subject(s) - chemometrics , feature selection , chemistry , data reduction , pattern recognition (psychology) , dimensionality reduction , data set , linear discriminant analysis , raw data , data pre processing , reduction (mathematics) , filter (signal processing) , artificial intelligence , data mining , chromatography , computer science , statistics , mathematics , geometry , computer vision
Using raw GC/MS data as the X-block for chemometric modeling has the potential to provide better classification models for complex samples than using the total ion current (TIC), extracted ion chromatograms/profiles (EIC/EIP), or integrated peak tables. However, the sheer volume of raw GC/MS data necessitates some form of data reduction or feature selection to remove variables that contain primarily noise. Several feature selection algorithms exist; however, because of the extreme number of variables (10^6 to 10^8 per chromatogram), feature selection can be slow and computationally expensive. Herein, we present a new prefilter for automated reduction of GC/MS data prior to feature selection. This tool, termed the unique ion filter (UIF), is a module that can be added after chromatographic alignment and before any subsequent feature selection algorithm. The UIF objectively reduces the number of irrelevant or redundant variables in raw GC/MS data while preserving potentially relevant analytical information. In the m/z dimension, data are reduced from a full spectrum to a handful of unique ions for each chromatographic peak. In the time dimension, data are reduced to a handful of scans around each peak apex. The UIF was applied to a GC/MS data set of a variety of gasoline samples to be classified by octane rating using partial least-squares discriminant analysis (PLS-DA). It was also applied to a series of chromatograms from casework fire debris analysis, to be classified on the basis of whether or not signatures of gasoline were detected. By shrinking the population of candidate variables passed to subsequent variable selection, the UIF cut the total feature selection time needed to reach perfect classification of all validation data from 373 to 9 min (a 98% reduction in computing time).
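
The two-dimensional reduction described above (a few unique ions per peak in m/z, a few scans around each apex in time) can be illustrated with a minimal Python sketch. It assumes the chromatograms are already aligned into a NumPy array of shape (samples, scans, m/z channels) and that peak apex scan indices are known. The function name `unique_ion_filter` and, in particular, the "uniqueness" score (the share of an ion's summed apex intensity that belongs to one peak) are hypothetical choices made for illustration; the abstract does not specify how the published UIF selects its unique ions.

```python
import numpy as np


def unique_ion_filter(data, apex_scans, n_ions=2, half_window=1):
    """Hypothetical UIF-style reduction of aligned GC/MS data (illustrative only).

    data        : array (n_samples, n_scans, n_mz), chromatograms already aligned
    apex_scans  : scan index of each chromatographic peak apex
    n_ions      : number of "unique" m/z channels kept per peak
    half_window : scans kept on each side of the apex (1 -> window of 3 scans)

    Returns the reduced X-block (n_samples, n_features) and a list of
    (apex, scan, mz) tuples identifying each retained variable.
    """
    data = np.asarray(data, dtype=float)
    n_samples, n_scans, n_mz = data.shape
    apex_scans = [int(a) for a in apex_scans]

    # Mean apex spectrum of every peak across all samples: shape (n_peaks, n_mz).
    apex_spectra = data[:, apex_scans, :].mean(axis=0)

    # Assumed uniqueness score: the fraction of each m/z channel's apex
    # intensity (summed over all peaks) that belongs to this peak. Ions shared
    # by many peaks score low; ions specific to a single peak score near 1.
    totals = apex_spectra.sum(axis=0)
    share = np.divide(apex_spectra, totals,
                      out=np.zeros_like(apex_spectra), where=totals > 0)

    columns, labels = [], []
    for p, apex in enumerate(apex_scans):
        # lexsort uses the LAST key as primary: rank by uniqueness (descending),
        # then break ties by apex intensity (descending).
        order = np.lexsort((-apex_spectra[p], -share[p]))
        ions = np.sort(order[:n_ions])
        # Keep only a small scan window around the apex (clipped at the edges).
        lo, hi = max(apex - half_window, 0), min(apex + half_window, n_scans - 1)
        for scan in range(lo, hi + 1):
            for mz in ions:
                columns.append(data[:, scan, mz])
                labels.append((apex, scan, int(mz)))

    return np.column_stack(columns), labels


# Usage: two synthetic peaks that share a large ion at m/z index 4.
data = np.zeros((4, 10, 5))
data[:, 3, :] = [10, 8, 0, 0, 50]
data[:, 7, :] = [0, 0, 9, 7, 50]
X, labels = unique_ion_filter(data, [3, 7])
print(X.shape)  # 2 peaks x 3 scans x 2 ions = 12 columns per sample
```

In this toy example the shared ion is discarded even though it is the most intense, because it does not distinguish one peak from the other; each peak keeps only its two specific ions over a three-scan window. The reduced matrix `X` would then be handed to whatever feature selection and PLS-DA steps follow.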
Additionally, the substantial reduction in included variables brought a concomitant reduction in noise, improving overall model quality. As few as two unique m/z values (um/z) per peak and a window of three scans about the peak apex provided enough information about each peak for successful PLS-DA modeling of the data, with 100% model prediction accuracy achieved. It is also shown that applying the UIF does not alter the underlying chemical information in the data.
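
The per-peak effect of these minimum settings, and the reported timing figures, can be checked with a short worked calculation. Here M (number of m/z channels in the full spectrum) and W (number of scans spanned by a peak) are symbols introduced only for illustration; the abstract does not report their values.

```latex
% Variables contributed by one chromatographic peak
V_{\text{raw}} = M \times W, \qquad
V_{\text{UIF}} = n_{\text{um/z}} \times w = 2 \times 3 = 6

% Reported feature selection time: 373 min reduced to 9 min
1 - \frac{9}{373} \approx 0.976 \approx 98\%
```

So, whatever the acquisition range, each peak is represented by only six variables under the minimum settings, which is consistent with the large drop in feature selection time.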
