Rapid multivariate analysis of 3D ToF‐SIMS data: graphical processor units (GPUs) and low‐discrepancy subsampling for large‐scale principal component analysis
Author(s) -
Cumpson Peter J,
Fletcher Ian W,
Sano Naoko,
Barlow Anders J
Publication year - 2016
Publication title -
Surface and Interface Analysis
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.52
H-Index - 90
eISSN - 1096-9918
pISSN - 0142-2421
DOI - 10.1002/sia.6042
Subject(s) - principal component analysis , computer science , multivariate statistics , dimensionality reduction , computational science , algorithm , artificial intelligence , machine learning
Principal component analysis (PCA) and other multivariate analysis methods are increasingly used to analyse and understand depth‐profiles in XPS, AES and SIMS. For large images or three‐dimensional (3D) imaging depth‐profiles, PCA has until now been difficult to apply simply because of the size of the data matrices involved. In a recent paper, we described two algorithms, random vector 1 (RV1) and random vector 2 (RV2), that improve the speed of PCA and allow datasets of unlimited size, respectively. In this paper, we apply the RV2 algorithm to perform PCA on full 3D time‐of‐flight SIMS data for the first time without subsampling. The dataset we process in this way is a 128 × 128 pixel depth‐profile of 120 layers, each voxel having an associated mass spectrum of 70 439 values. This forms over a terabyte of data when uncompressed and took 27 h to process with the RV2 algorithm on a conventional Windows desktop personal computer (PC). While full PCA (e.g. using RV2) is to be preferred for final reports or publications, a much more rapid method is needed during analysis sessions to inform decisions on the next analytical step. We have therefore implemented the RV1 algorithm on a PC having a graphical processor unit (GPU) card containing 2880 individual processor cores. This increases the speed of calculation by a factor of around 4.1 compared with what is possible on a fast, commercially available desktop PC using central processing units alone, and full PCA is performed in less than 7 s. The size of the dataset that can be processed in this way is limited by the size of the memory on the GPU card. This is typically sufficient for two‐dimensional images but not for 3D depth‐profiles unless they are subsampled. We have therefore examined efficient sampling schemes that allow a good approximate solution to the PCA problem for large 3D datasets.
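As a rough check on the dataset size quoted above, the arithmetic can be sketched as follows. The dimensions are taken from the abstract; the storage width of 8 bytes per value (double precision) is an assumption made here for illustration and is not stated in the source.

```python
# Dimensions quoted in the abstract.
PIXELS_X = 128
PIXELS_Y = 128
LAYERS = 120
SPECTRUM_VALUES = 70_439

# ASSUMPTION (not from the source): each value is stored as an 8-byte double.
BYTES_PER_VALUE = 8

n_voxels = PIXELS_X * PIXELS_Y * LAYERS      # rows of the data matrix
n_values = n_voxels * SPECTRUM_VALUES        # total matrix entries
n_bytes = n_values * BYTES_PER_VALUE         # uncompressed size in bytes

print(f"voxels: {n_voxels:,}")
print(f"values: {n_values:,}")
print(f"size:   {n_bytes / 1e12:.2f} TB (decimal terabytes)")
```

Under this assumption the uncompressed matrix comes to roughly 1.1 TB, consistent with "over a terabyte"; at 4 bytes per value it would be about half that.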
We find that sampling with a low‐discrepancy series, such as a Sobol series, gives more rapid convergence than random sampling, and we recommend such methods for routine use. Using the GPU and low‐discrepancy series together, we anticipate that any time‐of‐flight SIMS dataset, of whatever size, can be efficiently and accurately processed into PCA components in a maximum of around 10 s using a commercial PC with a widely available GPU card, although the longer RV2 approach is still to be preferred for the presentation of final results, such as in published papers.
Copyright © 2016 The Authors. Surface and Interface Analysis published by John Wiley & Sons Ltd.
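To illustrate the idea of low‐discrepancy subsampling described in the abstract, the sketch below selects voxels from a 128 × 128 × 120 volume. It is not the authors' implementation: it uses a Halton sequence (a different, simpler low‐discrepancy series) in place of the Sobol series named in the abstract, so that the example needs only the Python standard library. The function names, the bases (2, 3, 5) and the sample count are hypothetical choices made for illustration.

```python
from math import prod


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of `index` in `base`, a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_voxels(n_samples, shape=(128, 128, 120), bases=(2, 3, 5)):
    """Return n_samples distinct (x, y, z) voxel coordinates.

    Coordinates follow a 3D Halton sequence scaled to the volume shape.
    Points that fall on an already selected voxel are skipped.
    """
    if n_samples > prod(shape):
        raise ValueError("cannot draw more distinct voxels than the volume holds")
    seen = set()
    voxels = []
    i = 1  # start at 1 so the first point is not the origin
    while len(voxels) < n_samples:
        coord = tuple(
            min(size - 1, int(radical_inverse(i, base) * size))
            for base, size in zip(bases, shape)
        )
        i += 1
        if coord not in seen:
            seen.add(coord)
            voxels.append(coord)
    return voxels


sample = halton_voxels(1000)
print(sample[:3])
```

The spectra at the selected voxels would then form the rows of a much smaller data matrix passed to PCA. Because successive Halton points fill the volume evenly, any prefix of the sample already covers the whole depth‐profile, which is the property that motivates low‐discrepancy series over purely random sampling.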