Open Access
Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts
Author(s) - Pisias Nicklas G., Murray Richard W., Scudder Rachel P.
Publication year - 2013
Publication title - Geochemistry, Geophysics, Geosystems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.928
H-Index - 136
ISSN - 1525-2027
DOI - 10.1002/ggge.20247
Subject(s) - scripting language, multivariate statistics, fortran, computer science, set (abstract data type), matlab, compositional data, data set, data mining, statistical hypothesis testing, multivariate analysis, geology, programming language, statistics, artificial intelligence, machine learning, mathematics
Multivariate statistical treatments of large data sets in sedimentary geochemistry and other fields are rapidly becoming more popular as analytical and computational capabilities expand. Because geochemical data sets present a unique set of conditions (e.g., the closed array), the use of generic off‐the‐shelf software is not straightforward and can yield misleading results. We present here annotated MATLAB scripts (and specific guidelines for their use) for Q‐mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, all based on the well‐known approaches taken by Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although investigators have used these techniques for decades, their application has been neither consistent nor transparent, because the code has remained in‐house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to providing the annotated scripts and instructions for use, we discuss general principles to be considered when performing multivariate statistical treatments of large geochemical data sets, provide a brief contextual history of each approach, explain their similarities and differences, and include a sample data set with which users can test their own application of the scripts.
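The closed-array condition mentioned in the abstract can be illustrated with a small numerical sketch. The example below is written in Python, not MATLAB, and is not taken from the authors' scripts; the sample values and the helper names `close_rows` and `pearson` are hypothetical, chosen only for illustration. It shows that two components whose raw abundances are uncorrelated become perfectly anticorrelated once each sample is rescaled to sum to 100%, which is one reason generic correlation-based tools can mislead on compositional data.

```python
# Illustrative sketch only: hypothetical data, not the authors' MATLAB scripts.
from math import sqrt


def close_rows(rows, total=100.0):
    """Rescale each sample (row) so that its components sum to `total`."""
    return [[total * value / sum(row) for value in row] for row in rows]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical raw abundances of two components in four samples,
# constructed so that the two columns are uncorrelated.
raw = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]

# Closure: express each sample as percentages that sum to 100.
closed = close_rows(raw)

raw_r = pearson([row[0] for row in raw], [row[1] for row in raw])
closed_r = pearson([row[0] for row in closed], [row[1] for row in closed])

print("correlation of raw abundances:   ", raw_r)
print("correlation of closed abundances:", closed_r)
```

In a two-component system the effect is extreme, because each percentage is simply 100 minus the other. With more components the induced correlations are weaker but still present, which is why the abstract cautions against applying off-the-shelf routines without accounting for closure.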