Development and Update Process of VNIR‐Based Models Built to Predict Soil Organic Carbon
Author(s) -
Sequeira Cleiton H.,
Wills Skye A.,
Grunwald Sabine,
Ferguson Richard R.,
Benham Ellis C.,
West Larry T.
Publication year - 2014
Publication title -
Soil Science Society of America Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.836
H-Index - 168
eISSN - 1435-0661
pISSN - 0361-5995
DOI - 10.2136/sssaj2013.08.0354
Subject(s) - vnir , computer science , preprocessor , data mining , mean squared error , predictive modelling , residual , robustness (evolution) , multivariate statistics , machine learning , artificial intelligence , hyperspectral imaging , statistics , mathematics , algorithm , biochemistry , chemistry , gene
The large number of samples, and the time and cost required to assess soil organic C (SOC) with standard procedures, have led to interest in proximal sensing with visible and near‐infrared (VNIR) diffuse reflectance spectroscopy. The objectives of the present study were to (i) evaluate the effect of multivariate techniques and spectra preprocessing methods on the performance of VNIR‐based models, (ii) evaluate whether subsetting datasets improves the prediction accuracy of models, and (iii) present a systematic iterative model development and update process. Three datasets were used: Dataset‐1 for initial model development; Dataset‐2 to revalidate models developed with Dataset‐1; and Dataset‐3 to update promising models identified with Dataset‐1 and ‐2. During initial model development, Dataset‐1 was subset into clusters in an attempt to improve model performance, but subsetting did not improve it. Revalidating models with Dataset‐2 helped to identify a lack of robustness in the initial models. This lack of robustness is related to the greater sample diversity in Dataset‐2 compared with Dataset‐1 and highlights the importance of continuously updating models to cover more variability. Based on results from Dataset‐1 and ‐2, promising models were updated with the larger and more diverse Dataset‐3. Following this update, the best model had a coefficient of multiple determination (R²), root mean squared prediction error (RMSPE), and residual prediction deviation (RPD) of 0.95, 2.062, and 4.39%, respectively. Collecting and evaluating data in separate sets allowed models to be revalidated and updated with new independent samples. This continuous process provides robust models to end users.
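The three validation statistics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions based on common chemometrics usage: R² as one minus the ratio of residual to total sum of squares, RMSPE as the root of the mean squared prediction error, and RPD as the sample standard deviation of the observed values divided by RMSPE. The function name `validation_metrics` is hypothetical and is not taken from the paper.

```python
import math


def validation_metrics(observed, predicted):
    """Return (r2, rmspe, rpd) for paired observed and predicted values.

    Assumed definitions (not stated in the abstract):
      r2    = 1 - SS_res / SS_tot
      rmspe = sqrt(SS_res / n)
      rpd   = sample SD of observed values / rmspe
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 and RPD are undefined")

    r2 = 1.0 - ss_res / ss_tot
    rmspe = math.sqrt(ss_res / n)
    sd_obs = math.sqrt(ss_tot / (n - 1))  # sample standard deviation
    # A perfect prediction has zero error, so RPD is unbounded.
    rpd = sd_obs / rmspe if rmspe > 0 else math.inf
    return r2, rmspe, rpd


# Illustrative values only (made up, not data from the study).
soc_observed = [1.0, 2.0, 3.0, 4.0]
soc_predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmspe, rpd = validation_metrics(soc_observed, soc_predicted)
```

Under these assumed definitions, RMSPE carries the units of the measured property, while R² and RPD are unitless. Computing the same statistics on an independent dataset, as was done here with Dataset‐2, is what exposes a lack of robustness that calibration statistics alone can hide.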