Improving the prediction performance of a large tropical vis-NIR spectroscopic soil library from Brazil by clustering into smaller subsets or use of data mining calibration techniques
Author(s) -
Araújo S. R.,
Wetterlind J.,
Demattê J. A. M.,
Stenberg B.
Publication year - 2014
Publication title -
European Journal of Soil Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.244
H-Index - 111
eISSN - 1365-2389
pISSN - 1351-0754
DOI - 10.1111/ejss.12165
Subject(s) - calibration, mean squared error, cluster analysis, linear regression, partial least squares regression, soil organic matter, support vector machine, data set, mathematics, environmental science, remote sensing, soil science, computer science, statistics, soil water, artificial intelligence, geology
Summary: Effective agricultural planning requires basic soil information. In recent decades visible near-infrared diffuse reflectance spectroscopy (vis-NIR) has been shown to be a viable alternative for rapidly analysing soil properties. We studied 7172 samples of seven different soil types collected from several regions of Brazil and varying in organic matter (OM) (0.2–10.3%) and clay content (0.2–99.0%). The aim was to explore the possibility of enhancing the performance of vis-NIR data in predicting organic matter and clay content in this library by dividing it into smaller sub-libraries on the basis of their vis-NIR spectra. We used partial least squares regression (PLSR) models on the sub-libraries and compared the results with PLSR and two non-linear calibration techniques, boosted regression trees (BT) and support vector machines (SVM), applied to the whole library. The whole-library calibrations for clay performed well (modelling efficiency, ME > 0.82; root mean squared error, RMSE < 10.9%), reflecting the influence of the direct spectral responses of this property in the vis-NIR range. Calibrations for OM were reasonably good, especially in view of the very small variation in this property (ME > 0.60; RMSE < 0.55%). The best results were, however, found when dividing the large library into smaller subsets by using variation in the mean-normalized or first-derivative spectra. This divided the global data set into clusters that were more uniform in mineralogy, regardless of geographical origin, and improved predictive performance. The best clustering method improved the RMSE in the validation to 8.6% clay and 0.47% OM, which corresponds to a 21% and 15% reduction, respectively, compared with whole-library PLSR. For the whole library, SVM performed almost equally well, reducing RMSE to 8.9% clay and 0.48% OM.
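The two evaluation statistics quoted in the summary, and the reported percentage reductions, can be illustrated with a minimal Python sketch. It rests on two assumptions that the summary does not state: ME is taken here in its common form, one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean, and the whole-library PLSR baselines are taken to be the upper bounds quoted above (10.9% clay, 0.55% OM). The function names are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def modelling_efficiency(observed, predicted):
    """ME, assumed here to be 1 - SSE / SST (1 is a perfect fit,
    0 is no better than predicting the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def relative_reduction(baseline, improved):
    """Fractional reduction of an error statistic relative to a baseline."""
    return (baseline - improved) / baseline


# Assumed baselines: the whole-library bounds quoted in the summary.
clay = relative_reduction(10.9, 8.6)
om = relative_reduction(0.55, 0.47)
print(f"clay RMSE reduction: {clay:.0%}")
print(f"OM RMSE reduction: {om:.0%}")
```

Under these assumptions the computed reductions round to the 21% and 15% reported for the best clustering method.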