Neighborhood Size of Training Data Influences Soil Map Disaggregation
Author(s) - Levi, Matthew R.
Publication year - 2017
Publication title - Soil Science Society of America Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.836
H-Index - 168
eISSN - 1435-0661
pISSN - 0361-5995
DOI - 10.2136/sssaj2016.08.0258
Subject(s) - covariate, random forest, digital soil mapping, support vector machine, sampling, spatial analysis, raster graphics, raster data, statistics, soil map, mathematics, computer science, environmental science, soil science, artificial intelligence, soil water
Core Ideas
Focal summaries of covariate data around sampling points affect model performance.
Support vector machine and random forest approaches produced the best results.
The model using a 150‐m neighborhood performed best, albeit producing a generalized soil map.
Multiscale covariate data reflect realistic patterns of soil–landscape features.

Abstract
Soil class mapping relies on the ability of sample locations to represent portions of the landscape with similar soil types. However, most digital soil mapping (DSM) approaches intersect each sample location with a single raster pixel per covariate layer, regardless of pixel size, and therefore ignore the variability of covariate values in the area surrounding the training data. The objective of this study was to disaggregate a soil map of a semiarid Arizona rangeland (78,569 ha) by testing different neighborhood sizes for extracting covariate data to points. Eight machine learning algorithms were compared to assess the influence of summarizing covariate data within 0‐, 15‐, 30‐, 60‐, 90‐, 120‐, 150‐, and 180‐m circular neighborhoods, as well as in a multiscale model. Kappa (κ) values for all models ranged from 0.24 to 0.44 and increased with neighborhood size up to 150 m. Support vector machine and random forest algorithms performed best across all scales. The support vector machine with a radial kernel using a 150‐m neighborhood had the highest κ and produced a more generalized map than the best multiscale model (random forest), which yielded a mix of general and detailed soil features. Evaluating a range of neighborhood sizes for aggregating covariate data provides a way to account for multiscale processes that are important for predicting soil patterns without modifying the pixel size of the final maps. Incorporating concepts from traditional soil surveys into DSM approaches can strengthen ties between the two and optimize the extraction of landscape information for predicting soil properties.
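The neighborhood extraction described in the abstract can be illustrated with a minimal Python sketch. It assumes a square-pixel covariate raster stored as a list of rows, uses a focal mean as the summary statistic (the abstract does not say which summary was used), and counts a pixel as part of a circular neighborhood when its center lies within the radius of the sample pixel's center. A 0 m radius reduces to the conventional one-pixel extraction, and concatenating summaries over several radii gives a multiscale feature vector. The helper names (`circular_offsets`, `focal_mean`, `multiscale_features`) and the 10 m pixel size in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Raster = List[List[float]]  # rows of covariate values on square pixels


def circular_offsets(radius_m: float, pixel_size_m: float) -> List[Tuple[int, int]]:
    """(row, col) offsets of pixels whose centers lie within radius_m of the
    sample pixel's center. A radius of 0 returns only (0, 0)."""
    reach = int(radius_m // pixel_size_m)  # farthest whole-pixel offset to test
    offsets = []
    for dr in range(-reach, reach + 1):
        for dc in range(-reach, reach + 1):
            if math.hypot(dr, dc) * pixel_size_m <= radius_m:
                offsets.append((dr, dc))
    return offsets


def focal_mean(raster: Raster, row: int, col: int,
               radius_m: float, pixel_size_m: float) -> float:
    """Mean covariate value in a circular neighborhood around (row, col).
    Offsets that fall outside the raster are skipped."""
    n_rows, n_cols = len(raster), len(raster[0])
    values = [
        raster[row + dr][col + dc]
        for dr, dc in circular_offsets(radius_m, pixel_size_m)
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]
    return sum(values) / len(values)


def multiscale_features(rasters: Sequence[Raster], row: int, col: int,
                        radii_m: Sequence[float], pixel_size_m: float) -> List[float]:
    """One focal mean per (covariate layer, radius) pair, concatenated."""
    return [focal_mean(r, row, col, radius, pixel_size_m)
            for r in rasters for radius in radii_m]


if __name__ == "__main__":
    # Hypothetical 5 x 5 covariate layer with 10 m pixels: zeros except one cell.
    slope = [[0.0] * 5 for _ in range(5)]
    slope[2][3] = 100.0
    feats = multiscale_features([slope], 2, 2, [0, 10, 15], 10.0)
    print([round(v, 2) for v in feats])  # [0.0, 20.0, 11.11]
```

The single high-valued cell is invisible to the 0 m (one-pixel) extraction but contributes progressively diluted values as the neighborhood grows, which is the kind of adjacent variability the abstract says single-pixel extraction ignores. Because out-of-raster pixels are skipped, samples near map edges are summarized over fewer pixels; padding or masking edge samples are alternative choices.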
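The κ values reported in the abstract measure agreement between predicted and reference soil classes after discounting the agreement expected by chance. The sketch below computes unweighted Cohen's kappa in plain Python; the abstract does not state which kappa variant was used, so unweighted Cohen's kappa is an assumption, and the function name `cohens_kappa` and the class labels in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected by chance from the
    marginal class frequencies of both label sets."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # Chance agreement numerator: sum over classes of (reference count * predicted count).
    chance = sum(ref_counts[c] * pred_counts[c] for c in ref_counts)
    if chance == n * n:
        # Both label sets consist of the same single class, so p_e == 1.
        raise ValueError("kappa is undefined when p_e == 1")
    p_e = chance / (n * n)
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up soil-class labels at six hypothetical validation points.
    reference = ["A", "A", "B", "B", "C", "C"]
    predicted = ["A", "B", "B", "B", "C", "A"]
    print(round(cohens_kappa(reference, predicted), 3))  # 0.5
```

In the usage lines, four of six points agree (p_o = 2/3) while the marginals imply p_e = 1/3, so κ = 0.5: raw accuracy overstates skill when some classes dominate the map.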