Machine‐learning algorithms predict soil seed bank persistence from easily available traits
Author(s) - Rosbakh Sergey, Pichler Maximilian, Poschlod Peter
Publication year - 2022
Publication title - Applied Vegetation Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.096
H-Index - 64
eISSN - 1654-109X
pISSN - 1402-2001
DOI - 10.1111/avsc.12660
Subject(s) - random forest, trait, persistence (discontinuity), predictive power, predictive modelling, regression, flexibility (engineering), linear regression, phylogenetic tree, set (abstract data type), regression analysis, machine learning, stepwise regression, interpretability, computer science, ecology, mathematics, biology, statistics, engineering, philosophy, biochemistry, geotechnical engineering, epistemology, gene, programming language
Question: Soil seed banks (SSB), i.e. pools of viable seeds in the soil and on its surface, play a crucial role in plant biology and ecology. Information on seed persistence in soil is of great importance for fundamental and applied research, yet compiling data sets on this trait still requires enormous efforts. We asked whether the machine‐learning (ML) approach could be used to infer and predict SSB properties of a regional flora based on easily available data.
Location: Eighteen calcareous grasslands located along an elevational gradient of almost 2000 m in the Bavarian Alps, Germany.
Methods: We compared a commonly used ML model (random forest) with a conventional model (linear regression model) as to their ability to predict SSB presence/absence and density using empirical data on SSB characteristics (environmental, seed traits and phylogenetic predictors). Further, we identified the most important determinants of seed persistence in soil for predicting qualitative and quantitative SSB characteristics using the ML approach.
Results: We demonstrated that the ML model predicts SSB characteristics significantly better than the linear regression model. A single set of predictors (either environment, or seed traits, or phylogenetic eigenvectors) was sufficient for the ML model to achieve high performance in predicting SSB characteristics. Importantly, we established that a few widely available SSB predictors can achieve high predictive power in the ML approach, suggesting a high flexibility of the developed approach for use in various study systems.
Conclusions: Our study provides a novel methodological approach that combines empirical knowledge on the determinants of SSB characteristics with a modern, flexible statistical approach based on ML. It clearly demonstrates that ML can be developed into a key tool to facilitate labor‐intensive, costly and time‐consuming functional trait research.
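The model comparison described in the Methods can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes scikit-learn and NumPy are available, uses logistic regression as the linear baseline for the presence/absence response, and generates purely synthetic predictors whose names (seed_mass, seed_shape, elevation) are hypothetical stand-ins for seed-trait and environmental variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def compare_models(n_species=400, seed=0):
    """Cross-validated ROC AUC of a random forest vs. a linear baseline.

    All data are synthetic; nothing here reproduces the study's results.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical, standardized predictors (assumed names, not from the paper).
    seed_mass = rng.normal(size=n_species)
    seed_shape = rng.normal(size=n_species)
    elevation = rng.normal(size=n_species)
    X = np.column_stack([seed_mass, seed_shape, elevation])

    # Synthetic presence/absence response driven by a trait interaction,
    # which a purely additive linear model cannot represent.
    signal = seed_mass * seed_shape + 0.3 * rng.normal(size=n_species)
    y = (signal > 0).astype(int)

    models = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "linear": LogisticRegression(),
    }

    # 5-fold cross-validation; the mean ROC AUC summarizes predictive power.
    return {
        name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        for name, model in models.items()
    }


scores = compare_models()
for name, auc in scores.items():
    print(f"{name}: mean ROC AUC = {auc:.2f}")
```

The synthetic response is deliberately non-additive, so the sketch only shows the kind of situation in which a tree ensemble can outperform a linear model; for SSB density, a regression analogue (for example RandomForestRegressor against LinearRegression) would be the corresponding assumption.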