Open Access
Why overfitting is not (usually) a problem in partial correlation networks.
Author(s) - Donald R. Williams, Josue E. Rodriguez
Publication year - 2022
Publication title - Psychological Methods
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.981
H-Index - 151
eISSN - 1939-1463
pISSN - 1082-989X
DOI - 10.1037/met0000437
Subject(s) - overfitting , spurious relationship , variance (accounting) , inference , false positive paradox , partial correlation , computer science , contrast (vision) , artificial intelligence , machine learning , correlation , false positives and false negatives , generalizability theory , regularization (linguistics) , econometrics , statistics , mathematics , artificial neural network , geometry , accounting , business
Network psychometrics is undergoing a time of methodological reflection. In part, this was spurred by the revelation that ℓ₁-regularization does not reduce spurious associations in partial correlation networks. In this work, we address another motivation for the widespread use of regularized estimation: the thought that it is needed to mitigate overfitting. We first clarify important aspects of overfitting and the bias-variance tradeoff that are especially relevant for the network literature, where the number of nodes or items in a psychometric scale is not large compared to the number of observations (i.e., a low p/n ratio). This revealed that bias and especially variance are most problematic in p/n ratios rarely encountered. We then introduce a nonregularized method, based on classical hypothesis testing, that fulfills two desiderata: (a) reducing or controlling the false positive rate and (b) quelling concerns of overfitting by providing accurate predictions. These were the primary motivations for initially adopting the graphical lasso (glasso). In several simulation studies, our nonregularized method provided more than competitive predictive performance, and, in many cases, outperformed glasso. It appears to be nonregularized, as opposed to regularized, estimation that best satisfies these desiderata. We then provide insights into using our methodology. Here we discuss the multiple comparisons problem in relation to prediction: stringent alpha levels, resulting in a sparse network, can deteriorate predictive accuracy. We end by emphasizing key advantages of our approach that make it ideal for both inference and prediction in network analysis. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
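To make the abstract's approach concrete, the following is a minimal sketch of one classical, nonregularized route to a partial correlation network: invert the sample covariance matrix, convert the precision matrix to partial correlations, and retain only edges that survive a Fisher z-test at level alpha. This is an illustration of the general technique the abstract describes, not the authors' exact implementation; the function name `pcor_network` and the simulated data are assumptions for the example.

```python
import numpy as np
from scipy import stats

def pcor_network(X, alpha=0.05):
    """Sketch of a nonregularized partial correlation network.

    Invert the sample covariance matrix, convert to partial
    correlations, and zero out edges whose Fisher z-test
    p-value is at or above alpha. Assumes n > p so the
    covariance matrix is invertible (the low p/n setting
    the abstract argues is typical in network psychometrics).
    """
    n, p = X.shape
    prec = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)                   # partial correlations
    np.fill_diagonal(pcor, 0.0)
    # Fisher z-transform; each edge conditions on the other p - 2 nodes,
    # so the approximate standard error is 1 / sqrt(n - (p - 2) - 3).
    z = np.arctanh(pcor) * np.sqrt(n - (p - 2) - 3)
    pvals = 2 * stats.norm.sf(np.abs(z))
    return np.where(pvals < alpha, pcor, 0.0)

# Usage on simulated independent data (most edges should be pruned)
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
net = pcor_network(X)
```

Note how alpha directly controls sparsity, which is the tradeoff the abstract highlights: a very stringent alpha prunes more edges and can hurt predictive accuracy.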
