Optimal sample size and data arrangement method in estimating correlation matrices with lesser collinearity: A statistical focus in maize breeding
Author(s) -
Tiago Olivoto,
Maicon Nardino,
Ricardo Carvalho Ivan,
Nicolau Follmann Diego,
Ferrari Maurício,
Junior de Pelegrin Alan,
Jardel Szareski Vinícius,
Costa de Oliveira Antônio,
Otomar Caron Braulio,
Queiróz de Souza Velci
Publication year - 2017
Publication title -
African Journal of Agricultural Research
Language(s) - English
Resource type - Journals
ISSN - 1991-637X
DOI - 10.5897/ajar2016.11799
Subject(s) - multicollinearity, collinearity, statistics, mathematics, sample size determination, confidence interval, variance inflation factor, zea mays, correlation, regression analysis, sample (material), agronomy, biology, geometry, chemistry, chromatography
Information on data arrangement methodologies and optimal sample size for estimating the Pearson correlation coefficient (r) among maize traits is still limited. Furthermore, some data arrangement methodologies currently in use may increase multicollinearity in multiple regression analysis. This study aimed to investigate the statistical behavior of r and the multicollinearity of correlation matrices among maize traits under different data arrangement scenarios and sample sizes. Data from 45 treatments [15 simple maize hybrids (Zea mays L.) evaluated in three locations] were used. Eleven traits were assessed and three datasets (scenarios) were formed: (1) all sampled observations (plants), n = 900; (2) averages of five plants per plot, n = 180; and (3) treatment averages, n = 45. In each scenario, one thousand estimates of r were obtained for each of 60 sample sizes by bootstrap simulation with replacement. Confidence intervals (CI) were estimated. One hundred eighty correlation matrices were estimated and the condition number (CN) of each was calculated. Data based on plot averages and treatment averages overestimate r by up to 24 and 34%, respectively, resulting in increases of 24 and 131% in the CN of the matrices. Trait pairs with high r require fewer plants, since the width of the CI is inversely proportional to the magnitude of r. Two hundred and ten plants are sufficient to estimate r with a 95% CI < 0.30.
Key words: Average values, bootstrap, confidence intervals, sample tracking, Zea mays L.
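The resampling procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: the data are synthetic rather than the maize measurements, the CI is taken as a percentile interval of the bootstrap estimates, and the CN is taken as the ratio of the largest to the smallest eigenvalue of the correlation matrix (a common definition that the abstract does not spell out). The function names `bootstrap_r_ci` and `condition_number` are hypothetical.

```python
import numpy as np


def bootstrap_r_ci(x, y, sample_size, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for Pearson's r at a given sample size.

    Assumption: a percentile interval is used; the abstract does not
    state which interval type the study applied.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw `sample_size` observations with replacement.
        idx = rng.integers(0, len(x), size=sample_size)
        estimates[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


def condition_number(data):
    """CN of the correlation matrix of `data` (rows = observations).

    Assumption: CN = largest eigenvalue / smallest eigenvalue.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return eigenvalues[-1] / eigenvalues[0]


# Synthetic stand-in for 900 plant-level observations of three traits.
rng = np.random.default_rng(42)
trait_a = rng.normal(size=900)
trait_b = 0.6 * trait_a + 0.8 * rng.normal(size=900)  # correlated with trait_a
trait_c = rng.normal(size=900)                         # unrelated trait

low_small, high_small = bootstrap_r_ci(trait_a, trait_b, sample_size=20)
low_large, high_large = bootstrap_r_ci(trait_a, trait_b, sample_size=210)
width_small = high_small - low_small
width_large = high_large - low_large

cn = condition_number(np.column_stack([trait_a, trait_b, trait_c]))

print(f"CI width at n=20:  {width_small:.3f}")
print(f"CI width at n=210: {width_large:.3f}")
print(f"Condition number:  {cn:.2f}")
```

Repeating the call to `bootstrap_r_ci` over a grid of sample sizes and recording the smallest size at which the CI width falls below a chosen threshold mirrors the kind of sample-size search the abstract reports. The same CN function can be applied to plant-level, plot-average, and treatment-average versions of a dataset to compare collinearity across arrangements.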
