Open Access
Hierarchical Canonical Correlation Analysis Reveals Phenotype, Genotype, and Geoclimate Associations in Plants
Author(s) - Raphael Petegrosso, Tianci Song, Rui Kuang
Publication year - 2020
Publication title - Plant Phenomics
Language(s) - English
Resource type - Journals
eISSN - 2097-0374
pISSN - 2643-6515
DOI - 10.34133/2020/1969142
Subject(s) - canonical correlation, arabidopsis thaliana, computational biology, correlation, arabidopsis, biology, phenotype, genome, feature (linguistics), genetics, environmental data, association (psychology), adaptation (eye), gene, computer science, mathematics, artificial intelligence, ecology, linguistics, philosophy, geometry, epistemology, mutant, neuroscience
The local environment at the geographical origin of plants shapes their genetic variation through environmental adaptation. While the characteristics of the local environment correlate with the genotypes and other genomic features of the plants, they can also indicate genotype-phenotype associations, providing additional information about environmental dependence. In this study, we investigate how the geoclimatic features from the geographical origin of Arabidopsis thaliana accessions can be integrated with genomic features for phenotype prediction and association analysis using advanced canonical correlation analysis (CCA). In particular, we propose a novel method called hierarchical canonical correlation analysis (HCCA) that combines mutations, gene expressions, and DNA methylations with geoclimatic features to produce informative coprojections of the features. HCCA uses the condition number of the cross-covariance between pairs of datasets to infer a hierarchical structure in which CCA is applied to combine the data. In experiments on Arabidopsis thaliana data from the 1001 Genomes and 1001 Epigenomes projects, together with climatic, atmospheric, and soil environmental variables combined by CLIMtools, HCCA provided a joint representation of the genomic and geoclimate data that improved prediction of the flowering time at 10°C (FT10) of Arabidopsis thaliana. We also extended HCCA with information from a protein-protein interaction (PPI) network, which guides the feature learning by imposing network modules onto the genomic features; these modules are shown to be useful for identifying genes with more coherent functions that correlate with the geoclimatic features. The findings of this study suggest that environmental data are an important component of plant phenotype analysis. HCCA is a useful data integration technique both for phenotype prediction and for better understanding the interactions between gene functions and environment, because coprojections of multiple genomic datasets introduce more useful functional information.
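
The abstract names two building blocks: the condition number of the cross-covariance between a pair of datasets, and CCA applied to the pair. The NumPy sketch below illustrates both on synthetic data. It is not the authors' implementation. The function names, the ridge term `reg`, the synthetic matrices, and in particular the rule of merging the pair with the smallest condition number first are all assumptions made for illustration; the abstract does not state how the condition number determines the hierarchy.

```python
import numpy as np


def cross_cov_condition_number(X, Y):
    """Ratio of largest to smallest singular value of the cross-covariance of X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (X.shape[0] - 1)
    s = np.linalg.svd(C, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]


def cca(X, Y, k, reg=1e-3):
    """Ridge-regularized CCA via SVD of the whitened cross-covariance.

    Returns projection matrices Wx, Wy and the top-k canonical correlations.
    `reg` is a hypothetical ridge term added for numerical stability.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)  # Cxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Cyy)  # Cyy = Ly @ Ly.T
    # Whitened cross-covariance: Lx^{-1} Cxy Ly^{-T}
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return Wx, Wy, s[:k]


def best_conditioned_pair(datasets):
    """Return the pair of dataset names with the smallest condition number.

    Assumption: merging the best-conditioned pair first is one plausible
    way to start a hierarchy; the abstract does not specify the rule.
    """
    names = list(datasets)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: cross_cov_condition_number(datasets[p[0]], datasets[p[1]]))


# Synthetic stand-ins for three data views sharing a 2-dimensional latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
datasets = {
    "expression": z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5)),
    "methylation": z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4)),
    "geoclimate": z @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3)),
}

pair = best_conditioned_pair(datasets)
Wx, Wy, corrs = cca(datasets[pair[0]], datasets[pair[1]], k=2)
print("first pair to combine:", pair)
print("canonical correlations:", corrs)
```

In a hierarchical scheme, the coprojection of the chosen pair would then be treated as a new dataset and compared against the remaining ones, with the step repeated until all views are combined. That continuation is likewise an interpretation of the abstract, not a description of the published algorithm.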
