Comparing multiple clustering approaches to understand proteomic datasets for improved biomarker detection
Author(s) -
Winchester Laura,
Shi Liu,
Kormilitzin Andrey,
Nevado‐Holgado Alejo J
Publication year - 2020
Publication title -
Alzheimer's and Dementia
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.713
H-Index - 118
eISSN - 1552-5279
pISSN - 1552-5260
DOI - 10.1002/alz.047654
Subject(s) - cluster analysis, computer science, data mining, biomarker discovery, principal component analysis, pattern recognition (psychology), artificial intelligence, hierarchical clustering, proteomics, consensus clustering, robustness (evolution), fuzzy clustering, machine learning, CURE data clustering algorithm, biology, biochemistry, gene
Background Development of reproducible blood‐based protein biomarkers for Alzheimer’s Disease would allow clinicians and researchers to diagnose and potentially treat patients in the early stages of disease. However, differences between measurement approaches prevent global marker development. By testing clustering and classification approaches we hope to find commonly co‐expressed markers in the data that can be used across platforms and patient cohorts to predict disease.

Method Six proteomics datasets, each with a sample size greater than 100 and containing AD cases and controls, were collated. We included datasets generated by either mass spectrometry or SomaScan assay from serum, CSF and brain tissue. Each sample set was normalised and analysed through an identical pipeline of unsupervised clustering and network module approaches. These approaches included partitioning, biclustering, density‐based and fuzzy clustering methods. Bioinformatics tools for module generation were also tested (including WGCNA). Methods were compared by cluster/module number and accuracy. The protein lists generated were compared to assess precision.

Result Variability across clustering approaches as well as across datasets illustrated the vital importance of quality control and proper normalisation of data. Dimensional reduction was a favoured approach for identification of relevant clustering. In particular, Uniform Manifold Approximation and Projection (UMAP) demonstrated its robustness as a tool against proteomics noise. As well as direct overlap, a measure of accuracy was calculated to understand the reliability of each method. We found that noisy data was represented by an increased number of predicted clusters, with certain algorithms being particularly susceptible, such as Principal Component Analysis (PCA) and hierarchical clustering.

Conclusion Proteomic data has higher background noise, and therefore standard genomic approaches designed for gene expression data are not always suitable. We provide recommendations for informatics tools for successful analysis of proteomic data and demonstrate potential testing methods for use on other datasets.
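The abstract mentions comparing methods by cluster/module number and by direct overlap of the protein lists they produce, without giving details. As a minimal sketch of what such a comparison might look like, the Python below counts clusters per method and scores each cluster against its best-matching cluster from another method using the Jaccard index. The method names, cluster labels, protein identifiers (`P01`, `P02`, ...) and the helper functions `jaccard` and `best_match_overlap` are all hypothetical placeholders, not taken from the study, and the Jaccard index is an assumed choice of overlap measure.

```python
def jaccard(a, b):
    """Jaccard index of two protein lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are considered identical
    return len(a & b) / len(a | b)


def best_match_overlap(modules_a, modules_b):
    """For each cluster in A, return its highest Jaccard overlap with any cluster in B."""
    return {
        name: max((jaccard(proteins, other) for other in modules_b.values()),
                  default=0.0)
        for name, proteins in modules_a.items()
    }


# Made-up output of two clustering methods: cluster label -> protein list.
# These identifiers are placeholders for illustration only.
results = {
    "method_a": {"c1": ["P01", "P02", "P03"], "c2": ["P04", "P05"]},
    "method_b": {"m1": ["P01", "P02"], "m2": ["P04", "P05", "P06"]},
}

# Number of clusters/modules predicted by each method.
cluster_counts = {method: len(clusters) for method, clusters in results.items()}

# How well each cluster from method_a is recovered by method_b.
overlap = best_match_overlap(results["method_a"], results["method_b"])

print(cluster_counts)
print(overlap)
```

A symmetric score could be obtained by also computing the overlap from `method_b` to `method_a` and averaging the two; which summary is appropriate depends on how the accuracy measure in the study was actually defined, which the abstract does not state.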
