Numero: a statistical framework to define multivariable subgroups in complex population-based datasets

Song Gao | Zendy; Stefan Mutter | Zendy; Aaron Casey | Zendy; VillePetteri Mäkinen | Zendy

Open Access

Author(s) - Song Gao, Stefan Mutter, Aaron Casey, Ville-Petteri Mäkinen

Publication year - 2018

Publication title - International Journal of Epidemiology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.406

H-Index - 208

eISSN - 1464-3685

pISSN - 0300-5771

DOI - 10.1093/ije/dyy113

Subject(s) - cluster analysis, computer science, population, data mining, hierarchical clustering, machine learning, data science, medicine, environmental health

Large-scale epidemiological and population data provide opportunities to identify subgroups of people who are at risk of disease or exposed to adverse environments. Clustering algorithms are popular data-driven tools for identifying such subgroups; however, relying exclusively on algorithms may not produce the best results if the dataset lacks a clustered structure. For this reason, we propose a framework (the R library Numero) that combines the self-organizing map algorithm, permutation analysis for statistical evidence, and a final expert-driven subgrouping step. We used Numero to define subgroups in two example datasets without an obvious clustering structure: a biomedical dataset on kidney disease and a dataset of community-level socioeconomic indicators. We benchmarked the Numero subgroupings against popular clustering approaches (principal component analysis, K-means and hierarchical clustering). The Numero subgroupings were more intuitive and easier to interpret without losing mathematical quality. We therefore expect Numero to be useful for exploratory analyses of population-based epidemiological datasets.
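The abstract names two computational ingredients: a self-organizing map (SOM) that arranges samples onto a grid of map units, and a permutation analysis that asks whether a variable differs across the map more than chance would allow. The Python sketch below illustrates both ideas under stated assumptions. It is not Numero's implementation (Numero is an R library and none of its functions are used here). The rectangular grid, Gaussian neighbourhood, linear decay schedules, and the choice of the standard deviation of per-unit means as the test statistic are our own illustrative choices, and every function name is hypothetical.

```python
# Illustrative sketch only, NOT the Numero R library: all names are hypothetical.
import numpy as np


def train_som(data, rows=4, cols=4, epochs=20, seed=0):
    """Train a small rectangular SOM with the plain online Kohonen update rule."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # Grid coordinates of each map unit.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit prototypes from randomly chosen samples.
    protos = data[rng.choice(n, rows * cols, replace=False)].copy()
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = 0.5 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
            x = data[i]
            bmu = np.argmin(((protos - x) ** 2).sum(axis=1))  # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))            # Gaussian neighbourhood
            protos += lr * h[:, None] * (x - protos)
            step += 1
    return protos


def assign(data, protos):
    """Map each sample to the index of its nearest prototype (map unit)."""
    d2 = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def unit_variation(values, labels, k):
    """Spread of per-unit means of a variable; empty units are skipped."""
    means = [values[labels == j].mean() for j in range(k) if np.any(labels == j)]
    return float(np.std(means))


def permutation_pvalue(values, labels, k, n_perm=1000, seed=0):
    """Compare observed per-unit variation with label-preserving shuffles of the values."""
    rng = np.random.default_rng(seed)
    observed = unit_variation(values, labels, k)
    null = np.array([unit_variation(rng.permutation(values), labels, k)
                     for _ in range(n_perm)])
    # Add-one correction so the estimated p-value is never exactly zero.
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 3))                 # synthetic training variables
    labels = assign(X, train_som(X, rows=4, cols=4))
    print(permutation_pvalue(X[:, 0], labels, 16, n_perm=200))            # trained-on variable
    print(permutation_pvalue(rng.normal(size=300), labels, 16, n_perm=200))  # pure noise
```

In the usage lines, the first column was used to train the map, so its per-unit means should spread far more than under permutation and yield a small p-value; the pure-noise variable gives no such evidence. The final expert-driven step described in the abstract, in which a person groups map units into subgroups, is a manual judgement and is not sketched here.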
