Phylogenetic networks: a new form of multivariate data summary for data mining and exploratory data analysis
Author(s) - Morrison, David A.
Publication year - 2014
Publication title - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.1130
Subject(s) - multivariate statistics, data mining, computer science, exploratory data analysis, cluster analysis, visualization, field (mathematics), data visualization, phylogenetic tree, biological data, principal component analysis, graph, multivariate analysis, metric (unit), data science, theoretical computer science, machine learning, artificial intelligence, bioinformatics, mathematics, biology, gene, pure mathematics, operations management, economics, biochemistry
Exploratory data analysis (EDA), involving both graphical displays and numerical summaries of data, is intended to evaluate the characteristics of the data as well as to provide a form of data mining. For multivariate data, the best‐known visual summaries include discriminant analysis, ordination, and clustering, particularly metric ordinations such as principal components analysis. However, these techniques have limiting mathematical assumptions that are not always realistic. Recently, network techniques have been developed in the biological field of phylogenetics that address some of these limitations. They are now widely used in biology under the name phylogenetic networks, but they are actually of general applicability to any multivariate dataset. Phylogenetic networks are fast and relatively easy to calculate, which makes them ideal as a tool for EDA. This review provides an overview of the field, with particular reference to the use of what are called splits graphs. There are several types of splits graph, which summarize the multivariate data in different ways. Example analyses are presented based on the neighbor‐net graph, which seems to be the most generally useful of the available algorithms. This should encourage the more widespread use of these networks whenever a summary of a multivariate dataset is required. This article is categorized under: Algorithmic Development > Biological Data Mining; Application Areas > Data Mining Software Tools; Technologies > Structure Discovery and Clustering; Technologies > Visualization
