A classification technique for local multivariate clusters and outliers of spatial association
Author(s) - Oxoli Daniele, Sabri Soheil, Rajabifard Abbas, Brovelli Maria A.
Publication year - 2020
Publication title - Transactions in GIS
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.721
H-Index - 63
eISSN - 1467-9671
pISSN - 1361-1682
DOI - 10.1111/tgis.12639
Subject(s) - outlier, multivariate statistics, geospatial analysis, data mining, computer science, univariate, spatial analysis, cluster analysis, context (archaeology), set (abstract data type), exploratory data analysis, data set, raw data, multivariate analysis, artificial intelligence, geography, statistics, cartography, mathematics, machine learning, archaeology, programming language
The detection of spatial clusters and outliers is critical to many spatial data analysis techniques. Many of these techniques embed spatial clustering components to explore the spatial variability and patterns in a data set that arise from the spatial association affecting most spatial data. A frontier challenge in spatial data analysis is to extend techniques originally designed for univariate analysis to a multivariate context, in order to cope with the increasing complexity and variety of modern spatial data. This article proposes an exploratory procedure to detect and classify clusters and outliers in a multivariate spatial data set. Cluster and outlier detection relies on recently introduced multivariate extensions of the well-established local indicators of spatial association (LISA) statistics. Two new indicators are proposed that enable the classification of multivariate clusters and outliers, which is not directly achievable with any established technique. The procedure is fully implemented using free and open source geospatial software and libraries, and the raw source code is made available for future reviews and replications. Empirical results from early applications to both synthetic and real spatial data are discussed, and the advantages and limitations of the procedure are outlined in light of these results.
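For readers unfamiliar with the univariate starting point that the abstract refers to, the following is a minimal sketch of the classic local Moran's I statistic with its usual quadrant labels (HH and LL for clusters, HL and LH for outliers). It is not the multivariate procedure or the two new indicators proposed in the article. The function name `local_moran`, the toy values, and the chain-shaped neighbourhood matrix are assumptions made purely for illustration, and the significance testing (typically by conditional permutation) that a real analysis needs is left out.

```python
import numpy as np


def local_moran(values, weights):
    """Univariate local Moran's I and Moran-scatterplot quadrant per site.

    Illustrative helper (hypothetical name); no significance test is done.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Row-standardise so that each site's neighbour weights sum to 1.
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

    z = x - x.mean()          # deviations from the global mean
    m2 = (z ** 2).mean()      # second moment used as the scaling factor
    lag = w @ z               # spatial lag: weighted mean of neighbours' deviations
    local_i = z * lag / m2    # local Moran's I for each site

    # Quadrants: own value (High/Low) followed by neighbourhood value (High/Low).
    labels = np.where(
        z >= 0,
        np.where(lag >= 0, "HH", "HL"),
        np.where(lag >= 0, "LH", "LL"),
    )
    return local_i, labels


# Toy data (assumed): six sites along a line, each adjacent to its immediate neighbours.
values = [10, 9, 10, 1, 2, 1]
weights = np.eye(6, k=1) + np.eye(6, k=-1)

local_i, labels = local_moran(values, weights)
for site, (i_val, label) in enumerate(zip(local_i, labels)):
    print(site, round(float(i_val), 3), label)
```

In this toy example the two sites at the boundary between the high-valued and low-valued runs receive negative local statistics and outlier labels, while the remaining sites are labelled as clusters. A multivariate extension has to replace the single deviation `z` with a comparison across several variables, which is where a simple High/Low labelling stops being directly available.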