Mapping Large Spatial Flow Data with Hierarchical Clustering
Author(s) - Zhu Xi, Guo Diansheng
Publication year - 2014
Publication title - Transactions in GIS
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.721
H-Index - 63
eISSN - 1467-9671
pISSN - 1361-1682
DOI - 10.1111/tgis.12100
Subject(s) - computer science , flow (mathematics) , cluster analysis , hierarchical clustering , data mining , data set , aggregate (composite) , set (abstract data type) , scalability , similarity (geometry) , spatial analysis , geography , artificial intelligence , mathematics , remote sensing , image (mathematics) , materials science , geometry , database , composite material , programming language
It is challenging to map large spatial flow data due to the problem of occlusion and cluttered display, where hundreds of thousands of flows overlap and intersect each other. Existing flow mapping approaches often aggregate flows using predetermined high‐level geographic units (e.g. states) or bundle partial flow lines that are close in space, both of which cause a significant loss or distortion of information and may miss major patterns. In this research, we developed a flow clustering method that extracts clusters of similar flows to avoid the cluttering problem, reveal abstracted flow patterns, and preserve data resolution as much as possible. Specifically, our method extends the traditional hierarchical clustering method to aggregate and map large flow data. The new method considers both origins and destinations in determining the similarity of two flows, which ensures that a flow cluster represents flows from similar origins to similar destinations and thus minimizes information loss during aggregation. With a spatial index and search algorithm, the new method is scalable to large flow data sets. As a hierarchical method, it generalizes flows to different hierarchical levels and has the potential to support multi‐resolution flow mapping. Different distance definitions can be incorporated to adapt to the uneven spatial distribution of flows and to detect flow clusters of different densities. To assess the quality and fidelity of flow clusters and flow maps, we carry out a case study that analyzes a data set of 243,850 taxi trips within an urban area.
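The idea that two flows are similar only when both their origins and their destinations are close can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the distance definition, the linkage rule, or the spatial index, so the code below assumes a hypothetical dissimilarity (the larger of the origin distance and the destination distance), uses naive complete-linkage agglomeration without any index, and introduces the names `flow_distance`, `cluster_flows`, and `threshold` purely for illustration.

```python
import math
from itertools import combinations

def flow_distance(f, g):
    """Assumed dissimilarity between two flows, each ((ox, oy), (dx, dy)).

    Takes the larger of the origin-to-origin and destination-to-destination
    Euclidean distances, so two flows are close only if BOTH ends are close.
    """
    origin_gap = math.dist(f[0], g[0])
    destination_gap = math.dist(f[1], g[1])
    return max(origin_gap, destination_gap)

def cluster_flows(flows, threshold):
    """Naive complete-linkage agglomerative clustering of flows.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns lists of flow indices.
    Cost grows quickly with the number of flows; this sketch has no
    spatial index and is only meant for tiny examples.
    """
    clusters = [[i] for i in range(len(flows))]

    def linkage(a, b):
        # Complete linkage: distance of the most dissimilar pair of members.
        return max(flow_distance(flows[i], flows[j]) for i in a for j in b)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters

# Made-up coordinates for illustration only.
flows = [
    ((0.0, 0.0), (10.0, 10.0)),
    ((0.5, 0.0), (10.0, 10.5)),    # near flow 0 at both ends
    ((0.0, 0.2), (-10.0, -10.0)),  # same origin area, different destination
    ((5.0, 5.0), (5.2, 9.0)),      # unrelated flow
]
print(cluster_flows(flows, threshold=1.0))
```

Flow 2 starts almost where flow 0 starts but is kept separate because its destination differs, which is the property the abstract attributes to considering both ends. Recording the merges at increasing thresholds would give the nested levels that a multi-resolution map could draw on.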