Analyzing Relatedness by Toponym Co‐Occurrences on Web Pages
Author(s) -
Liu Yu,
Wang Fahui,
Kang Chaogui,
Gao Yong,
Lu Yongmei
Publication year - 2014
Publication title -
Transactions in GIS
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.721
H-Index - 63
eISSN - 1467-9671
pISSN - 1361-1682
DOI - 10.1111/tgis.12023
Subject(s) - linkage (software), similarity (geometry), exponent, information retrieval, geography, web page, computer science, distance matrix, data mining, world wide web, biology, artificial intelligence, algorithm, image (mathematics), biochemistry, linguistics, philosophy, gene
Abstract This research proposes a method for capturing “relatedness between geographical entities” based on the co‐occurrences of their names on web pages. The basic assumption is that a higher count of co‐occurrences of two geographical places implies a stronger relatedness between them. The spatial structure of China at the provincial level is explored from the co‐occurrences of two provincial units in one document, extracted by a web information retrieval engine. Analysis on the co‐occurrences and topological distances between all pairs of provinces indicates that: (1) spatially close provinces generally have similar co‐occurrence patterns; (2) the frequency of co‐occurrences exhibits a power law distance decay effect with the exponent of 0.2; and (3) the co‐occurrence matrix can be used to capture the similarity/linkage between neighboring provinces and fed into a regionalization method to examine the spatial organization of China. The proposed method provides a promising approach to extracting valuable geographical information from massive web pages.
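To make finding (2) concrete, the following is a minimal sketch of how a power law distance decay exponent could be estimated from co-occurrence counts. It assumes a model of the form c = k * d^(-beta) and fits it by ordinary least squares in log-log space. The abstract does not state the estimation procedure, so this is not necessarily the authors' method. The function name `fit_power_law_decay` and the input values are hypothetical. The data are synthetic, generated with beta = 0.2 purely for illustration, and are not results from the paper.

```python
import math


def fit_power_law_decay(distances, cooccurrences):
    """Fit c = k * d**(-beta) by least squares on log(c) vs. log(d).

    Returns (k, beta). All distances and counts must be positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in cooccurrences]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the regression line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay exponent is the negated slope; k is the count at d = 1.
    return math.exp(intercept), -slope


# Synthetic, illustrative data only (NOT taken from the paper):
# hypothetical distances 1..6 and counts that follow
# an exact power law with exponent 0.2.
distances = [1, 2, 3, 4, 5, 6]
counts = [1000.0 * d ** -0.2 for d in distances]

k, beta = fit_power_law_decay(distances, counts)
print(f"k = {k:.1f}, beta = {beta:.3f}")
```

With real data the points would scatter around the fitted line, and one would typically average co-occurrence counts over all province pairs at each distance before fitting. An exponent as small as 0.2 means the counts fall off slowly with distance.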