Hierarchical Discriminative Classification for Text-Based Geolocation
Author(s) - Benjamin Wing, Jason Baldridge
Publication year - 2014
Language(s) - English
Resource type - Conference proceedings
DOI - 10.3115/v1/d14-1039
Subject(s) - geolocation, discriminative model, computer science, hierarchy, feature selection, feature (linguistics), information retrieval, artificial intelligence, logistic regression, data mining, machine learning, world wide web, linguistics, philosophy, economics, market economy
Text-based document geolocation is commonly rooted in language-based information retrieval techniques over geodesic grids. These methods ignore the natural hierarchy of cells in such grids and fall afoul of independence assumptions. We demonstrate the effectiveness of using logistic regression models on a hierarchy of nodes in the grid, which improves upon state-of-the-art accuracy by several percent and reduces mean error distances by hundreds of kilometers on data from Twitter, Wikipedia, and Flickr. We also show that logistic regression performs feature selection effectively, assigning high weights to geocentric terms.
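The idea of classifying over a hierarchy of grid cells can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-level uniform latitude/longitude grid, a tiny multinomial logistic regression trained by plain stochastic gradient ascent, and a greedy top-down descent from the coarse cell to the fine cell. All names (`cell_of`, `SoftmaxLR`, `train_hierarchy`, `geolocate`), the cell sizes, and the toy documents are hypothetical.

```python
import math

def cell_of(lat, lon, size):
    """Map a coordinate to a uniform grid cell `size` degrees wide."""
    return (math.floor(lat / size), math.floor(lon / size))

class SoftmaxLR:
    """Tiny multinomial logistic regression over bag-of-words tokens."""

    def __init__(self, labels):
        self.labels = sorted(set(labels))
        self.w = {y: {} for y in self.labels}  # per-label token weights

    def probs(self, words):
        scores = {y: sum(self.w[y].get(t, 0.0) for t in words)
                  for y in self.labels}
        m = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, docs, ys, epochs=50, lr=0.5):
        # Stochastic gradient ascent on the log-likelihood (no regularizer).
        for _ in range(epochs):
            for words, y in zip(docs, ys):
                p = self.probs(words)
                for c in self.labels:
                    grad = (1.0 if c == y else 0.0) - p[c]
                    for t in words:
                        self.w[c][t] = self.w[c].get(t, 0.0) + lr * grad
        return self

    def predict(self, words):
        p = self.probs(words)
        return max(self.labels, key=lambda y: p[y])

def train_hierarchy(docs, coords, coarse=10.0, fine=2.0):
    """Train one classifier over coarse cells, plus one per coarse cell
    over the fine cells nested inside it (assumes fine divides coarse)."""
    coarse_ids = [cell_of(la, lo, coarse) for la, lo in coords]
    fine_ids = [cell_of(la, lo, fine) for la, lo in coords]
    root = SoftmaxLR(coarse_ids).fit(docs, coarse_ids)
    children = {}
    for c in set(coarse_ids):
        idx = [i for i, ci in enumerate(coarse_ids) if ci == c]
        sub_docs = [docs[i] for i in idx]
        sub_ys = [fine_ids[i] for i in idx]
        children[c] = SoftmaxLR(sub_ys).fit(sub_docs, sub_ys)
    return root, children

def geolocate(words, root, children):
    """Greedy descent: pick the best coarse cell, then the best fine cell."""
    c = root.predict(words)
    return c, children[c].predict(words)

# Toy data (invented for illustration): tokenized documents and coordinates.
docs = [["austin", "tacos"], ["austin", "bats"], ["dallas", "cowboys"],
        ["london", "tube"], ["london", "thames"], ["edinburgh", "castle"]]
coords = [(30.27, -97.74), (30.27, -97.74), (32.78, -96.80),
          (51.51, -0.13), (51.51, -0.13), (55.95, -3.19)]

root, children = train_hierarchy(docs, coords)
print(geolocate(["dallas", "cowboys"], root, children))
```

In this sketch each child classifier only has to separate the fine cells inside one coarse cell, which keeps every individual model small. Greedy descent is the simplest way to traverse the hierarchy; keeping several candidate cells at each level would be a natural refinement, at the cost of more classifier evaluations.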