
Curating Automatic Vehicle Location Data to Compare the Performance of Outlier Filtering Methods
Author(s) - Jijo K. Mathew, Christopher M. Day, Howell Li, Darcy M. Bullock
Publication year - 2021
Language(s) - English
Resource type - Reports
DOI - 10.5703/1288284317435
Subject(s) - computer science, outlier, data set, identifier, data mining, global positioning system, set (abstract data type), identification (biology), data collection, data quality, database, service (business), artificial intelligence, statistics, computer network, telecommunications, botany, mathematics, economy, economics, biology, programming language
Agencies use a variety of technologies and data providers to obtain travel time information. The best-quality data come from second-by-second tracking of vehicles, but such data present many challenges in terms of privacy, storage requirements, and analysis. More frequently, agencies collect or purchase segment travel times based on some type of matching of vehicles between two spatially distributed points. Typical methods for that data collection involve license plate re-identification, Bluetooth, Wi-Fi, or some type of rolling DSRC identifier. One challenge common to these sampling techniques is filtering out outliers associated with trip chaining without removing important features in the data associated with incidents or traffic congestion. This paper describes a curated data set that was developed from high-fidelity GPS trajectory data. The curated data contained 31,621 vehicle observations spanning 42 days; 2,550 observations had travel times more than 3 minutes longer than normal. From this baseline data set, outliers were determined by using GPS waypoints to check whether the vehicle left the route. Two performance measures were identified for evaluating three outlier-filtering algorithms: the proportion of true samples rejected and the proportion of outliers correctly identified. The effectiveness of the three methods over 10-minute sampling windows was also evaluated. The curated data set has been archived in a digital repository and is available online for others to test outlier-filtering algorithms.
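The two performance measures named in the abstract can be illustrated with a minimal Python sketch. It assumes each observation carries a ground-truth outlier label (for example, derived from whether the vehicle left the route) and a flag recording whether a filter rejected it. The function name `filter_performance`, its parameters, and the sample values are hypothetical and are not taken from the report or its archived data set.

```python
from typing import Sequence, Tuple


def filter_performance(
    is_outlier: Sequence[bool], rejected: Sequence[bool]
) -> Tuple[float, float]:
    """Return (proportion of true samples rejected,
    proportion of outliers correctly identified).

    is_outlier: ground-truth label per observation (assumed input).
    rejected:   True where the filter under test discarded the observation.
    """
    if len(is_outlier) != len(rejected):
        raise ValueError("label and decision sequences must have equal length")

    true_total = sum(1 for o in is_outlier if not o)
    outlier_total = sum(1 for o in is_outlier if o)

    # True samples that the filter wrongly discarded.
    true_rejected = sum(1 for o, r in zip(is_outlier, rejected) if r and not o)
    # Outliers that the filter correctly discarded.
    outliers_caught = sum(1 for o, r in zip(is_outlier, rejected) if r and o)

    # Report 0.0 when a class is empty (an arbitrary convention for this sketch).
    prop_true_rejected = true_rejected / true_total if true_total else 0.0
    prop_outliers_identified = (
        outliers_caught / outlier_total if outlier_total else 0.0
    )
    return prop_true_rejected, prop_outliers_identified


# Made-up example: four true samples and two outliers.
labels = [False, False, False, False, True, True]
decisions = [False, True, False, False, True, False]
print(filter_performance(labels, decisions))
```

A good filter would keep the first value low and the second value high. To evaluate over 10-minute sampling windows, the same function could be applied to the observations falling in each window.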