Open Access
Crash data reporting systems in fourteen Arab countries: challenges and improvement
Author(s) -
Zahira Abounoas,
Wassim Raphaël,
Yarob Badr,
Rafic Faddoul,
Anne Guillaume
Publication year - 2020
Publication title -
Archives of Transport
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 14
eISSN - 2300-8830
pISSN - 0866-9546
DOI - 10.5604/01.3001.0014.5628
Subject(s) - crash , decision tree , computer science , data collection , c4.5 algorithm , naive bayes classifier , pruning , random forest , descriptive statistics , data mining , tracing , traceability , machine learning , statistics , mathematics , support vector machine , biology , agronomy , programming language , operating system , software engineering
Traffic crash fatalities and serious injuries remain a heavy burden for most Arab countries because current policies, strategies, and interventions are based on poorly collected data. In this paper, we assessed the crash data reporting systems in fourteen Arab countries through a survey designed to identify the fundamental dysfunctions at the management and data collection levels. Then, to address some of the dataset problems, we applied data mining techniques to select a minimal set of variables (crash, vehicle, and road user) that should be collected for a better understanding of crash circumstances. For this reason, three selection methods (correlation, information gain, and gain ratio) and seven classifiers (naive Bayes, nearest neighbour, random forest, random tree, J48, reduced error pruning tree, and bagging) were tested and compared to identify the variables that significantly affect crash severity. The decision tree family of classifiers showed the best performance based on the analysis of the area under the curve. The explanatory variables obtained from the data mining process were combined with other descriptive variables to maintain traceability. As a result, we produced hybrid lists of variables for the crash, vehicle, and road user, each containing 25 variables. Finally, to propose a cost-effective way to switch from manual to electronic data collection, we drew inspiration from a tool used to track animals to create and customize a unified e-form for handheld devices, ensuring easy entry of harmonized data for the entire region based on our selected lists of variables. The tool met the countries' requirements, especially by enabling data collection and transfer with and without an internet connection, and by allowing data analysis through its built-in Geographic Information System (GIS) capabilities.
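To make two of the selection methods named in the abstract more concrete, the following is a minimal sketch of how information gain and gain ratio could be computed for categorical variables against a severity label. It is not the authors' implementation: the function names, the variable names (`light_condition`, `weekday`), and the six toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical variable."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels inside each group of the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the variable itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy records, NOT data from the paper.
severity = ["fatal", "fatal", "minor", "minor", "minor", "fatal"]
candidates = {
    "light_condition": ["dark", "dark", "day", "day", "day", "dark"],
    "weekday": ["mon", "tue", "mon", "tue", "wed", "wed"],
}

# Rank candidate variables from most to least informative about severity.
ranking = sorted(
    candidates,
    key=lambda name: gain_ratio(candidates[name], severity),
    reverse=True,
)
print(ranking)
```

In this toy data the hypothetical `light_condition` variable separates the two severity classes perfectly, while `weekday` carries no information about them, so the former ranks first. Gain ratio divides by the variable's own entropy, which penalises variables that split the records into many small groups.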