Open Access
Optimal compression of traffic flow data
Author(s) - Igor Grabec
Publication year - 2010
Publication title - Metodološki zvezki
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.127
H-Index - 7
eISSN - 1854-0031
pISSN - 1854-0023
DOI - 10.51936/bbwz9279
Subject(s) - algorithm, computer science, data compression, data compression ratio, data set, redundancy (engineering), entropy (arrow of time), probability density function, gaussian, flow network, compression (physics), data mining, statistics, mathematics, mathematical optimization, artificial intelligence, image compression, materials science, composite material, physics, quantum mechanics, image (mathematics), image processing, operating system
This article treats the experimental characterization of complex physical laws by the probability density function (PDF) of measured data. For this purpose, we introduce a statistical Gaussian mixture model composed of representative data and their associated probabilities. To develop an algorithm that adapts the representative data to the measured data, we define the model cost function as the sum of discrepancy and redundancy. All of these statistics are expressed in terms of information entropy. An iterative method is proposed for finding the minimum of the cost function, which yields an optimal model. Because the representative data are generally fewer than the measured data, the proposed method is applicable to compressing the large volumes of experimental data recorded by automatic data-acquisition systems. Such compression is demonstrated by characterizing the traffic flow rate on the Slovenian road network. The flow rate at an observation point during a particular day is described by a vector of 24 components. The set of 365 vectors measured in one year is optimally compressed to just 4 representative vectors and their related probabilities. These vectors represent the flow rate on normal working days and on weekends or holidays, while the related probabilities correspond to the relative frequencies of these days. However, the number of representative data depends on the accuracy of the PDF estimation.
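To make the scale of the compression concrete: 365 daily vectors of 24 components are 8760 values, while 4 representative vectors plus 4 probabilities are 100 values. The sketch below illustrates the general idea of replacing measured vectors by a few representatives and their relative frequencies. It is not the article's entropy-based method: it substitutes plain k-means for the minimization of the discrepancy-plus-redundancy cost, and it runs on synthetic data. The function name `compress_daily_profiles`, the synthetic weekday and weekend profiles, and the reading of the 24 components as hourly values are all assumptions made for illustration.

```python
import numpy as np


def compress_daily_profiles(X, k, n_iter=50, seed=0):
    """Hypothetical helper: reduce the rows of X to k representative vectors.

    Uses plain k-means as a stand-in; the article instead minimizes an
    entropy-based cost (discrepancy + redundancy), which is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    # Initialise the representatives with k distinct measured vectors.
    reps = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every day to every representative.
        d2 = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_reps = reps.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old vector if a cluster is empty
                new_reps[j] = members.mean(axis=0)
        if np.allclose(new_reps, reps):
            break
        reps = new_reps
    # Final assignment; probabilities are the relative frequencies of the days.
    d2 = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    probs = np.bincount(labels, minlength=k) / len(X)
    return reps, probs


# Synthetic stand-in for one year of measurements (assumption, not real data).
hours = np.arange(24)
weekday = 400 * np.exp(-(hours - 8) ** 2 / 8) + 450 * np.exp(-(hours - 16) ** 2 / 8) + 50
weekend = 300 * np.exp(-(hours - 13) ** 2 / 18) + 30
is_weekend = (np.arange(365) % 7) >= 5
noise = np.random.default_rng(1).normal(0.0, 15.0, size=(365, 24))
X = np.where(is_weekend[:, None], weekend, weekday) + noise

reps, probs = compress_daily_profiles(X, k=4)
print("stored values before:", X.size)
print("stored values after: ", reps.size + probs.size)
print("probabilities:", np.round(probs, 3))
```

In this sketch the number of representatives `k` is fixed by hand. In the article it follows from the cost minimization and, as the abstract notes, depends on the accuracy of the PDF estimation.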