Estimating Invisible Passenger Count Using CCTV Footage: An Approach Combining Object Detection Models and Machine Learning


Open Access


Author(s) - Kyung-Hee Kim, Tae-Ki Ahn, Sunhee Kim

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3597708

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Accurate passenger counting is essential for managing congestion in railway vehicles. Although onboard CCTV footage can be used for this purpose, limited camera views often cause occlusion, leaving some passengers invisible. Thus, even when object detection algorithms detect the visible individuals precisely, estimating the total count remains a challenge. To address this, we propose a two-stage approach. In the first stage, visible passengers are detected using models such as YOLOv8-L, Faster Region-Based Convolutional Neural Network (Faster R-CNN), and Single Shot Detector variants (SSD-VGG16, SSD-ResNet50). In the second stage, machine learning models, including Random Forest, Gradient Boosting, Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost), predict the total number of passengers. The prediction models are trained on features based on spatial distribution and object size, extracted via region-wise segmentation, and are evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²). First-stage performance is assessed with metrics including Frames Per Second (FPS), Precision, Recall, F1-Score, and Intersection over Union (IoU). Experimental results show that combining YOLO-based detection with Random Forest or XGBoost achieves the best performance. Using a 4×4 region division, models reached over 96% accuracy. Moreover, the second-stage algorithm improved the detection rate from 52% to 96%. These findings suggest that the proposed method enhances congestion monitoring and can support more efficient railway operations.
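The abstract does not specify the exact features or how the regressors are scored, so the following is only a minimal sketch of what region-wise feature extraction and the second-stage error metrics might look like. It assumes each frame's detections arrive as pixel boxes `(x1, y1, x2, y2)`, that a box is assigned to the grid cell containing its center, and that the per-cell features are a box count and a mean relative box area. The function names `grid_features`, `rmse`, `mae`, and `r2` are hypothetical and not taken from the paper.

```python
from math import sqrt


def grid_features(boxes, frame_w, frame_h, rows=4, cols=4):
    """Per-cell box count and mean relative box area for one frame.

    boxes: list of (x1, y1, x2, y2) pixel boxes from any detector.
    Returns a flat list of length 2 * rows * cols: counts, then mean areas.
    Assumption: a box belongs to the cell that contains its center.
    """
    counts = [0] * (rows * cols)
    areas = [0.0] * (rows * cols)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Clamp so centers on or outside the frame border stay in range.
        col = max(0, min(int(cx / frame_w * cols), cols - 1))
        row = max(0, min(int(cy / frame_h * rows), rows - 1))
        idx = row * cols + col
        counts[idx] += 1
        areas[idx] += (x2 - x1) * (y2 - y1) / (frame_w * frame_h)
    mean_areas = [a / c if c else 0.0 for a, c in zip(areas, counts)]
    return counts + mean_areas


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Three hypothetical detections in a 400x400 frame.
features = grid_features(
    [(0, 0, 100, 100), (10, 10, 50, 50), (300, 300, 400, 400)], 400, 400
)
```

Under these assumptions, one such feature vector per frame, paired with the true passenger count, would form the training data for a regressor such as scikit-learn's `RandomForestRegressor`, and the three metric functions would score its predictions on held-out frames.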
