
Analysis of Fine-grained Counting Methods for Masked Face Counting: A Comparative Study
Author(s) -
Khanh-Duy Nguyen,
Huy H. Nguyen,
Trung-Nghia Le,
Junichi Yamagishi,
Isao Echizen
Publication year - 2024
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2024.3367593
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Masked face counting, the task of counting faces at various crowd densities while discriminating between masked and unmasked faces, is generally treated as an object (i.e., face) detection task. Counting accuracy is limited, especially at higher densities, where faces are relatively small, unclear, and viewed at various angles. Furthermore, it is costly to create the ground-truth bounding boxes needed to train object detection methods. We formulate masked face counting as a fine-grained crowd-counting task, which, when combined with density map regression, is well suited to these challenging conditions. However, adopting fine-grained crowd-counting methods for masked face counting is not trivial: it is necessary to identify strategies appropriate for both counting and multi-class classification. We contrasted the strategies of various approaches and examined their benefits and drawbacks. The strategies compared are (1) simple regression versus mixed regression and detection for counting, (2) class-aware density maps versus semantic segmentation maps versus class probabilities for classification, and (3) counting with versus without depth information enhancement. Analysis of seven crowd-counting methods on three datasets with a total of about 900k annotations showed that the level of congestion affects how well simple regression and mixed regression and detection work for counting. Meanwhile, the most effective approach for classification is the use of semantic segmentation maps. Evaluation of the usefulness of depth data demonstrated the need for a depth map to achieve accurate counting. These findings should be useful for future studies.
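
To make the density-map formulation more concrete, the following is a minimal NumPy sketch of one way a class-agnostic density map could be combined with a semantic segmentation map to obtain per-class counts. It is an illustration under stated assumptions, not the implementation of any method evaluated in the paper: the function name `count_by_class`, the label encoding (0 = masked, 1 = unmasked), and the toy arrays are all hypothetical. The only property it relies on is the standard one for density map regression, namely that the predicted map integrates (sums) to the object count.

```python
import numpy as np


def count_by_class(density, seg_labels, num_classes):
    """Split a class-agnostic density map into per-class counts.

    density:    (H, W) non-negative map whose sum is the total face count.
    seg_labels: (H, W) integer class label per pixel
                (hypothetical encoding: 0 = masked, 1 = unmasked).
    Returns an array of length num_classes with the estimated count per class.
    """
    density = np.asarray(density, dtype=float)
    seg_labels = np.asarray(seg_labels)
    if density.shape != seg_labels.shape:
        raise ValueError("density and seg_labels must have the same shape")
    # Sum the density mass that falls on the pixels assigned to each class.
    return np.array(
        [density[seg_labels == c].sum() for c in range(num_classes)]
    )


# Toy data (assumed, not from any dataset): three faces, each contributing
# a total mass of 1.0 spread over two pixels.
density = np.zeros((4, 4))
density[0, 0:2] = 0.5  # face 1
density[2, 0:2] = 0.5  # face 2
density[3, 2:4] = 0.5  # face 3

# Segmentation map: the bottom row is labeled "unmasked", the rest "masked".
seg = np.zeros((4, 4), dtype=int)
seg[3, :] = 1

counts = count_by_class(density, seg, num_classes=2)
print("masked:", counts[0], "unmasked:", counts[1], "total:", counts.sum())
```

In this sketch the per-class counts always add up to the total count from the density map, so classification errors in the segmentation map shift faces between classes without changing the overall crowd count.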