Combining deep features and activity context to improve recognition of activities of workers in groups
Author(s) - Luo Xiaochun, Li Heng, Yu Yantao, Zhou Cheng, Cao Dongping
Publication year - 2020
Publication title - Computer‐Aided Civil and Infrastructure Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.773
H-Index - 82
eISSN - 1467-8667
pISSN - 1093-9687
DOI - 10.1111/mice.12538
Subject(s) - activity recognition, discriminative model, computer science, conditional random field, artificial intelligence, context (archaeology), deep learning, pattern recognition (psychology), relevance (law), machine learning, spatial contextual awareness, geography, archaeology, political science, law
Automatic activity recognition plays an important role in addressing efficiency issues in site management. In recent years, interest in vision‐based activity recognition has grown, but its relatively low recognition accuracy and speed impede practical application. This paper introduces a discriminative model that combines deep activity features with contextual information to improve the recognition of activities of workers on foot in site surveillance videos. Specifically, a conditional random field (CRF) model is designed based on deep activity features, which are extracted with a single‐stream deep activity recognition network, and spatial relevance, which is obtained with a tracking‐by‐detection multiple‐object tracking method. We evaluated various deep activity features, including action features, activity features, and joint features. We also parameterized the contextual information of activities in terms of spatial relevance and represented the context with graphs of K‐nearest neighbors. The experimental results show that the CRF model based on deep activity features and activity context can significantly improve activity recognition performance, raising average accuracy to 98.77%, a gain of 22.10% over the baseline of 77.67% obtained with the single‐stream deep activity recognition network, at a small computational overhead of 0.025 ms per segment.
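As a rough illustration of how spatial context might be represented with a graph of K‐nearest neighbors, the sketch below links each tracked worker to the K closest other workers by Euclidean distance between image‐plane positions. It is not the authors' implementation: the function name `knn_graph`, the use of 2D centroids, the Euclidean metric, and the sample coordinates are all assumptions made for this example.

```python
import math


def knn_graph(positions, k):
    """Build a K-nearest-neighbor graph over worker positions.

    positions: list of (x, y) centroids, one per tracked worker (assumed input).
    k: number of neighbors to link to each worker.
    Returns a dict mapping each worker index to the indices of its k nearest
    other workers, ordered from nearest to farthest.
    """
    graph = {}
    for i, p in enumerate(positions):
        # Distance from worker i to every other worker, paired with the index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(positions) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Hypothetical centroids: three workers close together, two farther away.
workers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (10.0, 10.0), (11.0, 10.0)]
graph = knn_graph(workers, k=2)
for worker, neighbors in graph.items():
    print(worker, neighbors)
```

In a CRF of the kind the abstract describes, edges like these would plausibly determine which workers' activity labels are coupled by pairwise terms, while the deep activity features would feed the per‐worker terms. Note that a K‐nearest‐neighbor graph is directed in general: a worker in a small, isolated group can list a distant worker as a neighbor without being listed in return.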