Open Access
DATC-STP: Towards Accurate yet Efficient Spatiotemporal Prediction with Transformer-style CNN
Author(s) - Hyeonseok Jin, Kyungbaek Kim
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3573639
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Recently, Multi-In-Multi-Out (MIMO) architectures based on convolutional neural networks (CNNs) or vision transformers (ViTs) have been proposed to overcome the limitations of Single-In-Single-Out (SISO) architectures based on recurrent neural networks (RNNs). These architectures avoid the inherent limitations of RNNs, whose sequential nature degrades performance and prevents efficient parallelization. However, some challenges remain. CNN-based MIMO architectures have difficulty capturing global spatiotemporal information due to the local receptive field of their kernels, while ViT-based MIMO architectures have difficulty capturing local spatiotemporal information and require high computational resources due to self-attention. To overcome these limitations and improve the MIMO architecture, we propose a novel, accurate yet efficient Dual-Attention Transformer-style CNN for Spatiotemporal Prediction (DATC-STP). DATC-STP captures both local and global spatiotemporal information through 3D patch embedding and a Transformer-style CNN. Specifically, the 3D patch embedding extracts local spatiotemporal features and reduces the size of the input data along the temporal, height, and width dimensions. Two Transformer-style CNN-based attention blocks treat the spatiotemporal data similarly to images and capture global information with CNNs. This structure makes DATC-STP accurate yet efficient. To demonstrate the effectiveness of DATC-STP, we conduct comprehensive experiments on three widely used benchmark datasets: MovingMNIST, TaxiBJ, and KTH. The results show that the proposed DATC-STP achieves both competitive performance and efficiency. Furthermore, the ablation study demonstrates the usefulness of each component of DATC-STP and highlights the potential of the proposed method.
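The page does not include the authors' code, but the two ingredients the abstract describes can be sketched in a few lines of PyTorch: a 3D patch embedding that shrinks the temporal, height, and width axes at once, and a Transformer-style block in which a convolution plays the role of attention. This is a minimal illustrative sketch under assumed settings; the module names, layer sizes, and the large-kernel depthwise convolution used as the attention stand-in are assumptions, not the paper's actual DATC-STP implementation.

```python
# Illustrative sketch only: module names, sizes, and the convolutional
# "attention" stand-in are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Embed a (B, C, T, H, W) sequence with a strided 3D convolution,
    extracting local spatiotemporal features while reducing the
    temporal, height, and width dimensions at once."""
    def __init__(self, in_ch=1, dim=64, patch=(2, 4, 4)):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):               # x: (B, C, T, H, W)
        return self.proj(x)             # (B, dim, T/2, H/4, W/4)

class ConvAttentionBlock(nn.Module):
    """Transformer-style block where a large-kernel depthwise 2D
    convolution replaces self-attention and a pointwise MLP plays
    the role of the feed-forward network."""
    def __init__(self, dim=64, kernel=7, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)
        self.spatial_mix = nn.Conv2d(dim, dim, kernel,
                                     padding=kernel // 2, groups=dim)
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1), nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):               # x: (B*T', dim, H', W')
        x = x + self.spatial_mix(self.norm1(x))   # attention-like mixing
        x = x + self.mlp(self.norm2(x))           # feed-forward-like mixing
        return x

if __name__ == "__main__":
    frames = torch.randn(2, 1, 10, 64, 64)        # e.g. a MovingMNIST-like clip
    tokens = PatchEmbed3D()(frames)               # (2, 64, 5, 16, 16)
    b, d, t, h, w = tokens.shape
    x = tokens.permute(0, 2, 1, 3, 4).reshape(b * t, d, h, w)
    x = ConvAttentionBlock()(x)                   # treat embedded frames like images
    print(x.shape)                                # torch.Size([10, 64, 16, 16])
```

In this reading, the strided Conv3d gives the locality and size reduction attributed to the 3D patch embedding, while the depthwise convolution with residual connections mimics how a Transformer block mixes spatial context without the quadratic cost of self-attention; the actual dual-attention design in DATC-STP should be taken from the paper itself.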
