
DATC-STP: Towards Accurate yet Efficient Spatiotemporal Prediction with Transformer-style CNN
Author(s) -
Hyeonseok Jin,
Kyungbaek Kim
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3573639
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Recently, Multi-In-Multi-Out (MIMO) architectures based on convolutional neural networks (CNNs) or vision transformers (ViTs) have been proposed to overcome the limitations of Single-In-Single-Out (SISO) architectures based on recurrent neural networks (RNNs). These architectures avoid the inherent limitations of RNNs, whose sequential nature degrades performance and hinders parallelization. However, some challenges remain. CNN-based MIMO architectures have difficulty capturing global spatiotemporal information due to the local receptive field of their kernels. Meanwhile, ViT-based MIMO architectures have difficulty capturing local spatiotemporal information and require substantial computational resources due to self-attention. To improve the MIMO architecture while overcoming these limitations, we propose a novel, accurate yet efficient Dual-Attention Transformer-style CNN for Spatiotemporal Prediction (DATC-STP). DATC-STP captures both local and global spatiotemporal information through 3D patch embedding and a Transformer-style CNN. Specifically, the 3D patch embedding extracts local spatiotemporal features and reduces the size of the input data along the temporal, height, and width dimensions. Two Transformer-style CNN-based attention blocks treat spatiotemporal data similarly to images and capture global information with CNNs. This structure makes DATC-STP accurate yet efficient. To demonstrate the effectiveness of DATC-STP, we conduct comprehensive experiments on three widely used benchmark datasets: Moving MNIST, TaxiBJ, and KTH. We show that the proposed DATC-STP achieves both competitive performance and high efficiency. Furthermore, the results of an ablation study demonstrate the usefulness of each component of DATC-STP and highlight the potential of the proposed methods.
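To make the 3D patch embedding step concrete, the sketch below shows how a spatiotemporal sequence can be split into non-overlapping 3D patches and linearly projected, jointly shrinking the temporal, height, and width dimensions before the attention blocks. This is a minimal NumPy illustration of the general technique; the patch sizes (2x4x4), embedding dimension (64), and function name are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def patch_embed_3d(x, pt=2, ph=4, pw=4, dim=64, rng=None):
    """Split a (T, H, W, C) sequence into non-overlapping pt x ph x pw
    patches and project each flattened patch to a `dim`-d embedding.
    Patch sizes and `dim` are illustrative, not from the paper."""
    T, H, W, C = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Group frames/pixels into 3D patches: (T/pt, pt, H/ph, ph, W/pw, pw, C).
    x = x.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the intra-patch axes together, then flatten each patch to a vector.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6).reshape(T // pt, H // ph, W // pw, -1)
    # Random linear projection stands in for the learned embedding layer.
    rng = rng or np.random.default_rng(0)
    proj = rng.standard_normal((x.shape[-1], dim)) / np.sqrt(x.shape[-1])
    return x @ proj  # (T/pt, H/ph, W/pw, dim)

# Example: a 10-frame 64x64 single-channel sequence (Moving MNIST-like shape).
frames = np.zeros((10, 64, 64, 1), dtype=np.float32)
tokens = patch_embed_3d(frames)
print(tokens.shape)  # (5, 16, 16, 64)
```

After this step the downstream blocks operate on a 5x16x16 grid of 64-d tokens instead of raw 10x64x64 pixels, which is what makes the subsequent CNN-based attention both tractable and image-like.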