Open Access
Causal-Aware Multimodal Transformer for Supply Chain Demand Forecasting: Integrating Text, Time Series, and Satellite Imagery
Author(s) - Ying Wang, Guanyu Ding, Ziyang Zeng, Shiyu Yang
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3619552
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Supply chain demand forecasting is challenging because demand depends on the complex interplay of multiple information sources, including market dynamics reflected in textual data, historical consumption patterns, and macroeconomic indicators captured through satellite imagery. Traditional forecasting methods rely primarily on historical time series data and fail to leverage the rich contextual information available from heterogeneous data modalities. This paper proposes a novel Causal-Aware Multimodal Transformer (CAMT) framework that systematically integrates textual data from news and social media, time series of historical demand, and satellite imagery reflecting economic activity to improve demand forecasting accuracy. Our approach introduces three key innovations: (1) a cross-modal attention mechanism that fuses text, temporal, and visual representations through learnable inter-modal relationships, (2) a causal discovery-based feature importance evaluation method that identifies genuine causal relationships between the modalities and demand patterns while mitigating spurious correlations, and (3) a hierarchical multi-scale prediction framework that provides coherent forecasts across product, category, and regional levels. The framework is built upon a Vision-Language Transformer architecture with a specialized encoder for each modality and a unified decoder for demand prediction. Extensive experiments on the M5 Forecasting Competition dataset, augmented with news articles, social media posts, and corresponding satellite imagery, demonstrate that CAMT achieves significant improvements over state-of-the-art baselines, with a 12.3% reduction in Root Mean Square Error (RMSE) and a 15.7% improvement in Mean Absolute Percentage Error (MAPE) relative to the best-performing baseline. Ablation studies confirm the effectiveness of each proposed component, while causal analysis reveals that satellite-derived economic indicators contribute most to long-term forecasting accuracy, whereas textual sentiment shows stronger predictive power for short-term demand fluctuations. The proposed framework has practical implications for supply chain management: more accurate demand forecasts that account for diverse external factors support better inventory management and operational planning decisions.
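The abstract reports its gains as relative changes in RMSE and MAPE against the best baseline. As a minimal sketch of how such figures are commonly computed (not the authors' evaluation code), the following Python uses the standard definitions of both metrics. The function names, the sample numbers, and the choice to skip zero-demand periods in MAPE are assumptions made for illustration only; the abstract does not say how zero actuals are handled.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error over paired observations."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Assumption: periods with zero actual demand are skipped, because the
    percentage error is undefined there. Other conventions exist.
    """
    if len(actual) != len(forecast):
        raise ValueError("inputs must be of equal length")
    terms = [abs((a - f) / a) for a, f in zip(actual, forecast) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(terms) / len(terms)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical demand values, NOT taken from the paper or the M5 dataset.
actual = [100.0, 120.0, 80.0, 90.0]
baseline_forecast = [110.0, 100.0, 95.0, 80.0]
model_forecast = [105.0, 110.0, 85.0, 88.0]

rmse_gain = relative_reduction(rmse(actual, baseline_forecast),
                               rmse(actual, model_forecast))
mape_gain = relative_reduction(mape(actual, baseline_forecast),
                               mape(actual, model_forecast))
print(f"RMSE reduction: {rmse_gain:.1f}%  MAPE reduction: {mape_gain:.1f}%")
```

Under this reading, a "12.3% reduction in RMSE" means the model's RMSE is 12.3% lower than the baseline's RMSE, not 12.3 units lower.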
