Open Access
A Multi-Modal Approach Using a Hybrid Vision Transformer and Temporal Fusion Transformer Model for Stock Price Movement Classification
Author(s) -
Ibanga Kpereobong Friday,
Sarada Prasanna Pati,
Debahuti Mishra
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3589063
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Stock price movement classification primarily focuses on accurately identifying buy and sell signals, enabling traders to maximize profits through well-timed market entry and exit positions. This study presents and implements a multi-modal deep learning approach to classifying stock price movement. Our approach captures potential price reversals or continuations by utilizing two modalities: candlestick chart patterns and historical price data. Specifically, the proposed framework converts the historical data into 256 × 256 pixel candlestick chart images, allowing both modalities to be integrated and processed effectively. A key innovation is the application of the histogram of oriented gradients (HOG) to extract relevant descriptors, including candlestick colour, body-to-wick proportions, and wick size. Concurrently, the vision transformer (ViT) model partitions each image into non-overlapping 16 × 16 pixel patches, which are treated as input tokens, and extracts salient spatial features through an embedded projection and multi-head self-attention. The temporal fusion transformer (TFT) model then processes the historical features, candlestick chart features, and the extracted HOG features via a decision-level (late feature fusion) strategy that concatenates these inputs to predict short-term price movements over different horizons (1, 3, 7, and 10 days ahead). We systematically evaluate model performance using a time-series cross-validation split to demonstrate the proposed model's efficacy and generalization across eight indices (BSE, IXIC, N225, NIFTY-50, NSE-30, NYSE, S&P 500, and SSE). The results demonstrate the superior performance of our multi-modal approach, achieving an average accuracy, precision, recall, and Matthews correlation coefficient (MCC) of 96.17%, 96.24%, 96.15%, and 0.9367, respectively, across all evaluated indices.
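The two image-processing steps named in the abstract, HOG descriptor extraction and ViT-style tokenization of a 256 × 256 chart into non-overlapping 16 × 16 patches, can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: `hog_descriptor` is a simplified single-cell variant of full HOG, and the random array stands in for a grayscale candlestick image.

```python
import numpy as np

def to_patches(img, patch=16):
    # Split a square image into non-overlapping patch x patch tokens (ViT-style).
    h, w = img.shape
    p = img.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return p.reshape(-1, patch * patch)  # (num_tokens, patch * patch)

def hog_descriptor(img, bins=9):
    # Simplified HOG: gradient magnitudes binned by unsigned orientation
    # over the whole image (full HOG uses cells and block normalisation).
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    return hist / (np.linalg.norm(hist) + 1e-9)       # L2-normalised descriptor

img = np.random.rand(256, 256)   # stand-in for a grayscale candlestick chart
tokens = to_patches(img)         # 256 tokens, each of length 256
hog = hog_descriptor(img)        # 9-bin orientation histogram
print(tokens.shape, hog.shape)   # (256, 256) (9,)
```

In the paper's late-fusion setup, vectors like `tokens` (after ViT encoding) and `hog` would be concatenated with the historical price features before the TFT's prediction head.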
Furthermore, the study uses a real-time trading simulation to assess the practical implications of different window sizes (5, 10, and 15 days). A paired t-test is also conducted to statistically validate the proposed model against benchmarks. The analysis provides valuable insights into how short- and long-term traders can effectively leverage the proposed model, highlighting its adaptability to real-world applications.
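The evaluation protocol, time-series cross-validation followed by a paired t-test on per-fold scores, can be sketched as below. This is illustrative only: the expanding-window split scheme, fold sizes, and accuracy values are assumptions for the example, not the paper's configuration or results.

```python
import numpy as np

def time_series_splits(n_samples, n_splits=5, test_size=50):
    # Expanding-window splits: each fold trains on all data preceding its
    # test block, so no future information leaks into training.
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        yield np.arange(test_start), np.arange(test_start, test_start + test_size)

def paired_t_stat(a, b):
    # Paired t-statistic over per-fold scores of two models.
    d = np.asarray(a, float) - np.asarray(b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

splits = list(time_series_splits(500))      # 5 folds over 500 observations
# Hypothetical per-fold accuracies for the proposed model vs. a benchmark:
ours = [0.96, 0.95, 0.97, 0.96, 0.96]
base = [0.91, 0.90, 0.93, 0.92, 0.91]
print(paired_t_stat(ours, base))            # large positive t favours "ours"
```

A two-sided p-value would then be read from the t-distribution with `n_folds - 1` degrees of freedom (e.g. via `scipy.stats.ttest_rel`).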
