Leveraging Graph Neural Networks and Multimodal Fusion for Enhanced Movie Genre Classification with Plots and Reviews
Author(s) -
Mubarak Alrashoud,
Faheem Shaukat,
Zeeshan Ashraf
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3620217
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Movie genre classification is essential for organizing cinematic content, enhancing recommendation systems, and supporting market analysis. Unimodal approaches that rely solely on plot summaries often fail to capture the experiential insights found in user reviews, limiting their ability to model genre-specific nuances. Multimodality, which integrates diverse data sources such as textual narratives and user-generated feedback, addresses this limitation by leveraging complementary information for richer representations. This paper introduces the Multimodal Graph Attention Network (MGAN), a novel framework that integrates Graph Neural Networks (GNNs) with a late-fusion attention mechanism to model relational structures in plot summaries and user reviews. MGAN uses DeBERTa (Decoding-enhanced BERT with disentangled attention) for plot embeddings, exploiting its strength in capturing deep contextual nuances in narrative text, and MiniLM (Miniature Language Model) for review embeddings, benefiting from its computational efficiency and effectiveness on short, sentiment-rich content. Trained on the Trailers12K and Large Movie Review Dataset (LMRD), MGAN achieves a micro-average precision (μAP) of 87.50%, surpassing state-of-the-art models such as DeBERTa (80.07%) and the Genre Attention Model (83.63%). Notably, MGAN achieves a Hamming loss of 0.069, indicating high prediction fidelity in multi-label classification tasks. MGAN offers significant improvements in both performance and interpretability, supported by SHAP (SHapley Additive exPlanations) explainable AI. Its design balances relational learning with computational efficiency, making it a robust solution for scalable, text-based multimedia genre classification in real-world applications.
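The abstract mentions two computational ideas that are easy to illustrate: a late-fusion attention step that combines plot and review embeddings, and the Hamming loss used to score multi-label genre predictions. The sketch below is a hypothetical, simplified illustration of both, not the paper's implementation; the embedding dimensions, attention scores, and toy labels are all assumed for demonstration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def late_fusion(plot_emb, review_emb, score_plot, score_review):
    """Weight two modality embeddings by softmax attention scores and sum
    them into one fused representation (a generic late-fusion pattern,
    assumed here; the paper's MGAN fusion may differ)."""
    w = softmax(np.array([score_plot, score_review]))
    return w[0] * plot_emb + w[1] * review_emb

def hamming_loss(y_true, y_pred):
    """Fraction of label positions predicted incorrectly across all
    samples and genres (standard multi-label Hamming loss)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true != y_pred).mean())

# Toy 4-dim embeddings with equal attention scores -> elementwise average.
fused = late_fusion(np.array([1.0, 0.0, 0.0, 0.0]),
                    np.array([0.0, 1.0, 0.0, 0.0]),
                    score_plot=0.0, score_review=0.0)
print(fused)  # [0.5 0.5 0.  0. ]

# Toy multi-label predictions: 3 movies x 4 genres, 3 of 12 labels wrong.
loss = hamming_loss([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]],
                    [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]])
print(loss)  # 0.25
```

A lower Hamming loss means fewer per-genre labelling mistakes, which is why the reported 0.069 indicates high prediction fidelity: on average fewer than 7% of genre labels are flipped.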