Hierarchical Fusion Transformer for Multimodal Ground-based Cloud Type Classification
Author(s) - Shuang Liu, Zeyu Yu, Zhong Zhang, Chaojun Shi, Baihua Xiao
Publication year - 2025
Publication title - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3614756
Subject(s) - geoscience, signal processing and analysis, power, energy and industry applications
Existing methods for multimodal ground-based cloud type classification are dominated by Convolutional Neural Networks (CNNs), which fail to capture long-range dependencies. In this paper, we propose a novel Transformer-based architecture named Hierarchical Fusion Transformer (HFT) for multimodal ground-based cloud type classification, which leverages self-attention and cross-attention to learn long-range dependencies and to effectively fuse cloud images with meteorological element information. Specifically, we propose the Visual and Meteorological Joint-Transformer (VM Joint-Trans) to capture global context across modalities, and the Visual and Meteorological Cross-Transformer (VM Cross-Trans) to align the modalities and reduce their inconsistencies. We design a hierarchical architecture that performs comprehensive fusion using VM Joint-Trans and VM Cross-Trans. Meanwhile, we propose a novel Multimodal Contrastive Learning (MCL) scheme, which constrains not only the tokens of cloud images and meteorological element information in the same layer but also the tokens from the same modality in different layers, thereby improving the discriminative ability of the model and reducing the modality gap. Furthermore, we release the Large-scale Multimodal Ground-based Cloud Database (LMGCD), containing 10,000 multimodal samples across seven categories. To the best of our knowledge, it is the largest database for multimodal ground-based cloud type classification. Experimental results validate the effectiveness of the proposed HFT for multimodal ground-based cloud type classification.
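The abstract names two fusion patterns (joint self-attention over both modalities, and cross-attention between them) and a contrastive objective, but gives no equations. The following is a minimal pure-Python sketch of those generic building blocks, assuming single-head scaled dot-product attention with no learned projections, residuals, or normalization, and a standard one-directional InfoNCE loss over cosine similarities with temperature `tau`. It is not the authors' HFT, VM Joint-Trans, VM Cross-Trans, or MCL implementation, and every function name here is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x d) @ (d x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (no learned projections; an assumption)."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def joint_fusion(img: Matrix, met: Matrix) -> Matrix:
    """Hypothetical joint-style block: self-attention over the concatenated
    token sequence, so image and meteorological tokens attend to each other."""
    tokens = img + met
    return attention(tokens, tokens, tokens)


def cross_fusion(img: Matrix, met: Matrix) -> Matrix:
    """Hypothetical cross-style block: image tokens are queries,
    meteorological tokens supply keys and values."""
    return attention(img, met, met)


def l2_normalize(x: List[float]) -> List[float]:
    n = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / n for v in x]


def info_nce(anchors: Matrix, positives: Matrix, tau: float = 0.1) -> float:
    """One-directional InfoNCE: anchors[i] and positives[i] form a positive
    pair; every other positives[j] in the batch acts as a negative."""
    a = [l2_normalize(r) for r in anchors]
    p = [l2_normalize(r) for r in positives]
    sims = matmul(a, transpose(p))  # cosine similarity matrix
    loss = 0.0
    for i, row in enumerate(sims):
        logits = [s / tau for s in row]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # negative log-probability of the positive
    return loss / len(sims)


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy image tokens
    met = [[1.0, 2.0], [0.0, 1.0]]              # 2 toy meteorological tokens
    print(len(joint_fusion(img, met)), len(cross_fusion(img, met)))
    print(info_nce(img[:2], met))
```

Under this reading, the two MCL constraints described in the abstract would correspond to applying a loss like `info_nce` both between image and meteorological tokens of the same layer and between tokens of one modality at two different layers; how the paper pairs, pools, or weights those terms is not stated in the abstract.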