Hierarchical Aligned Multimodal Learning for NER on Tweet Posts (Open Access)
Author(s)
Peipei Liu,
Hong Li,
Yimo Ren,
Jie Liu,
Shuaizong Si,
Hongsong Zhu,
Limin Sun
Publication year: 2024
Mining structured knowledge from tweets using named entity recognition (NER) can benefit many downstream applications such as recommendation and intention understanding. As tweet posts tend to be multimodal, multimodal named entity recognition (MNER) has attracted growing attention. In this paper, we propose a novel approach that dynamically aligns the image and text sequence and performs multi-level cross-modal learning to augment textual word representations for MNER. Specifically, our framework consists of three main stages: the first focuses on intra-modality representation learning to derive the implicit global and local knowledge of each modality; the second evaluates the relevance between the text and its accompanying image and integrates visual information of different granularities based on that relevance; the third enforces semantic refinement via iterative cross-modal interactions and co-attention. We conduct experiments on two open datasets, and the results and detailed analysis demonstrate the advantage of our model.
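The third stage above mentions iterative cross-modal interactions with co-attention. The following is a minimal, illustrative sketch of one co-attention round between textual and visual features; the dot-product affinity form, the shapes, and the residual update are assumptions for illustration, not the authors' exact formulation.

```python
# Hypothetical sketch of one co-attention round between text tokens and
# image regions, where each modality attends to the other. This is an
# assumed, generic formulation, not the paper's exact model.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text, image):
    """One cross-modal interaction step.

    text:  (n_tokens, d)  textual word representations
    image: (n_regions, d) visual region features
    Returns both feature sets augmented with cross-modal context.
    """
    d = text.shape[-1]
    # Affinity between every word and every image region.
    affinity = text @ image.T / np.sqrt(d)           # (n_tokens, n_regions)
    # Text attends over image regions; image attends over words.
    text_ctx = softmax(affinity, axis=-1) @ image    # (n_tokens, d)
    image_ctx = softmax(affinity.T, axis=-1) @ text  # (n_regions, d)
    # Residual update keeps the original intra-modality signal.
    return text + text_ctx, image + image_ctx

rng = np.random.default_rng(0)
words = rng.standard_normal((5, 16))    # 5 tokens, 16-dim features
regions = rng.standard_normal((4, 16))  # 4 visual regions
w2, r2 = co_attention(words, regions)
print(w2.shape, r2.shape)  # (5, 16) (4, 16)
```

Iterating this step several times lets information flow back and forth between modalities, which is one way to read the paper's "semantic refinement" stage.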
Language(s): English