
Cross-view Correspondence Modeling for Joint Representation Learning between Egocentric and Exocentric Videos
Author(s) - Zhehao Zhu, Yoichi Sato
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3593474
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Joint analysis of human action videos from egocentric and exocentric views enables a more comprehensive understanding of human behavior. While previous works leverage paired videos to align clip-level features across views, they often ignore the complex spatial and temporal misalignments inherent in such data. In this work, we propose a Cross-View Transformer that explicitly models fine-grained spatiotemporal correspondence between egocentric and exocentric videos. Our model incorporates self-attention to enhance intra-view context and cross-view attention to align features across space and time. To train the model, we introduce a hybrid loss function combining a triplet loss and a domain classification loss, further reinforced by a sample screening mechanism that emphasizes informative training pairs. We evaluate our method on multiple egocentric action recognition benchmarks, including Charades-Ego and EPIC-Kitchens. Experimental results demonstrate that our method consistently outperforms existing approaches, achieving state-of-the-art performance on several egocentric video understanding tasks.
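The two ingredients named in the abstract, cross-view attention and a hybrid triplet plus domain classification loss, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the architecture or the loss weighting, so the identity projections, the Euclidean distance, the binary cross-entropy form of the domain loss, the weighted-sum combination, and the names `cross_view_attention`, `hybrid_loss`, `margin`, and `lam` are all assumptions made for illustration. The sample screening mechanism is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(ego_tokens, exo_tokens):
    """Scaled dot-product attention from ego tokens to exo tokens.

    Queries come from the egocentric view; keys and values come from the
    exocentric view. Learned projections are omitted (assumed identity).
    """
    dim = len(ego_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in ego_tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in exo_tokens]
        weights = softmax(scores)
        # Weighted average of exo tokens, one output vector per ego token.
        attended.append([
            sum(w * v[d] for w, v in zip(weights, exo_tokens))
            for d in range(dim)
        ])
    return attended


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def domain_classification_loss(prob_ego, is_ego):
    """Binary cross-entropy for a classifier predicting the view (assumed form)."""
    eps = 1e-12
    p = min(max(prob_ego, eps), 1.0 - eps)
    return -math.log(p) if is_ego else -math.log(1.0 - p)


def hybrid_loss(anchor, positive, negative, prob_ego, is_ego, margin=0.5, lam=0.1):
    """Weighted sum of the two losses; the weight `lam` is a hypothetical choice."""
    return (triplet_loss(anchor, positive, negative, margin)
            + lam * domain_classification_loss(prob_ego, is_ego))
```

In this sketch, an egocentric clip embedding would serve as the anchor, its paired exocentric clip as the positive, and an unpaired exocentric clip as the negative. A real model would use learned query, key, and value projections and batched tensor operations.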