Open Access
RAM-VO: A Recurrent Attentional Model for Visual Odometry
Author(s) - Iury Cleveston, Esther Luna Colombini
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/wtdr_ctdr.2021.18684
Subject(s) - visual odometry, computer science, artificial intelligence, reinforcement learning, initialization, convolutional neural network, optical flow, monocular, odometry, computer vision, pattern recognition, machine learning, robot, mobile robot
Determining the agent's pose is fundamental for developing autonomous vehicles. Visual Odometry (VO) algorithms estimate egomotion using only the visual differences between consecutive input frames. Most recent VO methods rely heavily on deep-learning techniques based on convolutional neural networks (CNNs), which makes processing large images costly. Moreover, more data does not necessarily yield better predictions, and the network may have to filter out useless information. In this context, we incrementally formulate a lightweight model called RAM-VO to perform visual odometry regression from large monocular images. Our model extends the Recurrent Attention Model (RAM), a distinctive architecture that implements a hard attentional mechanism guided by reinforcement learning to select only the essential input information. Our methodology modifies the RAM to improve the visual and temporal representation of the information, producing the intermediate RAM-R and RAM-RC architectures. We also include optical flow as contextual information for initializing the RL agent and adopt the Proximal Policy Optimization (PPO) algorithm to learn a robust policy. The experimental results indicate that RAM-VO can perform regression with six degrees of freedom using approximately 3 million parameters. Additionally, experiments on the KITTI dataset confirm that RAM-VO produces competitive results while observing only 5.7% of the input image.
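To make the glimpse-based pipeline described above concrete, the sketch below shows a RAM-style loop for visual odometry: a small patch is cropped from a stacked pair of frames, encoded by a tiny CNN, aggregated by a recurrent core, and used to regress a 6-DoF relative pose while emitting the next glimpse location. This is a minimal illustration under assumed PyTorch conventions; all names and hyperparameters (RamVoSketch, extract_glimpse, a 32-pixel glimpse, four glimpses per pair) are hypothetical and not the authors' implementation, and the PPO-trained stochastic location policy is replaced by a deterministic placeholder.

```python
# Minimal sketch of a RAM-style glimpse pipeline for visual odometry.
# Module and parameter names are illustrative assumptions, not RAM-VO's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def extract_glimpse(frames, loc, size=32):
    """Crop a small square patch from a stacked frame pair.

    frames: (B, C, H, W) tensor of two consecutive frames stacked on the
    channel axis; loc: (B, 2) glimpse center in normalized [-1, 1] coords.
    Only this patch is ever processed, so most of the image is never seen.
    """
    b, _, h, w = frames.shape
    # Sampling grid spanning `size` pixels around `loc` in normalized coords.
    ys = torch.linspace(-1.0, 1.0, size, device=frames.device) * (size / h)
    xs = torch.linspace(-1.0, 1.0, size, device=frames.device) * (size / w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).unsqueeze(0) + loc.view(b, 1, 1, 2)
    return F.grid_sample(frames, grid, align_corners=False)


class RamVoSketch(nn.Module):
    """Glimpse encoder + recurrent core + 6-DoF regressor + location head."""

    def __init__(self, hidden=256, glimpse=32):
        super().__init__()
        self.hidden = hidden
        self.glimpse = glimpse
        self.encoder = nn.Sequential(            # small CNN over the patch only
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (glimpse // 4) ** 2, hidden), nn.ReLU(),
        )
        self.core = nn.LSTMCell(hidden, hidden)  # temporal aggregation
        self.pose_head = nn.Linear(hidden, 6)    # 3 translations + 3 rotations
        self.loc_head = nn.Linear(hidden, 2)     # mean of next glimpse location

    def forward(self, frames, steps=4):
        b = frames.size(0)
        h = torch.zeros(b, self.hidden, device=frames.device)
        c = torch.zeros_like(h)
        loc = torch.zeros(b, 2, device=frames.device)  # start at image center
        for _ in range(steps):
            patch = extract_glimpse(frames, loc, self.glimpse)
            h, c = self.core(self.encoder(patch), (h, c))
            # In RAM-VO the next location is sampled from a stochastic policy
            # trained with PPO; this sketch just takes a deterministic mean.
            loc = torch.tanh(self.loc_head(h))
        return self.pose_head(h)                 # relative 6-DoF pose


# Example: one pair of KITTI-sized frames (376 x 1241) stacked along channels.
model = RamVoSketch()
pose = model(torch.randn(1, 6, 376, 1241))
print(pose.shape)  # torch.Size([1, 6])
```

The design choice this illustrates is the one the abstract emphasizes: because only a few small glimpses are processed per frame pair, the parameter count and per-frame compute stay low even though the input images themselves are large.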
