
OSCAR and ActivityNet: an Image Captioning model can effectively learn a Video Captioning dataset
Author(s) -
Emmanuel Byrd,
Miguel González-Mendoza
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.52591/lxai202106257
Subject(s) - closed captioning, computer science, artificial intelligence, multimedia, computer vision, natural language processing, speech recognition
Activity Recognition and Classification in video sequences is a research area that has received considerable attention recently. However, video processing is computationally expensive, and its advances have not matched those of Image Captioning. This work, created by Latinx individuals from Mexico, uses a computationally limited environment and transforms the Video Captioning dataset of ActivityNet into an Image Captioning dataset. By generating features with Bottom-Up attention, training an OSCAR Image Captioning model, and applying different NLP Data Augmentation techniques, we show a viable and promising approach to simplifying the Video Captioning task.
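The abstract does not specify how the video dataset is reduced to images. One plausible sketch, assuming ActivityNet-Captions-style annotations (a JSON mapping of video ids to per-segment `timestamps` and `sentences`), is to pick a representative frame per captioned segment; the midpoint choice and the `video_to_image_pairs` helper below are illustrative assumptions, not the authors' stated method.

```python
# Hypothetical sketch: turn ActivityNet-style video-caption segments into
# image-caption pairs by selecting each segment's midpoint frame.
# The annotation layout and midpoint heuristic are assumptions.

def video_to_image_pairs(annotations, fps=1.0):
    """annotations: {video_id: {"timestamps": [[start, end], ...],
                                "sentences": [caption, ...]}}.
    Returns (video_id, frame_index, caption) triples, one per segment,
    where frame_index is the frame nearest the segment midpoint at `fps`."""
    pairs = []
    for vid, ann in annotations.items():
        for (start, end), caption in zip(ann["timestamps"], ann["sentences"]):
            mid = (start + end) / 2.0       # representative instant (seconds)
            frame_idx = int(mid * fps)      # frame number at the sampling rate
            pairs.append((vid, frame_idx, caption.strip()))
    return pairs

demo = {
    "v_abc": {
        "timestamps": [[0.0, 4.0], [4.0, 10.0]],
        "sentences": ["A man walks in.", " He sits down."],
    }
}
print(video_to_image_pairs(demo))
```

Each resulting frame-caption pair can then be fed to a standard Image Captioning pipeline (Bottom-Up feature extraction, OSCAR training) exactly as a native image dataset would be.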
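The abstract also mentions "different NLP Data Augmentation techniques" without naming them. As an illustration only, one widely used caption-level technique is EDA-style random word deletion; the function below is a generic sketch of that idea, not the authors' documented augmentation.

```python
import random

# Illustrative EDA-style random deletion: drop each word with probability p,
# producing a noisier paraphrase of the caption for training.
# This is a generic technique sketch, not the paper's specified method.

def random_deletion(caption, p=0.2, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    words = caption.split()
    if len(words) <= 1:            # nothing sensible to delete
        return caption
    kept = [w for w in words if rng.random() > p]
    # Guarantee a non-empty caption even if every word was dropped.
    return " ".join(kept) if kept else rng.choice(words)

print(random_deletion("a man is riding a horse on the beach", p=0.3))
```

Applied to the image-caption pairs, such augmentation multiplies the effective training captions at negligible compute cost, which matters in the computationally limited setting the abstract describes.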