Open Access
Towards textually describing complex video contents with audio-visual concept classifiers
Author(s) -
Chun Chet Tan,
Yu-Gang Jiang,
Chong-Wah Ngo
Publication year - 2011
Publication title -
Proceedings of the 19th ACM International Conference on Multimedia (MM '11)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2072298.2072411
Subject(s) - computer science, audio visual, variety (cybernetics), audio analyzer, artificial intelligence, multimedia, natural language processing, audio signal processing, speech recognition, audio signal, speech coding
Automatically generating compact textual descriptions of complex video content has wide applications. Building on recent advances in automatic audio-visual content recognition, in this paper we explore the technical feasibility of the challenging task of precisely recounting video content. Using cutting-edge automatic recognition techniques, we begin by classifying a variety of visual and audio concepts in videos. Based on the classification results, we apply simple rule-based methods to generate textual descriptions of the video content. Results are evaluated through carefully designed user studies. We find that state-of-the-art visual and audio concept classification, although far from perfect, provides very useful clues about what is happening in a video. Most users involved in the evaluation confirmed the informativeness of our machine-generated descriptions.
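The pipeline the abstract describes — classify audio and visual concepts, then apply simple rules to turn the detections into a sentence — can be sketched roughly as follows. All concept names, scores, the confidence threshold, and the sentence templates below are illustrative assumptions for the sketch, not the authors' actual rules:

```python
# Hypothetical sketch of rule-based description generation from
# audio-visual concept classifier scores. Concept names, scores, the
# threshold, and the templates are assumptions, not the paper's rules.

THRESHOLD = 0.5  # assumed confidence cutoff for accepting a concept


def describe(visual_scores, audio_scores, threshold=THRESHOLD):
    """Turn per-concept classifier scores into a short textual description."""
    # Keep concepts above the threshold, most confident first.
    visual = [c for c, s in sorted(visual_scores.items(),
                                   key=lambda kv: -kv[1]) if s >= threshold]
    audio = [c for c, s in sorted(audio_scores.items(),
                                  key=lambda kv: -kv[1]) if s >= threshold]

    parts = []
    if visual:
        parts.append("the video shows " + ", ".join(visual))
    if audio:
        parts.append("with sounds of " + ", ".join(audio))

    if not parts:
        return "No confident concepts detected."
    return "; ".join(parts).capitalize() + "."


# Example usage with made-up classifier scores:
desc = describe({"crowd": 0.82, "stadium": 0.64, "dog": 0.12},
                {"cheering": 0.71, "music": 0.35})
```

Thresholding plus template filling keeps the generation step transparent and easy to tune, which matches the paper's point that even imperfect classifiers can yield informative descriptions.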
