From audio to information: Learning topics from audio transcripts

Open Access

Author(s) - João Pedro Rodrigues, Emerson Cabrera Paraíso

Publication year - 2020

Language(s) - English

Resource type - Conference proceedings

DOI - 10.5753/kdmile.2020.11967

Subject(s) - computer science, latent Dirichlet allocation, audio signal processing, topic model, multimedia, coherence (philosophical gambling strategy), range (aeronautics), artificial intelligence, speech recognition, speech coding, natural language processing, audio signal, materials science, composite material, physics, quantum mechanics

This work analyzes the technical feasibility of working with audio transcriptions from YouTube and presents a method covering data acquisition, pre-processing, and post-processing for this type of data. Topics are modeled with the latent Dirichlet allocation (LDA) algorithm. An approach is also presented for dynamically determining the ideal number of topics in a given corpus. The experiments used a database of 250 audio transcriptions and obtained a model with coherence in the range of 40%.
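
The abstract does not say how the number of topics is chosen or which coherence measure is reported, so the following is only a minimal sketch of one common approach: train LDA for several candidate topic counts and keep the count with the best coherence. Everything here is an assumption made for illustration: the toy transcripts, the stopword list, the hyperparameters, the helper names (`preprocess`, `train_lda`, `umass_coherence`, `pick_num_topics`), the use of collapsed Gibbs sampling, and the choice of UMass coherence, which is on a different scale from the percentage quoted in the abstract.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "into", "from", "are", "was"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]

def train_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word-count table per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]  # remove the token's current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z  # record the resampled topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def umass_coherence(top_words, docs):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over word pairs."""
    doc_sets = [set(d) for d in docs]
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            both = sum(1 for s in doc_sets if top_words[m] in s and top_words[l] in s)
            single = sum(1 for s in doc_sets if top_words[l] in s)
            score += math.log((both + 1) / single)
    return score

def pick_num_topics(docs, candidates, top_n=5):
    """Train one model per candidate count; return the best count and all scores."""
    scores = {}
    for k in candidates:
        topics = train_lda(docs, k)
        tops = [[w for w, c in t.most_common(top_n) if c > 0] for t in topics]
        usable = [t for t in tops if len(t) >= 2]  # skip empty or one-word topics
        if usable:
            scores[k] = sum(umass_coherence(t, docs) for t in usable) / len(usable)
        else:
            scores[k] = float("-inf")
    return max(scores, key=scores.get), scores

# Made-up stand-ins for video transcripts.
TRANSCRIPTS = [
    "the team scored a goal and the coach praised the players",
    "players and coach celebrate the goal with the team",
    "the recipe needs flour sugar and butter for the cake",
    "bake the cake with butter sugar and flour",
    "the coach said the team players trained for the match",
    "mix flour and sugar then bake the cake in the oven",
]

docs = [preprocess(t) for t in TRANSCRIPTS]
best_k, scores = pick_num_topics(docs, candidates=range(2, 5))
print(best_k, scores)
```

On a real corpus the same sweep would more likely use an established topic modeling library and a wider candidate range. UMass scores tend to favor topics built from a few frequently co-occurring words, so the selected count is worth checking by reading the top words of each topic.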
