Open Access
Speech data analysis for semantic indexing of video of simulated medical crises.
Author(s) - Shuangshuang Jiang
Publication year - 2015
Language(s) - English
Resource type - Dissertations/theses
DOI - 10.18297/etd/2070
Subject(s) - computer science, speech recognition, segmentation, search engine indexing, speaker identification, debriefing, artificial intelligence, multimedia, natural language processing
Abstract - The Simulation for Pediatric Assessment, Resuscitation, and Communication (SPARC) group within the Department of Pediatrics at the University of Louisville was established to enhance the care of children by using simulation-based educational methodologies to improve patient safety and strengthen clinician-patient interactions. After each simulation session, the physician must manually review and annotate the recordings and then debrief the trainees. The physician responsible for the simulation has recorded hundreds of videos and is seeking solutions that can automate the process. This dissertation introduces our developed system for efficient segmentation and semantic indexing of videos of medical simulations using machine learning methods. It provides the physician with automated tools to review important sections of the simulation by identifying who spoke, when, and with what emotion. Only audio information is extracted and analyzed because the quality of the image recording is low and the visual environment is largely static. Our proposed system includes four main components: preprocessing, speaker segmentation, speaker identification, and emotion recognition. The preprocessing component first extracts the audio track from the video recording and then computes various low-level audio features to detect and remove silence segments. We investigate and compare two different approaches for this task: the first is threshold-based and the second is classification-based. The second main component of the proposed system detects speaker change points for the purpose of segmenting the audio stream. We propose two fusion methods for this task.
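The abstract does not specify which low-level features or threshold rule the dissertation uses, but a minimal sketch of the threshold-based silence-removal step might look like the following, assuming short-time log-energy as the feature. The function name `remove_silence` and all parameter values (frame size, hop, decibel margin) are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def remove_silence(signal, sample_rate, frame_ms=25, hop_ms=10, margin_db=20.0):
    """Label each frame as speech/silence by an energy threshold and
    return the labels plus the concatenated speech samples."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)

    # Short-time log-energy (dB) for each analysis frame.
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop_len : i * hop_len + frame_len]
        energies[i] = 10.0 * np.log10(np.sum(frame.astype(float) ** 2) + 1e-10)

    # Threshold rule (assumed): keep frames within `margin_db`
    # of the loudest frame; everything quieter is treated as silence.
    is_speech = energies > energies.max() - margin_db

    # Concatenate the samples of all frames labelled as speech.
    if not is_speech.any():
        return is_speech, signal[:0]
    kept = np.concatenate(
        [signal[i * hop_len : i * hop_len + frame_len]
         for i in range(n_frames) if is_speech[i]]
    )
    return is_speech, kept
```

The classification-based alternative mentioned in the abstract would replace the fixed threshold with a trained speech/non-speech classifier over the same per-frame features; the sketch above only illustrates the simpler threshold-based variant.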
