Topic Tomographies (TopTom): a visual approach to distill information from media streams
Author(s) - Gobbo B., Balsamo D., Mauri M., Bajardi P., Panisson A., Ciuccarelli P.
Publication year - 2019
Publication title - Computer Graphics Forum
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.578
H-Index - 120
eISSN - 1467-8659
pISSN - 0167-7055
DOI - 10.1111/cgf.13714
Subject(s) - computer science , pipeline (software) , dimension (graph theory) , interface (matter) , visualization , identification (biology) , information retrieval , data stream mining , data mining , human–computer interaction , botany , mathematics , bubble , maximum bubble pressure method , parallel computing , biology , pure mathematics , programming language
Abstract - In this paper we present TopTom, a digital platform that provides analytical and visual solutions for exploring a dynamic corpus of user-generated messages and media articles. Its aims are i) distilling the information from thousands of documents into a low-dimensional space of explainable topics, ii) clustering the topics hierarchically while allowing users to drill down to the details and stories that constitute them, and iii) spotting trends and anomalies. TopTom implements a batch processing pipeline that can run both in near-real time on time-stamped data from streaming sources and in a cold-start mode on historical data with a temporal dimension. The resulting output unfolds along three main axes: time, volume, and semantic similarity (i.e. hierarchical aggregation of topics). To support multiscale browsing of the data and the identification of anomalous behaviors, the visualizations are designed around three visual metaphors adopted from the biological and medical fields: the flow of particles in a coherent stream, tomographic cross-sectioning, and contrast-like analysis of biological tissues. The platform interface is composed of three main visualizations with coherent and smooth navigation interactions: a calendar view, a flow view, and a temporal cut view. The integration of these three visual models with the multiscale analytic pipeline yields a novel system for the identification and exploration of topics from unstructured texts. We evaluated the system using a collection of documents about the emerging opioid epidemic in the United States.
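The abstract does not say how trends and anomalies are detected, so the following is only an illustrative sketch of one common approach along the time and volume axes: flagging days on which a topic's document count deviates strongly from its recent history. The function name `flag_volume_anomalies`, the 7-day window, the z-score threshold of 3, and the sample counts are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def flag_volume_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose volume deviates strongly from the trailing window.

    Hypothetical helper: a trailing-window z-score, not the method used by TopTom.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if daily_counts[i] != mu:
                anomalies.append(i)
            continue
        z = (daily_counts[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily document counts for a single topic, with a spike on day 7.
counts = [10, 12, 11, 9, 10, 11, 10, 48, 11, 10]
print(flag_volume_anomalies(counts))  # prints [7]
```

A trailing window keeps the check usable in a streaming setting, since each day is compared only with data that would already have been available at that point.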