Comparative Exploration of Document Collections: a Visual Analytics Approach
Author(s) - Oelke D., Strobelt H., Rohrdantz C., Gurevych I., Deussen O.
Publication year - 2014
Publication title - Computer Graphics Forum
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.578
H-Index - 120
eISSN - 1467-8659
pISSN - 0167-7055
DOI - 10.1111/cgf.12376
Subject(s) - computer science, visual analytics, heuristics, visualization, information retrieval, information visualization, optimal distinctiveness theory, set (abstract data type), topic model, data science, discriminative model, probabilistic logic, analytics, class (philosophy), artificial intelligence, psychology, psychotherapist, programming language, operating system
We present an analysis and visualization method for computing what distinguishes a given document collection from others. We determine topics that discriminate a subset of collections from the remaining ones by applying probabilistic topic modeling and then approximating the two relevant criteria, distinctiveness and characteristicness, algorithmically through a set of heuristics. Furthermore, we suggest a novel visualization method called DiTop‐View, in which topics are represented by glyphs (topic coins) arranged on a 2D plane. Topic coins are designed to encode all information necessary for comparative analysis, such as the class membership of a topic, its most probable terms, and its discriminative relations. We evaluate our topic analysis using statistical measures and a small user experiment, and present an expert case study with political science researchers analyzing two real‐world datasets.
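To make the two criteria more concrete, here is a minimal sketch of how they might be scored from topic model output. It does not reproduce the heuristics of the paper, which the abstract does not specify. The function name `topic_scores`, the `threshold` parameter, and both scoring formulas are assumptions made for illustration: characteristicness is taken to be the share of a collection's documents in which a topic is present, and distinctiveness is taken to be that share relative to the sum of shares across all collections.

```python
from collections import defaultdict


def topic_scores(doc_topics, labels, threshold=0.3):
    """Return {topic: {collection: (characteristicness, distinctiveness)}}.

    doc_topics: per-document topic probabilities, e.g. from a topic model.
    labels:     collection name of each document.
    threshold:  hypothetical cutoff for "topic is present in a document".
    """
    n_topics = len(doc_topics[0])
    docs_per_coll = defaultdict(int)
    hits = defaultdict(lambda: [0] * n_topics)

    # Count, per collection, the documents in which each topic is present.
    for probs, coll in zip(doc_topics, labels):
        docs_per_coll[coll] += 1
        for t, p in enumerate(probs):
            if p >= threshold:
                hits[coll][t] += 1

    scores = {}
    for t in range(n_topics):
        # Assumed characteristicness: share of the collection's documents
        # that contain topic t.
        coverage = {c: hits[c][t] / docs_per_coll[c] for c in docs_per_coll}
        total = sum(coverage.values())
        # Assumed distinctiveness: this collection's share of total coverage.
        scores[t] = {
            c: (cov, cov / total if total else 0.0)
            for c, cov in coverage.items()
        }
    return scores


# Toy input: four documents, two topics, two collections (made-up numbers).
doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]
labels = ["A", "A", "B", "B"]
scores = topic_scores(doc_topics, labels)
```

In this toy input, topic 1 appears only in collection B and in every B document, so it scores high on both criteria for B. Topic 0 covers all of A but also half of B, so it is characteristic of A yet less distinctive.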