Visual Computing

University of Konstanz
Computer Graphics Forum

Comparative Exploration of Document Collections: A Visual Analytics Approach

D. Oelke, H. Strobelt, C. Rohrdantz, I. Gurevych, O. Deussen

Abstract

We present an analysis and visualization method for computing what distinguishes a given document collection from others. We determine topics that discriminate a subset of collections from the remaining ones by applying probabilistic topic modeling and subsequently approximating the two relevant criteria, distinctiveness and characteristicness, algorithmically through a set of heuristics. Furthermore, we suggest a novel visualization method called DiTop-View, in which topics are represented by glyphs (topic coins) that are arranged on a 2D plane. Topic coins are designed to encode all information necessary for performing comparative analyses, such as the class membership of a topic, its most probable terms, and the discriminative relations. We evaluate our topic analysis using statistical measures and a small user experiment, and present an expert case study with researchers from political sciences analyzing two real-world datasets.
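
The abstract names two criteria, distinctiveness and characteristicness, but does not spell out the heuristics used to approximate them. The following is a minimal sketch of the general idea only, under stated assumptions: it takes a document-topic matrix as a topic model might produce, treats distinctiveness as a class's share of a topic's mean weight, and treats characteristicness as the fraction of a class's documents in which the topic is present. The function names, both definitions, and all thresholds are hypothetical illustrations, not the heuristics from the paper.

```python
from collections import defaultdict


def topic_scores(doc_topic, labels, presence_threshold=0.2):
    """Return {class: [(distinctiveness, characteristicness), ...]} per topic.

    Assumed stand-in definitions (NOT the paper's heuristics):
    - distinctiveness: the class's mean topic weight divided by the sum of
      the mean topic weights over all classes
    - characteristicness: fraction of the class's documents whose weight for
      the topic is at least `presence_threshold`
    """
    n_topics = len(doc_topic[0])

    # Group the per-document topic distributions by collection (class).
    by_class = defaultdict(list)
    for row, label in zip(doc_topic, labels):
        by_class[label].append(row)

    # Mean weight of every topic within every class.
    means = {
        c: [sum(r[t] for r in rows) / len(rows) for t in range(n_topics)]
        for c, rows in by_class.items()
    }

    scores = {}
    for c, rows in by_class.items():
        per_topic = []
        for t in range(n_topics):
            total = sum(means[k][t] for k in means)
            distinct = means[c][t] / total if total > 0 else 0.0
            charact = sum(r[t] >= presence_threshold for r in rows) / len(rows)
            per_topic.append((distinct, charact))
        scores[c] = per_topic
    return scores


def discriminative_topics(scores, min_distinct=0.7, min_charact=0.5):
    """Keep, per class, the topic indices that pass both (assumed) thresholds."""
    return {
        c: [t for t, (d, ch) in enumerate(per_topic)
            if d >= min_distinct and ch >= min_charact]
        for c, per_topic in scores.items()
    }


# Toy input: four documents, three topics, two collections "A" and "B".
doc_topic = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.0, 0.3, 0.7],
]
labels = ["A", "A", "B", "B"]

scores = topic_scores(doc_topic, labels)
result = discriminative_topics(scores)
print(result)
```

On this toy input, topic 0 is selected for collection A and topic 2 for collection B, while topic 1 is shared by both collections and is selected for neither. A real analysis would replace the toy matrix with the output of a probabilistic topic model.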

BibTeX

@article{Oelke2014ComparativeExplorationDocument,
  acmid      = {2771516},
  address    = {Chichester, UK},
  author     = {D. Oelke and H. Strobelt and C. Rohrdantz and I. Gurevych and O. Deussen},
  doi        = {10.1111/cgf.12376},
  issn       = {0167-7055},
  issue_date = {June 2014},
  journal    = {Computer Graphics Forum},
  keywords   = {H.5.m [Information Systems]: Information Interfaces and Presentation-Miscellaneous},
  month      = {jun},
  number     = {3},
  numpages   = {10},
  pages      = {201--210},
  publisher  = {The Eurographics Association \& John Wiley \& Sons, Ltd.},
  title      = {Comparative Exploration of Document Collections: A Visual Analytics Approach},
  volume     = {33},
  year       = {2014},
}

Supplemental Material

Honourable Mention (.pdf, 433.3 KB)
Example (.pdf, 969.2 KB)
User Study (.pdf, 3.7 MB)
Paper (.pdf, 2.3 MB)