Finding suitable, more compact views of a document’s main content is crucial for providing convenient access to large document collections on display devices of different sizes. We present a novel compact visualization that represents a document’s key semantics as a mixture of images and important key terms, similar to cards in a Top Trumps game. The key terms are extracted using an advanced text mining approach based on fully automatic document structure extraction. The images and their captions are extracted using a graphical heuristic, and the captions are used for a semi-semantic image weighting. Furthermore, we use the image color histogram for classification and show at least one representative from each non-empty image class. The approach is demonstrated on the IEEE InfoVis publications of a complete year. The method can easily be applied to other publication collections and sets of documents that contain images.
@article{Strobelt2009DocumentCardsTop, author = {H. Strobelt and D. Oelke and C. Rohrdantz and A. Stoffel and D. Keim and O. Deussen}, doi = {10.1109/TVCG.2009.139}, issn = {1077-2626}, journal = {IEEE Transactions on Visualization and Computer Graphics}, keywords = {data mining;data visualisation;document image processing;IEEE InfoVis publications;advanced text mining;compact visualization;display devices;document cards;document structure extraction;image color histogram;semi-semantic image weighting;top trumps document visualization;Displays;Feeds;Histograms;Image databases;Operating systems;Pipelines;Search engines;Text mining;Visualization;content extraction;document collection browsing;document visualization;visual summary}, month = {nov}, number = {6}, pages = {1145--1152}, title = {Document Cards: A Top Trumps Visualization for Documents}, url = {http://graphics.uni-konstanz.de/publikationen/2009/documentcards/website/}, volume = {15}, year = {2009} }