Data Visualization Resources
Resources from “Data Viz 101: Concepts and Tools” workshop 2015-10-14
Background
Charles Joseph Minard’s Map (1869), from wikimedia. Edward Tufte says it’s “probably the best statistical graphic ever drawn”.
Different areas:
- Data Visualization
- Information Visualization
- Visual Analytics
Books:
- Matthew Ward, Georges G. Grinstein, and Daniel Keim, Interactive Data Visualization: Foundations, Techniques, and Applications, 2nd ed. (Boca Raton: CRC Press, 2015).
- Colin Ware, Visual Thinking for Design (Burlington, MA: Morgan Kaufmann, 2008).
- Colin Ware, Information Visualization: Perception for Design, 3rd ed. (Burlington: Elsevier Science, 2012).
Visual Analytics
“Visualization allows people to offload cognition to the perceptual system, using carefully designed images as a form of external memory. The human visual system is a very high-bandwidth channel to the brain, with a significant amount of processing occurring in parallel and at the pre-conscious level. We can thus use external images as a substitute for keeping track of things inside our own heads.”
Tamara Munzner, “Visualization,” in Fundamentals of Computer Graphics (3rd edition), ed. Peter Shirley, Michael Ashikhmin, and Steve Marschner (Natick, MA: A K Peters, 2009).
“Visual analytics solutions provide technology that combines the strengths of human and electronic data processing. Visualization becomes the medium of a semi-automated analytical process, where humans and machines cooperate using their respective distinct capabilities for the most effective results.”
Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten Görg, Jörn Kohlhammer, and Guy Melançon, “Visual Analytics: Definition, Process, and Challenges”, in Information Visualization - Human-Centered Issues and Perspectives, LNCS, ed. Andreas Kerren et al. (Springer, 2008), 154-175. Available at http://hal-lirmm.ccsd.cnrs.fr/lirmm-00272779/document
Shneiderman
Visual Information Seeking Mantra:
- Overview first, zoom and filter, then details-on-demand.
Task by Data Type Taxonomy (TTT)
- Seven data types: 1-, 2-, 3-dimensional data, temporal and multi-dimensional data, and tree and network data
- Seven tasks: overview, zoom, filter, details-on-demand, relate, history, extract
Ben Shneiderman, “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations”, Proceedings of the 1996 IEEE Symposium on Visual Languages (1996): 336. Available at http://www.interactiondesign.us/courses/2011_AD690/PDFs/Shneiderman_1996.pdf
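The mantra can also be read as a sequence of operations on a dataset. The following is a minimal sketch in plain Python, not anything taken from Shneiderman's paper: the records, field names, and function names are all hypothetical and invented for illustration.

```python
from statistics import mean

# Hypothetical records, invented purely for illustration.
records = [
    {"group": "A", "year": 2013, "value": 6.0},
    {"group": "A", "year": 2014, "value": 7.5},
    {"group": "A", "year": 2015, "value": 7.0},
    {"group": "B", "year": 2013, "value": 22.0},
    {"group": "B", "year": 2014, "value": 22.5},
    {"group": "B", "year": 2015, "value": 23.0},
]


def overview(rows):
    """Overview first: summarize the entire collection."""
    values = [r["value"] for r in rows]
    return {
        "count": len(rows),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


def zoom_and_filter(rows, group=None, year_range=None):
    """Zoom (narrow the year range) and filter (keep only one group)."""
    selected = rows
    if group is not None:
        selected = [r for r in selected if r["group"] == group]
    if year_range is not None:
        first, last = year_range
        selected = [r for r in selected if first <= r["year"] <= last]
    return selected


def details_on_demand(rows, index):
    """Details-on-demand: return one complete record when asked for it."""
    return rows[index]


summary = overview(records)
subset = zoom_and_filter(records, group="A", year_range=(2014, 2015))
detail = details_on_demand(subset, 0)
print(summary)
print(subset)
print(detail)
```

In an interactive tool these steps would typically map to a summary chart, brushing or range sliders, and tooltips, but the order of operations is the same.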
Preattentive Features
Visualizations and visualization tools need to be designed with human visual abilities in mind. For example, check out this research about “Preattentive Features and Tasks” (on YouTube):
Christopher G. Healey, “Perception in Visualization”, http://www.csc.ncsu.edu/faculty/healey/PP/
Visualization Examples
The best way to find good ways to visualize data is to look at lots of examples. Try some of these catalogs:
- The Data Visualisation Catalogue
- Periodic Table of Visualization Methods
- Graphic Continuum
- TimeViz Browser
- A Visual Bibliography of Tree Visualization 2.0
Negative Examples:
Viz Resources
- FlowingData
- Infovis Wiki
- Visualising Data
- Gapminder
- Viz-Palette (helps you choose accessible color palettes)
- The Pudding (visual essays)
Simple Web Based Tools
- Google Charts
- RAWGraphs (based on D3.js)
- Datawrapper
- Charted
- Vega Voyager (open alternative to Tableau in development)
- Highcharts Cloud (free web editor version of a popular commercial JavaScript library)
- Flourish (free version with public data)
Libraries
- HTML+JS: D3 (alternatively, check out C3.js, dygraphs, TauCharts, Chart.js, or Vega)
- Python: matplotlib (you will probably want to use it along with Pandas to manipulate the data; alternatively, check out Bokeh, seaborn, plotly, or ggplot)
- R: ggplot2 (usually used in conjunction with other Tidyverse packages; alternatively, check out ggvis or Shiny)
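As a rough illustration of the Python pairing mentioned above, here is a minimal sketch that reshapes a small table with Pandas and draws it with matplotlib. The data, column names, and output filename are hypothetical and made up for this example; it assumes both libraries are installed.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data in "long" form, invented for illustration only.
df = pd.DataFrame(
    {
        "year": [2013, 2014, 2015, 2013, 2014, 2015],
        "group": ["A", "A", "A", "B", "B", "B"],
        "value": [6.0, 7.5, 7.0, 22.0, 22.5, 23.0],
    }
)

# Manipulate with Pandas: one row per year, one column per group.
wide = df.pivot(index="year", columns="group", values="value")

# Draw with matplotlib: DataFrame.plot draws one line per column on the given axes.
fig, ax = plt.subplots()
wide.plot(ax=ax, marker="o")
ax.set_xticks(list(wide.index))  # show whole years instead of fractional ticks
ax.set_xlabel("year")
ax.set_ylabel("value")
ax.set_title("Value by year and group (made-up data)")

fig.savefig("value_by_year.png")  # hypothetical output filename
```

The same DataFrame could be handed to seaborn or plotly instead; the reshaping step usually carries over, while the plotting call changes.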
Code Notebooks
Code notebooks such as Jupyter are becoming an essential tool in data science. They combine code blocks with text and visualizations in a shareable document. Many notebooks now have cloud-hosted versions that lower the barrier to getting started and promote reproducibility:
- Try Jupyter, or share Jupyter notebooks with nbviewer or binder
- Colaboratory (Jupyter environment hosted by Google designed to run TensorFlow without installing anything, with your notebooks stored in Google Drive)
- Observable (JavaScript-based data visualization for the web)
- Code Ocean (a variety of notebooks and IDEs available)
- CoCalc (formerly SageMath Cloud; now has Jupyter Notebook with lots of kernels, a LaTeX editor, and more)
- Iodide and Pyodide (web-focused notebooks implemented in the browser, in alpha development)
- Stencila (word processor + spreadsheet + code)
- Azure Notebooks
Tableau
I rarely recommend non-open-source tools, but Tableau is a fairly unique tool for visually exploring data in a flexible, powerful sandbox.
It is commonly used in enterprise settings, but free licenses are available for academic learning use.
If your data can be shared publicly, Tableau Public is a good option.
Also, check out their Iron Viz competition for interesting examples.
- Free for students and teachers
- Not free for administration, see higher-ed solutions
Tutorials: