Sunday, November 14, 2010

Data Visualization: Show Me the Forest, Not the Trees

Too much light often blinds gentlemen of this sort. They cannot see the forest for the trees.
- Christoph Martin Wieland, German poet and writer (1733-1813)

The Internet has inundated all of us with more information than we can possibly consume. Traditional news media report the news, while individual users produce blogs and other opinion pieces. And as more people spend their time on Facebook and Twitter, our friends and families are creating more social media content. We are all being overloaded with more information than we can handle.

This trend of information overload doesn't seem to be slowing down. Look at Facebook alone: the number of users has skyrocketed in the past 24 months. As more users sign up on Facebook, the number of friend connections increases, which means the average user sees more status updates from friends on the News Feed.

There are broadly three ways to cope with this growing corpus of data.

1. Unplug: Ignore all the data

2. Filter the data

3. Summarize the data

Each of the three represents an interesting philosophy that is worthy of a separate blog discussion. Today, I want to focus on #3: summarizing the data.

Dealing with large data is not a new phenomenon in business. For example, credit scoring companies like Experian, TransUnion, and Equifax have for years been collecting mountains of data on everything from how we pay our phone bills to how many credit cards we have opened. They have built data warehouses of all these data points, and they mine them constantly for new patterns that can signal greater risk in collecting future debt payments.

Such business intelligence (BI) and analytics have been around for years, and it is an $8 billion market and growing.

The core idea behind BI is to show the big picture. When we look at a large amount of data up close, it is impossible to see the trends and interesting features in the data. BI solves that problem by summarizing the information and presenting it, often in visual form.

These BI techniques are now making their way into social media data. The following are examples that I recently encountered on the web:


Opinionspace

The core idea of Opinionspace is to reduce complex multi-dimensional opinion clusters down to an easy-to-render two-dimensional space. What is cool about Opinionspace is that you can move your opinion slider and see how different clusters relate to your own opinion. Try it for yourself at

It is led by Ken Goldberg at UC Berkeley.
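To make the idea of squeezing many opinion dimensions into two more concrete, here is a minimal sketch. It assumes principal component analysis (PCA) as the reduction method, which is my choice for illustration and not something taken from Opinionspace itself. The slider values, the `opinions` list, and every function name are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each column's mean so the cloud of users is centered at zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[dot(a, b) / (n - 1) for b in cols] for a in cols]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: multiply and normalize repeatedly to find the dominant direction."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left in any direction
            return [0.0] * len(matrix)
        v = [x / norm for x in w]
    return v


def deflate(matrix, v):
    """Remove the variance already explained by the unit vector v."""
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    size = len(v)
    return [[matrix[i][j] - eigenvalue * v[i] * v[j] for j in range(size)]
            for i in range(size)]


def project_2d(rows):
    """Project each row onto the two directions of greatest variance."""
    centered = center(rows)
    cov = covariance(centered)
    first = top_eigenvector(cov)
    second = top_eigenvector(deflate(cov, first))
    return [(dot(row, first), dot(row, second)) for row in centered]


# Hypothetical data: four users, each answering five opinion sliders (0.0 to 1.0).
opinions = [
    [0.9, 0.8, 0.1, 0.2, 0.9],
    [0.8, 0.9, 0.2, 0.1, 0.8],
    [0.1, 0.2, 0.9, 0.8, 0.1],
    [0.2, 0.1, 0.8, 0.9, 0.2],
]

points = project_2d(opinions)
for user, (x, y) in enumerate(points):
    print(f"user {user}: x={x:+.2f}, y={y:+.2f}")
```

Users who answered alike end up near each other on the two-dimensional map, so the first two users land on one side of the horizontal axis and the last two on the other. Which side is positive depends on the arbitrary sign of the computed direction.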


Trendsmap

As I discussed in an earlier blog post, Twitter has become the Internet's breaking news channel. To make sense of the cacophony of thousands of tweets per second, you need a tool like Trendsmap.

What's cool about Trendsmap is that you can drill down into the real-time trending topics by geography and see what people are talking about. You can think of it as the browse-and-zoom paradigm of a thumbnail picture view: you drill down on a thumbnail to show the bigger picture.
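The browse-and-zoom idea can be sketched in a few lines: summarize topics at a coarse level first, then re-run the same summary on a narrower slice. This is only an illustration of the paradigm, not how Trendsmap is built. The sample `tweets`, the function names, and the country and city levels are all made up.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (country, city, topic) for each tweet.
tweets = [
    ("US", "New York", "#election"),
    ("US", "New York", "#election"),
    ("US", "New York", "#nyc"),
    ("US", "San Francisco", "#giants"),
    ("US", "San Francisco", "#giants"),
    ("US", "San Francisco", "#election"),
    ("UK", "London", "#xfactor"),
    ("UK", "London", "#xfactor"),
    ("UK", "London", "#tube"),
]


def top_topics(rows, level, top_n=2):
    """Group rows by place (level 0 = country, 1 = city) and rank topics by count."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[level]][row[2]] += 1
    return {place: c.most_common(top_n) for place, c in counts.items()}


def drill_down(rows, country):
    """Keep only the tweets from one country, ready for a city-level summary."""
    return [row for row in rows if row[0] == country]


# Browse: the forest, one summary per country.
overview = top_topics(tweets, level=0)
print(overview)

# Zoom: the trees inside one country, one summary per city.
detail = top_topics(drill_down(tweets, "US"), level=1)
print(detail)
```

The same `top_topics` function serves both views. Only the slice of data and the grouping level change, which is what makes drilling down feel like zooming rather than switching tools.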

IBM BigSheets

Video Is Long; Start Watching From 11 Min 30 Sec Into The Video

IBM has a history of providing business intelligence tools. The idea of IBM BigSheets is to filter real-time tweets by keywords and actionable conditions (this spills into #2, filtering the data) and to visualize the data in an interesting way (#3, summarizing the data). What's interesting here is the visual way IBM BigSheets presents British Parliament voting records (demonstrated starting at 11:30 in the video). Who would have thought that looking at government data could be this intuitive?

New York Times Heat Map

Source: New York Times

Even if you understand what the 2011 US federal budget looks like, you will be surprised by this visual representation of Obama's 2011 budget proposal. The NYTimes heat map shows not only the proposed 2011 budget but also the changes from the 2010 budget, and how little room the US federal government has to balance the budget. One thing to note is how the subtle transition animation helps the viewer understand the changes between the 2010 and 2011 budgets.

The lesson from these examples is threefold:

1. To deal with a large amount of data, we need to abstract and look at the larger picture before we look at individual data points.

2. Visualizing the data helps people process it faster, because we use the right hemisphere of our brain (a parallel processor) to process visual information rather than the left hemisphere (a serial processor).

3. Animation is a great way to layer additional information on top of visual data, such as the passage of time (NYTimes Heat Map) or a comparison of different datasets (Opinionspace).

If you haven't thought about visualizing your data, now is the time. Unless you want to miss the forest while looking at each tree, that is.
