Nelson Davis, Matt Chambers and Alex Duke have formed the Reviz Project which you can read more about here. Their first challenge was to visualize the relationship between gun homicides and gun suicides in the United States. Data is sourced from the CDC.
I took a quick pass at visualizing the ratio between suicides and homicides for each state and over time using a scatterplot (I’m a scatterplot junkie) and a timeline. Hover over each state circle in the scatterplot to filter the timeline below to show the trend for a chosen state, and use the slider at the bottom of the timeline to explore the relationship between these two variables for a particular year in the scatterplot:
The dashboards they created to visualize this same data are quite elaborate (you can find them on their blog). While this month’s data story is very sobering, it’s always fascinating to me how different people, starting with the same raw data, and – in the case of Nelson, Matt, Alex and me – the exact same tool (Tableau), can come up with very different results, and very different insights.
That’s why data visualization has such a strong social component to it.
Thanks for reading,
Q: In data visualization, is there a single “best” way to visualize data in a particular scenario and for a particular audience, or are there multiple “good enough” ways?
- In summary, Few says “Is there a best solution in a given situation? You bet there is.”
- In contrast, Cole says “For me, though, it is possible to have multiple varying visuals that may be equally effective”
Could Both be Right?
This is going to sound strange, but I think both are right, and there is room for both approaches in the field of data visualization. Let me explain.
Lucky for us, really smart people have been studying how to choose between a variety of alternatives for over a century now. Decision-making of this sort is the realm of Operations Research (also called “operational research”, “management science” and “decision science”). Another way of asking the lead-in question is:
Q: When choosing how to show data to a particular audience, should I keep looking until I find a single optimum solution, or should I stop as soon as I find one of many that achieves some minimum level of acceptability (also called the “acceptability threshold” or “aspiration level”)?
The former approach is called optimization, and the latter was given the name “satisficing” (a combination of the words satisfy and suffice) by Nobel laureate Herbert A. Simon in 1956.
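To make the distinction concrete, here is a minimal sketch of the two strategies. The chart names, scores and threshold are hypothetical placeholders, not measurements of anything:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def optimize(candidates: Iterable[T], payoff: Callable[[T], float]) -> T:
    """Score every candidate and return the one with the highest payoff."""
    return max(candidates, key=payoff)


def satisfice(
    candidates: Iterable[T], payoff: Callable[[T], float], threshold: float
) -> Optional[T]:
    """Return the first candidate that meets the aspiration level, else None."""
    for candidate in candidates:
        if payoff(candidate) >= threshold:
            return candidate  # stop searching as soon as one is good enough
    return None


# Hypothetical payoff scores on a 0-10 scale, in the order we happen to try them.
scores = {"pie chart": 6.0, "bar chart": 8.0, "dot plot": 9.0}

best = optimize(scores, scores.get)                          # looks at all three
good_enough = satisfice(scores, scores.get, threshold=7.0)   # stops at the second
```

With these made-up numbers the optimizer picks the dot plot, while the satisficer stops at the bar chart because it is the first option to clear the threshold.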
So which approach should we take? Should we Optimize or Satisfice when visualizing data?
I believe there is room for both approaches. Which approach we take depends on three factors:
- Whether or not the decision problem is tractable
- Whether or not all of the information is available
- Whether or not we have time and resources to get the necessary information
But What is the “Payoff Function” for Data Visualizations?
This is a critical question, and it is where I think some of the debate stems from. Part of the challenge in ranking alternative solutions to a data visualization problem is determining what variables go into the payoff function, and their relative weight or importance. The payoff function is how we compare alternatives. Which choice is better? Why is it better? How much better?
Few says that “we can judge the merits of a data visualization by its ability to make the information as easy to understand as possible.” By stating this, he seems to me to be proposing a particular payoff function: increased comprehensibility = increased payoff.
But is comprehensibility the only variable that matters (did our audience accurately and precisely understand the relative proportions?), or should other variables be factored in as well, such as attention (did our audience take notice?), impact (did they care?), aesthetics (did they find the visuals appealing?), memorability (did they remember the medium and/or the message some time into the future?) and behavior (did they take some desired action as a result?)?
Here’s a visual that shows how I tend to think about measuring payoff, or success, of a particular solution with hypothetical scores (and yes, I’ve been accused of over-thinking things many times before):
It’s pretty easy to conceive of situations, and I’d venture to say that most of us have experienced this first-hand, where a particular visualization type may have afforded increased precision of comparison, but that extra precision wasn’t necessary for the task at hand, and the visualization was inferior in some other respect that doomed our efforts to failure. Comprehensibility may be the single most important factor in data visualization, but I don’t agree that it’s the only factor we could potentially be concerned with. Not every data visualization scenario requires ultimate precision, just as engineers don’t specify the same tight tolerances for a $15 scooter as they do for a $450M space shuttle. Also, visualization types can make one type of comparison easier (say, part-to-whole) but another comparison more difficult (say, part-to-part).
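One way to picture a multi-variable payoff function is as a weighted sum of the factors listed above. The sketch below is only an illustration: the weights and the two sets of scores are invented for the example, and a weighted sum is just one of many ways the factors could be combined.

```python
# Hypothetical weights for the factors discussed above (chosen to sum to 1).
WEIGHTS = {
    "comprehensibility": 0.40,
    "attention": 0.15,
    "impact": 0.15,
    "aesthetics": 0.10,
    "memorability": 0.10,
    "behavior": 0.10,
}


def payoff(scores, weights=WEIGHTS):
    """Weighted sum of factor scores; only factors named in `weights` count."""
    return sum(weight * scores[factor] for factor, weight in weights.items())


# Hypothetical 0-10 scores for two alternative designs.
precise_chart = {
    "comprehensibility": 9, "attention": 3, "impact": 3,
    "aesthetics": 4, "memorability": 3, "behavior": 3,
}
engaging_chart = {
    "comprehensibility": 7, "attention": 8, "impact": 8,
    "aesthetics": 7, "memorability": 8, "behavior": 7,
}

# A payoff function that counts comprehensibility alone.
COMPREHENSION_ONLY = {"comprehensibility": 1.0}

ranking_all_factors = payoff(engaging_chart) > payoff(precise_chart)
ranking_comprehension = (
    payoff(precise_chart, COMPREHENSION_ONLY)
    > payoff(engaging_chart, COMPREHENSION_ONLY)
)
```

With these invented numbers, the precise chart wins when comprehensibility is the only variable, and the engaging chart wins once the other factors carry weight. The choice of payoff function changes which design counts as "best".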
What seems clear, then, is that if we want to optimize for all of these variables (and likely others) for our particular scenario and audience, then we’ll need to do a lot of work, and it will take a lot of time. If the audience is narrowly defined (say, the board of directors of a specific nonprofit organization), then we simply can’t test all of the variables (such as behavior – what will they do?) ahead of time. We have to forge ahead with imperfect information, and use something called bounded rationality – the idea that decision-making involves inherent limitations in our knowledge, and we’ll have to pick something that is ‘good enough’.
And if we get the data at 9:30am and the meeting is at 4pm on the same day? Running a battery of tests often isn’t practical.
But what if we feel that optimization is critical in a particular case? We can start by simplifying things for ourselves, focusing on just one or two input variables, making some key assumptions about who our audience will be, what their state of mind will be when we present to them, and how their reactions will be similar to or different from the reactions of a test audience. We reduce the degrees of freedom and optimize a much simpler equation. I’m all for knowing which chart types are more comprehensible than others. In a pinch, this is really good information to have at our disposal.
There’s Room for Both Approaches
Simon noted in his Nobel laureate speech that “decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Neither approach, in general, dominates the other, and both have continued to co-exist in the world of management science.”
I believe both should co-exist in the world of data visualization, too. We’ll all be better off if people continue to test and find optimum visualizations for simplified and controlled scenarios in the lab, and we’ll be better off if people continue to forge ahead and create ‘good enough’ visualizations in the real world, taking into account a broader set of criteria and embracing the unknowns and messy uncertainties of communicating with other thinking and feeling human minds.
Thanks for reading my $0.02. I’d like to hear your thoughts.
The following comments are in response to Stephen Few’s recent newsletter entitled “Information Visualization Research as Pseudo-Science” in which he critiqued an academic paper by Borkin et al. entitled “Beyond Memorability: Visualization Recognition and Recall”. I’m not an academic researcher, so I will leave it to others in the field to respond to Few’s specific criticisms of the paper’s methods. My goal in this article is to respond to opinions Few voiced about memorability in data visualization.
I’d like to start by asking a few questions:
- Does it matter whether a data visualization is memorable or not?
- Should we, as data visualization practitioners, care about memorability?
- Should we design our visualizations so that those who view them are more likely to remember them at a later point in time?
- Is memorability a worthwhile area of study for those studying data visualization in academia?
In my opinion, and in my experience, the answer to each of these questions is ‘Yes’.
In Stephen Few’s recent newsletter entitled “Information Visualization Research as Pseudo-Science”, though, he put forward a differing opinion:
“Visualizations don’t need to be designed for memorability— they need to be designed for comprehension. For most visualizations, the comprehension that they provide need only last until the decision that it informs is made. Usually, that is only a matter of seconds.” – Stephen Few (emphasis his)
This statement helped me understand why Few and I disagree about memorability: we disagree about how data visualizations are used by groups of people. Simply put, I don’t believe data visualizations are “usually” followed by decisions “only a matter of seconds” later. That may be how a robot or a computer algorithm would approach decision-making, but it’s just not how groups of humans in organizations go about it.
How do groups of humans usually work with data visualizations, then? Well, analysts prepare dense packets for pre-reading materials, directors and VPs attend review meetings where they look at lots and LOTS of data and charts, sometimes they take copious notes, sometimes they zone out and check their smartphones, then they break for lunch, check their email, reconvene and consider different topics, only to have the final decision made at a totally different planning meeting or off-site weeks later.
Sound familiar? That’s a whole lot messier than question -> visualization -> decision in seconds. And that’s only one reason why memorability matters.
In my experience, the memorability of the overall message (of which the visualizations are a critical element) matters most when:
- Decisions won’t be made immediately
- The audience doesn’t care deeply about the topic
- The environment is already saturated in data and visualizations
To illustrate these three conditions, let me relate a personal story from my experience working with data and groups of decision makers. The specific details of the account have been altered to protect the innocent.
A Practitioner Wins Thanks to Memorability
One time I had the unenviable task of presenting the results of the launch of a product that was, shall we say, less than “top-of-mind” to the executives at a Fortune 500 company. Think “razor” of the razor – razor blade model. Sales should just be a pull-through, so they didn’t pay much attention to it at all.
But what we were finding was that the relative neglect of this high-touch product was causing a lot of dissatisfaction, and our lack of attention to the details of the product offering was causing us to lose customers.
In preparing for the presentation, I created plenty of nice, Tufte-compliant charts and graphs, like this one (a generalized mock up), to show how the recently-launched product was doing in the marketplace:
Do you notice the problem in the chart? That’s right, we didn’t launch a green SKU in Configuration B.
Why not? Tooling investment.
Who cares? Customers did. A lot of them. The nature of the product was such that customers couldn’t select between A & B. There were factors that pre-determined that for them.
Now I was scheduled to be the fifth presenter in a very long review meeting where many other topics would be discussed, and as I mentioned, this product just didn’t matter to the executives. My charts were going to get glossed over. If the executives gave me 10 seconds of attention on each chart, I’d have considered myself lucky. The way the situation was shaping up, I felt pretty sure that this product line’s issues weren’t going to be addressed as a result of my presentation.
So instead, I showed charts like this, with actual photographs of actual customers and their actual quotes:
The result was palpable.
They leaned in. They looked at the faces in the pictures. Actual customers. People that looked like their sons, their daughters, their mothers. They chuckled at the funny social media handles. They cared. For the first time in a long time, they actually cared about the razor. And they cared about the fact that customers just weren’t loving it.
A few weeks later, I received an email that the go-ahead had been given to resolve a number of problems with this product line, including the missing green SKU in Configuration B. The VP thanked me for showing the “human side” of the data in my presentation.
When the time came to make the decision, they opted to fund a product they didn’t use to care about, thanks to charts they couldn’t forget.
Memorable or Comprehensible, or Both?
Stephen Few made the statement that comprehensibility matters, but memorability doesn’t when it comes to designing data visualizations. Well, the original charts in my real-life example above were definitely comprehensible. I changed them because they weren’t particularly memorable.
My original charts were in the bottom right quadrant of the 4-blocker below, and all I did was push them up to the top-right. Sure, sometimes, it’s not necessary to do so. Sometimes, though, it’s make-or-break:
Note that for scenarios where the audience members already deeply care about the data, comprehensibility itself will result in memorability. Adding photos of beautiful, smiling faces just isn’t necessary.
But let’s be honest. Having an audience of 100% of the key decision makers that wait with bated breath for our next bland chart that results in a blank check being given right there on the spot just isn’t normal. It would be nice, sure, but how many times have you actually been in that situation? So many times you absolutely need them to remember your message. Having charts that draw them in and stay in their brains just isn’t a bad idea.
Sometimes There’s Just No Decision
So far I’ve written about data visualizations in the context of human decision-making. But many data visualizations don’t inform decisions at all. Decision support is but one of many possible purposes. Data visualizations can be created to merely inform, to educate, and yes, even to entertain. In those cases, design for memorability can be the difference between having someone share your work with others, and having them forget they ever saw it.
Few made the following comment about adding images to visualizations:
If I incorporate an image of a kitten into a data visualization, I can guarantee that a test subject would remember seeing that kitten if it is shown to her again a few minutes later. But how is that useful? Unless the visualization’s message is that kittens are cute and fun, nothing of consequence has been achieved. – Stephen Few
He answers the question himself quite well: images are useful if the visualization’s message is enhanced by the presence of the images.
Take my Edgar Allan Poe timeline for example:
Does the image of Poe add any value at all? How about the image of his signature? Are these components nothing more than “chartjunk” (Few mentioned to me in an email that he would not call the image of Poe “chartjunk” based on his 2011 writings on the subject), or do they actually perform a function?
I submit that they perform a vital function. The visualization shows the life works of one man as blocks stacked together in the years they were written. Works that were written on ink and paper by his own hand.
There’s no decision here. The visualization is simply intended to educate you. And it’s my opinion that your education takes on a whole different meaning – a whole different feeling – when you see Poe’s face and an artifact of his own penmanship.
And let’s be honest, the following version is pretty damn boring, you’d probably ignore it if you saw it in your Twitter feed, and it’s not nearly as memorable, is it?
I’d like to conclude by quoting from Stephen Few’s critique one final time:
The greatest tragedy of this research is that what makes a visualization memorable is actually of no consequence. – Stephen Few
I hope I’ve made it clear in this blog post why I think that memorability can actually be of great consequence in data visualization. But did you notice that in my comments above I used phrases like “in my experience” quite often, and that all I really did was relate an anecdote and state my opinion? My opinion does not amount to codified knowledge, and my experiences do not amount to rigorous research.
And that’s exactly why I would appreciate further attempts by academics to study what makes charts more or less memorable. I’m sure this task isn’t easy. Visualizations are but one piece of an overall message that can be delivered in myriad ways to a variety of audiences. For those who are studying this topic, do know that there are practitioners out there who are hoping that the insight you glean into this topic can help us all.
It seems there’s a day for everything, right? National Cashew Day (you guessed it, that’s today), World Philosophy Day (nope, sorry – last Thursday). Heck, you can even register your own National Day of [fill in the blank].
So, to earn my Math Nerd Card for 2015, I created the following dashboard that visualizes the first 100 numbers in the Fibonacci Sequence, starting with 1 instead of 0, as I’m led to understand is the more modern convention:
Math + Data Nerds Unite: know of any other good math vizzes out there? Leave a comment!
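For anyone who wants to reproduce the numbers behind the dashboard, here is a minimal sketch that generates the first 100 terms, starting with 1 as described above. The function name is my own; it is not taken from the dashboard's data source.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    numbers = []
    a, b = 1, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b  # each term is the sum of the two before it
    return numbers


first_100 = fibonacci(100)
print(first_100[:10])   # the first ten terms
print(first_100[-1])    # the 100th term, a 21-digit number
```

Python integers have arbitrary precision, so the later terms do not overflow the way they would in a fixed-width 64-bit column.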
Thanks for humoring me,
Have you noticed it as well? The tide is turning against dogmatism in data visualization, as witnessed by the increasing number of voices speaking out against a rigid approach and closed-mindedness regarding practices that are often ridiculed in knee-jerk fashion. It’s about time.
To which voices am I referring, and which data visualization practices are they defending?
1. Charts with non-zero y-axes
Can you tell Vox’s Johnny Harris and Matthew Yglesias have had it with readers taking pot shots about their choice of y-axis starting points? Their well-reasoned video entitled “Shut up about the y-axis. It shouldn’t always start at zero” says it all:
Harris and Yglesias show that the choice of axis starting point depends on the context, the unit of measure, and on the comparison being made.
2. Artistic approaches to data visualization
In Andy Cotgreave’s recent ComputerWorld article “Why Do We Visualise Data?”, Andy argues that not all data visualizations need to be burdened with the requirement of imparting ultimate precision of comparison. It depends on the purpose.
“The purpose of a visualization will also determine the extent to which you should inform effectively…Sometimes it’s more important to make someone engage with the overall message rather than the minutiae.”
Andy uses the example of Stefanie Posavec’s “Air Transformed”: a wearable data visualization necklace showing air quality in Sheffield, UK:
Should this project really be ridiculed as “ineffective” and horribly contrary to “best practices”, with no place for it at all in the field of data visualization? Or could it be that this form of expression engages human beings in a way that a rigorous report complete with bar charts and compliant zero y-axis timelines of air quality in Sheffield could hardly do? Maybe both the report and the necklace are valid, each in their own way.
Which should you create? It depends on your purpose and your audience.
3. Pie charts
Oh, the poor, maligned pie chart. The chart type that gets pushed around and bullied on the data viz playground more than any other. Randal Olson of /r/dataisbeautiful ran a Twitter poll asking “Do you think pie charts should be banned from #dataviz?”. Scientific or not, nearly 2 in 5 responded affirmatively:
Another Twitter poll: Do you think pie charts should be banned from #dataviz?
— Randy Olson (@randal_olson) October 29, 2015
That’s amazing if you stop and think about it. Almost 40% of respondents, likely mostly data viz enthusiasts who follow Olson, think that pie charts should never, ever, ever be used. Hilariously, Andy Kirk of Visualising Data asked whether we should also run a poll about whether those people should be banned, and Irvin Almonte’s response was sheer genius:
— Randy Olson (@randal_olson) November 5, 2015
Again: It depends. Say it with me: IT DEPENDS.
Pie charts, in certain instances, can actually be more effective than bar charts at showing specific part-to-whole comparisons. And if the part-to-whole relationship is far more important to your message than comparing uber-accurately between categories, and if there are a very small number of slices, go ahead, give thought to using a pie chart. Don’t be intimidated by pie chart haters. There, I said it.
4. Word Clouds
I entered the fray last week with my blog post “My 3 Basic Tenets of Data Visualization” in which I argued that rules of thumb, not black-and-white rules, should prevail, along with a spirit of humility and openness to exploration and innovation in data viz.
I also did the unthinkable: I defended the Word Cloud. The poor, lowly pal of the pie chart, united on the playground in mutual fear of the roving data-dogma bully. My point is that if you only had a very short amount of time to impress upon a large room of people the most commonly used passwords, which of these four visualization types would you choose?
The word cloud sacrifices precision for completeness (all of the passwords actually appear on the screen in only the word cloud) and readability (the most commonly used passwords almost shout out at the reader). Is that a reasonable trade-off to make? Maybe. It depends.
Since that blog post, “What are your most used words on Facebook” has gone viral, and we’ve been inundated with over 16 million word clouds as of the writing of this blog post. Of course one can only hope this app is not a gigantic phishing scam, but do you think a bar chart version of your most commonly used words, or a concise and thorough text analytics report would have also gone viral? Maybe, but probably not. And I’m not saying “going viral” is an end that justifies all means, but in this case, ultimate precision of comparison is probably not needed anyway. It actually worked a little too well.
Let’s not let the pendulum swing too far, though.
This casting-off of suffocating restraint and a fearful spirit of ridicule is a REALLY GOOD THING in data visualization, but let’s not let the pendulum swing too far the other way. It’s true that pie charts are very often the wrong choice, and the majority of the time a y-axis that starts at zero is a really good choice.
I’m not a big fan of the word “best” in “best practices”, which seems to promise some optimal solution, but I do like the sentiment in this response by Vance Fitzgerald to my question on Twitter about rules of thumb in data visualization:
— Vance Fitzgerald (@vancefitzgerald) November 11, 2015
I’m hopeful that the next phase of data visualization is one that embraces the gray of “it depends” and encourages open dialogue and constructive criticism. In order to get there, we’ll definitely have to shed dogma. Let’s absolutely do so, but let’s also carry forward the principles and rules of thumb that just make good sense, while being open to the possibility that breaking those rules might be a great idea in specific situations.
Wouldn’t this be a more mature approach? Wouldn’t it also be more welcoming, and more enjoyable?
Thanks for reading,
I believe in the creative power of play in any discipline. Data visualization is no different. I’ve acquired new skills and grown in ways I would have never predicted, all because I spent a little bit of time playing with a data set I found intriguing. Here are three recent side projects of mine, and a useful tip that comes from each.
1. Play specific YouTube video segments when users click on corresponding marks
I’m pretty sure most of you were aware that it was (the real) “Back to the Future Day” this past October 21st. To celebrate the occasion, as any data-loving 80s kid would want to do, I created a dashboard using what I call dimension line charts to show all 13 trips the DeLorean time machine took over the course of the trilogy.
The dashboard gives the reader the ability to watch the movie clip associated with each DeLorean trip by clicking on the corresponding arrows. To add this feature, I first had to find a video on YouTube that includes each time travel clip back-to-back. Then, I had to grab the embed URL for the video:
Finally, I had to add the embed URL with the following parameters to each row in the spreadsheet I created:
Where XXX and YYY are the start and stop times in seconds, respectively. The “&autoplay=1” parameter means the user doesn’t have to click the arrow and then click play. Clicking the arrow automatically starts the video clip.
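The exact parameter string from the spreadsheet is not reproduced here, so the following is a sketch of how such per-row URLs could be generated, assuming YouTube's embedded player parameters `start`, `end` and `autoplay`. The video ID and the clip times are placeholders, not the values used in the dashboard.

```python
from urllib.parse import urlencode


def clip_embed_url(video_id, start_seconds, end_seconds):
    """Build an embed URL that autoplays one segment of a YouTube video."""
    params = urlencode(
        {
            "start": start_seconds,  # XXX: where the clip begins, in seconds
            "end": end_seconds,      # YYY: where the clip stops, in seconds
            "autoplay": 1,           # start playing without a second click
        }
    )
    return f"https://www.youtube.com/embed/{video_id}?{params}"


# Placeholder video ID and hypothetical clip boundaries, one tuple per trip.
trips = [(1, 90, 135), (2, 135, 170)]
urls = {trip: clip_embed_url("VIDEO_ID", start, end) for trip, start, end in trips}
```

Each resulting URL can then be pasted into the corresponding row of the spreadsheet and opened by a URL action when the reader clicks a mark.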
Great Scott, that’s a cool tip!
2. Use the TODAY() function to have a countdown clock in your dashboard update every day
The genesis for this next project came from eyeo. I had the chance to present at the eyeo conference last year, and the day of my presentation happened to be both a Friday the 13th and a full moon. Spooky, right? I wondered how often these two events coincided, so I found a calendar listing of each one, and figured out all the instances in which they occurred or will occur on the exact same day:
The helpful tip that comes from this side project is the use of the function TODAY(). To create the “Days Till Next” table on the far right hand side of the dashboard, I first created the following three calculated fields, the first to pull in the date value for “today”, the second to compute the number of days between today and each given event, and the third to null out events that have already happened in the past:
What’s great about this technique is that every day you load the dashboard, Tableau Public server will update all of these calculated fields, including the day that TODAY() maps to, and give you a brand new countdown. Who says Tableau Public dashboards don’t update automatically?
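The logic of those three calculated fields can be sketched outside Tableau as well. The Python below is an analogue, not the Tableau formulas themselves. The function names are my own, and it lists only Friday the 13ths, since the full-moon dates came from a separate calendar listing that is not computed here.

```python
from datetime import date


def friday_13ths(year):
    """Return every Friday the 13th in the given year."""
    thirteenths = (date(year, month, 13) for month in range(1, 13))
    return [d for d in thirteenths if d.weekday() == 4]  # Monday is 0, Friday is 4


def days_till_next(event, today=None):
    """Days from today until the event, or None if it has already happened."""
    today = today or date.today()   # field 1: the date value for "today"
    delta = (event - today).days    # field 2: days between today and the event
    return delta if delta >= 0 else None  # field 3: null out past events


# Recomputed on every run, so the countdown changes from day to day.
for event in friday_13ths(date.today().year):
    print(event.isoformat(), days_till_next(event))
```

Passing an explicit `today` makes the countdown reproducible, which is handy for checking the logic against a known date.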
3. Invert the y-axis to stack marks downward — if you dare!
The third side-project was undertaken on the 166th anniversary of Edgar Allan Poe’s mysterious death. It was born out of a simple question: How many works of literature did the revered writer and poet produce over the course of his life? The following dashboard stacks 150 red boxes, one beneath the other, over the course of two and a half decades to visualize his prolific career:
WARNING: This is a dangerous technique! Just ask Christine Chan, whose “Gun Deaths in Florida” viz, which used a similar technique, drew the widespread and harsh ire of the internet. People called her graphic “misleading” and “deceptive” for making something that was getting worse look to many like it was getting better.
Fair enough. I feel that in this case, the stacked boxes in the Poe viz aren’t as likely to be misinterpreted as a line chart which slopes “downward” from one value to a greater value. We’re just stacking boxes here, and it’s pretty clear that 1845 has “more” boxes than 1844, not fewer. But why do it at all? Why risk misleading with the inverted y-axis? It’s for dramatic effect – it makes it appear like blood is falling down onto Poe’s head. Dramatic effect is a double-edged sword, though, so tread with caution.
And to open the Pandora’s Box on this technique, simply right click on the axis you want to reverse (it can be the x-axis or the y-axis), and check the “Reversed” box:
Thanks, I hope you enjoy these side projects as much as I did. I hope you find the tips helpful (and that you don’t burn yourself on them!). What side projects have you learned from lately?
I had the pleasure of reading Cole Nussbaumer Knaflic’s recently released book Storytelling with Data over the past week. I highly recommend it to anyone who uses charts and graphs to convey a data-driven message to an audience – that is to say, basically everyone.
In brief: Cole shows what clear and well-designed visualizations look like, and explains why they’re effective. She also gives sound advice on practices to avoid in most cases, such as pie charts, 3D views and dual axes. She stops a good exit short of Dogmaville, though, explaining that you should be able to give a good explanation why you’re using a challenging chart type if you decide to go that route.
Well beyond merely choosing chart types, the value of this book is that you will learn how to eliminate clutter, focus attention on what matters, and de-emphasize everything else. The pages are filled with high quality before and after images that bring the subject to full color and show you what “good” looks like.
She doesn’t deal with “How” from a tools perspective – her techniques and principles can be put into practice using pretty much any tool from Excel to Tableau to D3. And she doesn’t talk about “data dashboards” that are characterized by multiple charts and graphs placed side-by-side. The book deals entirely with individual visualizations and how they can be designed, annotated, and shown in sequence to tell a story and build to a coherent conclusion.
I especially enjoyed chapters 7 and 8, where Cole gleans lessons from theater, cinema and fiction and then shows how they can be applied to crafting a story with data, including determining flow and storyboarding. Chapter 8 concludes with a great example that I have attempted to recreate here using the Tableau Story Points feature. Note that this interactive graphic is entirely adapted from Cole’s work, not mine, and if you want to understand the principles behind this specific example, I encourage you to read the book:
Excerpted with permission of the publisher, Wiley, from Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic. Copyright © 2015 by Cole Nussbaumer Knaflic. All rights reserved. This book is available at all booksellers.
The book concludes with case studies that deal with special topics like survey data, animating visualizations, slopegraphs, alternatives to pies, and strategies to “avoid the spaghetti graph.” In each of these cases, what I really appreciated about the book is that Cole doesn’t merely provide platitudes; she shows with clear visuals what works well and what doesn’t work as well. The book reminded me of Naomi Robbins’s Creating More Effective Graphs in that regard.
If you’ve also read the book, leave a comment below and tell us what you think. I’ve added this book to my Recommended Books page. Note that Cole also has a blog that you can follow if you find her instruction helpful.
I’m a huge fan of the Pew Research Center. They consistently publish interesting statistical insights about society in well-designed visual form.
Recently I saw this chart tweeted by Conrad Hackett that Pew published about a year ago showing which news outlets Americans of different political ideologies prefer as their main source of news about government and politics:
Pluses and Deltas
Pluses: It’s a clean design and easy to see the relative proportions for each particular ideology. There is no clutter in this chart. Obviously it’s an interesting data story that is very relevant to the news in the United States today.
Deltas: I believe there’s an opportunity to improve the ease with which a reader can track the popularity of the various news outlets across the different ideology group columns. Right now it’s tough to see, for example, how the popularity of CNN or Fox News compares across the ideological groups. You have to scan the bars and read each label to find them. To make this comparison easier, we can use color to link the outlets across the columns, and we can arrange the rows so as to make the relative popularity of particular outlets immediately apparent. Here are three different chart redesigns:
First, we can simply add color to the bars, a hover action to highlight the bars, and a URL action to open the outlet source when a reader clicks on the bars:
Second, we can use the rows differently, so that instead of arranging them by rank, we give each outlet its own row. This has the added benefit of showing us how many outlets there are in total, and which outlets aren’t in the top 5 for each group:
Lastly, we can keep the table format and switch the primary encoding from bar length to cell saturation:
I’m curious to know what you think, so here is an informal, unscientific poll of readers of this site:
This is the 3rd in a blog post series called “Avoiding Data Pitfalls” (1st, 2nd) that I’ll conclude with this blog post, as I’m turning the rest of the content into a book that I hope to publish next year.
There’s a pitfall that’s quite easy to fall into when creating dashboards with multiple charts and graphs: using color in ways that confuse people. There are many ways to confuse with color, including the oft-maligned red-green encoding that color blind viewers can’t decipher. That’s just one of many, though, and I’d like to use this blog post to illustrate three additional versions of the “confusing color” pitfall that I see people like myself fall into quite often.
Then, I’ll wrap it up by talking about the design goal that I aspire to achieve any time I create a dashboard with multiple views.
Color Pitfall #1. Using the same color hue for two different variables
The example for the first type of this common pitfall comes from a New York City Marathon dashboard (click the “Participation Trends” option at the top) that was created using Qlik. Qlik is a competitor to the company I work for, Tableau, but let me be clear that the product Qlik makes isn’t to blame for the mistake here. I see this error made with any and every data dashboard product, including Tableau.
Without further ado, here is the portion of the dashboard that I propose could be improved upon by using a different color scheme:
Following my tenet to provide critique with humility, here are some pluses (things I like) and deltas (things I would change) about this dashboard:
Pluses: I like the use of color in the histogram that shows the clear cut-off points, where finishers cross the line in droves immediately before the turning of the hour, especially the 4th hour of the race. This shows how goal-setting can affect the performance of a population, and it’s fascinating.
Deltas: Notice that the same green hue, though, applies to the increase in number of finishers from Italy, Netherlands, people who finished between 4 and 5 hours, and Switzerland. Likewise, the same yellow is used to encode the increase in the number of finishers from Germany, Mexico, and people who finished between 5 and 6 hours after starting. Similarly, red has multiple meanings. Of course there is no actual relation between these particular groups, though it may seem like there is at first glance.
To avoid this confusion, I propose using entirely different color schemes for the histogram and the treemap (and not repeating any colors within the treemap itself), or, better yet, not putting these two charts next to each other at all, as they tell completely different stories.
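For readers who like to check this kind of thing programmatically, here is a minimal sketch in Python of the overlap described above. It assumes the color assignments can be written down as simple dictionaries; the hue names stand in for actual color codes, the function name shared_hues is my own, and red is left out because the groups it encodes aren't listed above. It only flags hues shared between the two charts, not colors repeated within the treemap itself.

```python
from collections import defaultdict

# Color assignments as described in the text above.
# Hue names are stand-ins for the real color codes (an assumption).
treemap_colors = {
    "Italy": "green",
    "Netherlands": "green",
    "Switzerland": "green",
    "Germany": "yellow",
    "Mexico": "yellow",
}
histogram_colors = {
    "4-5 hours": "green",
    "5-6 hours": "yellow",
}


def shared_hues(charts):
    """Return every hue used by more than one chart, with the items it encodes in each."""
    usage = defaultdict(lambda: defaultdict(list))
    for chart_name, assignments in charts.items():
        for item, hue in assignments.items():
            usage[hue][chart_name].append(item)
    # Keep only hues that appear in at least two different charts.
    return {hue: dict(by_chart) for hue, by_chart in usage.items() if len(by_chart) > 1}


conflicts = shared_hues({"treemap": treemap_colors, "histogram": histogram_colors})
for hue, by_chart in conflicts.items():
    print(hue, by_chart)
```

Both green and yellow are flagged, because each one encodes countries in the treemap and a finishing-time band in the histogram.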
Color Pitfall #2. Using the same color saturation for different magnitudes of the same variable
Similarly, I’ve made the mistake of using the same color saturation to effectively create two conflicting color legends for the exact same dashboard. Consider this trivial (population?) map that I created with data about mileage of California roads by county to illustrate the point:
Notice that there are two different sequential color legends on the dashboard that use the exact same turquoise color (R:0,G:102,B:99). In the choropleth, the fully saturated turquoise color maps to a specific county (Los Angeles County) with 21,747 total miles of roads. In the bar chart, the full turquoise color saturation maps to a specific road type (Local roads) with a total of 108,283 miles for the entire state. Just eyeing the dashboard in passing, the viewer may link Los Angeles County with Local roads, and assume the two are related. Or, the reader may look at the wrong color legend (if both are in fact included) and be misled about how many miles of road the county or type actually includes.
From a software UI perspective, this error was easy to make because all I had to do was drag the “Miles” data field to the Tableau Color shelf in the map Sheet, and also drag it to the Color shelf in the bar chart Sheet. These two Sheets have totally different aggregation types, but I can easily force the same color encoding if I’m not paying attention.
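To make the conflict concrete, here is a rough sketch in Python of what happens when each sheet scales the same color over its own range. It assumes a simple linear ramp from white to the turquoise quoted above, starting at zero; that ramp and the function name sequential_color are my own illustration, not a description of how Tableau computes its palettes.

```python
FULL_TURQUOISE = (0, 102, 99)  # the RGB value quoted above
WHITE = (255, 255, 255)        # assumed low end of the ramp


def sequential_color(value, domain_max, full=FULL_TURQUOISE, empty=WHITE):
    """Linearly interpolate from `empty` (at 0) to `full` (at domain_max)."""
    t = max(0.0, min(1.0, value / domain_max))
    return tuple(round(e + t * (f - e)) for e, f in zip(empty, full))


la_county_miles = 21_747     # largest county in the choropleth
local_road_miles = 108_283   # largest road type in the bar chart

# Each sheet scales the color over its own maximum...
map_color = sequential_color(la_county_miles, domain_max=la_county_miles)
bar_color = sequential_color(local_road_miles, domain_max=local_road_miles)

# ...so two very different mileages end up with the identical color.
print(map_color, bar_color, map_color == bar_color)

# For comparison only: on one shared scale, the county would be far lighter.
shared_scale_color = sequential_color(la_county_miles, domain_max=local_road_miles)
print(shared_scale_color)
```

The point of the sketch is only that identical colors here do not mean identical values: 21,747 miles and 108,283 miles both land on full saturation because each legend has its own maximum.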
How to Avoid this 2nd Type of Color Pitfall
Notice that the color encoding on the bar chart is actually redundant. We already know the relative proportions of the miles of different road types by the lengths of their corresponding bars, which is quite effective all by itself. Why also include Miles on the color shelf, especially considering the fact that the color would conflict with the choropleth map, where color is totally necessary?
My colleague Dash Davidson came up with a good solution: remove color from the bars altogether and just leave an outline around them:
Color Pitfall #3. Using too many color encodings on one dashboard
It’s very common to use too many color schemes on a dashboard, especially with big corporate dashboards where the various stakeholders call for everything but the kitchen sink to be added to the view.
Here’s a dashboard I created to illustrate the point – my first dashboard that uses the Sales SuperStore sample dashboard that comes with Tableau Desktop:
In this dashboard we see not just one red-green color encoding but two, and they have different extremes for the exact same measure (Profit). We also see red and green used in the scatterplot encoding, but now they refer to different regions instead of different profit levels. Finally, we have another bar chart that uses no color scheme, but each bar is blue – the same color as the Central region in the scatterplot.
You get the point. This isn’t what we want to create. I think I broke all of the rules in creating this one.
My Design Aspiration: One (and only one) color encoding per dashboard
This goal isn’t always possible, but as much as possible, I try to include one and only one color scheme on every dashboard I create. The reason is that I find it takes me a lot longer to figure out what’s going on in someone else’s dashboard when they’ve used more than one. It’s that simple.
This means I often have to make a tough choice – which is the variable (quantitative or categorical) that will be blessed with the one and only one color encoding on the dashboard? It’ll become the variable that receives the most attention, so it should be the one that is most related to the primary task the user will perform when using the dashboard.
For example, if the dashboard was created for a sales meeting in which the directors of each US sales region talk about what’s working well and what’s not working well in their respective regions, then the “Region” attribute could very well take the honored place of prominence:
I hope this was helpful! Do you have any pet peeves or recommendations when it comes to using colors on data dashboards?
Thanks for reading,
I have a few thoughts I’d like to put out there, in the spirit of contributing to the ongoing dialogue within the field of data visualization. I’m still relatively new to this field, having participated as a practitioner, consultant and teacher for about ten years. That decade has left me with more questions than answers, and I find more opportunities than ever to expand my knowledge and skill set, and more people than ever with unique perspectives to learn from.
There are a few things, though, that I feel particularly passionate about. The three concepts I describe below amount to “tenets” that I’d like to humbly propose others in the field consider adopting. I definitely didn’t receive them in the form of a carved stone tablet from on high. It’s just stuff I think is true and important.
1. There are no black and white rules
I don’t believe that we can ever declare that a particular visualization type or design decision either “works” or “doesn’t work“. This binary approach is very tempting, I’ll admit. We get to feel confident that we’re avoiding some huge mistake, and we get to feel better about ourselves when we see someone else breaking that particular rule. I started off in this field with that mindset.
The more I’ve seen and experienced, though, the more I prefer a sliding gray scale of effectiveness over the black-and-white “works” / “doesn’t work” paradigm. It’s true that some choices work better than others, but it’s highly dependent on the objective, audience and context. This paradigm makes it harder to decide what to do and what not to do, but I believe this approach embraces the complexity inherent in the task of communicating with other hearts and minds.
Sometimes the most effective choice in a particular situation might surprise us. Consider two analogous examples: chess and writing. Data visualization is like chess in that both involve a huge number of alternative “moves”. Garry Kasparov decided to sacrifice his queen early in a game against Vladimir Kramnik in 1994. He went on to win that game in decisive fashion. Data visualization is like writing in that both involve communicating complex thoughts and emotions to an audience. Cormac McCarthy decided to eschew virtually all punctuation in his 2006 novel The Road. He won the Pulitzer Prize for that novel. Would I recommend either of those decisions to a novice? No, but I wouldn’t eliminate them from the set of all possible solutions, either. This diagram in Tamara Munzner’s Visualization Analysis & Design illustrates why:
If we start with a larger consideration space, it’s more likely to contain a good solution. On the other hand, labeling certain visualization types as “bad” and eliminating them from the set of possible solutions paints us into a corner. Why do that?
For example, consider the word cloud. Most would argue that it’s not terribly useful. Some have even argued that it’s downright harmful. There’s a good reason for that, and in certain situations it is harmful. It’s difficult to make precise comparisons using this chart type, without a doubt. And using a word cloud to analyze or describe blocks of text, such as a political debate, is often misleading as the words are considered entirely out of context. Fair enough. Let’s banish all word clouds, then, right? Let’s malign any software product that makes it possible to create one, right?
I wouldn’t go so far. Word clouds have a valid use, as well, even if it is rare. What if we had a few brief moments during a presentation to impress upon a large room of people, including some sitting way in the back, that there are only a handful of most commonly used passwords, and that they are pretty ridiculous? Wouldn’t a word cloud suffice? Would you choose a bar chart, a treemap or a packed bubble over a word cloud, in this scenario? You decide:
I admit it – I’d likely choose the word cloud in the scenario I described. The passwords jump right off the screen at my audience, even for the folks in the back. It doesn’t matter to me whether they can tell that ‘password’ is used 1.23 times more frequently than ‘123456’. That level of precision isn’t required for the task I need them to carry out. The other chart types all suffer from the fact that only a fraction of the words fit in the view. The audience can’t scan the full list at a glance to get a general sense of what’s contained in it – names, numbers, sports, batman.
If what you take from this example is that Ben thinks word clouds are awesome, you’ve completely missed my meaning. In most instances, word clouds aren’t very good at all, just like sacrificing one’s queen or omitting quotation marks entirely from a novel. But every now and then, they fit the need pretty well. We could probably come up with other scenarios where we would choose one of the other three chart types instead. Choosing a particular chart type depends on many factors. That’s a good thing, and frankly, I love that about data visualization.
2. Critique needs to be given with humility
Since there are so many variables in play, and since we hardly ever know the objective, the audience, or the full context of a particular project, we need to be humble when providing a critique of someone else’s data visualization. All we see is a single snapshot of the visual. Was this created as part of a larger presentation or write-up? Did it also include a verbal component when delivered? What knowledge, skills and attitudes did the intended audience members possess? What tasks did the audience need to carry out with the visualization? What level of precision was necessary to carry out those tasks?
These questions, and many more, really matter. If you’re the kind of person who scoffs at the very mention of word clouds, your critique of my example above would be swift and harsh. And it would largely be misguided.
Open dialogue is necessary in data visualization, and healthy debate should be encouraged. When engaging in dialogue and debate, though, I try to remind myself that I don’t know all the details. Seeking to understand some of these details is an important first step. Then providing a few “pluses” and a few “deltas” almost always works. What works well (pluses), and what ideas do I have to make the visualization work better (deltas)? It’s really not that hard.
3. Freedom to innovate is necessary for growth
Lastly, I enjoy that there are so many creative and talented people in this space who are trying new things. I believe that freedom to innovate is necessary for any field to thrive, and I try to do what I can to make sure such a freedom persists in data visualization.
Making blanket statements about certain visualization types, design choices, tools, or even individuals or groups in this space isn’t helpful, and tends to reduce the overall spirit of freedom to innovate. I don’t have data to back that up, by the way; it’s just the way I feel. You may agree or disagree.
And “innovation” doesn’t just involve creating new chart types. It can also include using existing chart types in new and creative ways. Or applying current techniques to new and interesting data sets. Or combining data visualization with other forms of expression, visual or otherwise. As long as we can have a respectful and considerate dialogue about what works well and what could be done to improve on the innovation, I say bring it on.
Adding the winning ideas to the known solution space is good for us all.
I hope these tenets make sense to you. Let me know if you agree, if you’d change anything about my list, or if you’d add any tenets of your own.