It seems there’s a day for everything, right. National Cashew Day (you guessed it, that’s today), World Philosophy Day (nope, sorry – last Thursday). Heck, you can even register your own National Day of [fill in the blank].
So, to earn my Math Nerd Card for 2015, I created the following dashboard that visualizes the first 100 numbers in the Fibonacci Sequence, starting with 1 instead of 0, as I’m led to understand is the more modern convention:
Math + Data Nerds Unite: know of any other good math vizzes out there? Leave a comment!
Thanks for humoring me,
Have you noticed it as well? The tide is turning against dogmatism in data visualization, as witnessed by the increasing number of voices speaking out against a rigid approach and closed-mindedness regarding practices that are often ridiculed in knee-jerk fashion. It’s about time.
To which voices am I referring, and which data visualization practices are they defending?
1. Charts with non-zero y-axes
Can you tell Vox’s Johnny Harris and Matthew Yglesias have had it with readers taking pot shots about their choice of y-axis starting points? Their well-reasoned video entitled “Shut up about the y-axis. It shouldn’t always start at zero” says it all:
Harris and Yglesias show that the choice of axis starting point depends on the context, the unit of measure, and on the comparison being made.
2. Artistic approaches to data visualization
In Andy Cotgreave’s recent ComputerWorld article “Why Do We Visualise Data?“, Andy argues that not all data visualizations need to be burdened with the requirement of imparting ultimate precision of comparison. It depends on the purpose.
“The purpose of a visualization will also determine the extent to which you should inform effectively…Sometimes it’s more important to make someone engage with the overall message rather than the minutiae.”
Andy uses the example of Stephanie Prosavec’s “Air Transformed“: a wearable data visualization necklace showing air quality in Sheffield, UK:
Should this project really be ridiculed as “ineffective” and horribly contrary to “best practices”, with no place for it at all in the field of data visualization? Or could it be that this form of expression engages human beings in a way that a rigorous report complete with bar charts and compliant zero y-axis timelines of air quality in Sheffield could hardly do? Maybe both the report and the necklace are valid, each in their own way.
Which should you create? It depends on your purpose and your audience.
3. Pie charts
Oh, the poor, maligned pie chart. The chart type that gets pushed around and bullied on the data viz playground more than any other. Randal Olsen of /r/dataisbeautiful ran a twitter poll asking “Do you think pie charts should be banned from #dataviz?”. Scientific or not, nearly 2 in 5 responded affirmatively:
Another Twitter poll: Do you think pie charts should be banned from #dataviz?
— Randy Olson (@randal_olson) October 29, 2015
That’s amazing if you stop and think about it. Almost 40% of respondents, likely mostly data viz enthusiasts who follow Olsen, think that pie charts should never, ever, ever be used. Hilariously, Andy Kirk of Visualising Data asked whether we should also run a poll about whether those people should be banned, and Irvin Almonte’s response was sheer genius:
— Randy Olson (@randal_olson) November 5, 2015
Again: It depends. Say it with me: IT DEPENDS.
Pie charts, in certain instances, can actually be more effective than bar charts at showing specific part-to-whole comparisons. And if the part-to-whole relationship is far more important to your message than comparing uber-accurately between categories, and if there are a very small number of slices, go ahead, give thought to using a pie chart. Don’t be intimidated by pie chart haters. There, I said it.
4. Word Clouds
I entered the fray last week with my blog post “My 3 Basic Tenets of Data Visualization” in which I argued that rules of thumb, not black-and-white rules, should prevail, along with a spirit of humility and openness to exploration and innovation in data viz.
I also did the unthinkable: I defended the Word Cloud. The poor, lowly pal of the pie chart, united on the playground in mutual fear of the roving data-dogma bully. My point is that if you only had a very short amount of time to impress upon a large room of people the most commonly used passwords, which of these four visualization types would you choose?
The word cloud sacrifices precision for completeness (all of the passwords actually appear on the screen in only the word cloud) and readability (the most commonly used passwords almost shout out at the reader). Is that a reasonable trade-off to make? Maybe. It depends.
Since that blog post, “What are your most used words on Facebook” has gone viral, and we’ve been inundated with over 16 million word clouds as of the writing of this blog post. Of course one can only hope this app is not a gigantic phishing scam, but do you think a bar chart version of your most commonly used words, or a concise and thorough text analytics report would have also gone viral? Maybe, but probably not. And I’m not saying “going viral” is a end that justifies all means, but in this case, ultimate precision of comparison is probably not needed anyway. It actually worked a little too well.
Let’s not let the pendulum swing too far, though.
This casting-off of suffocating restraint and a fearful spirit of ridicule is a REALLY GOOD THING in data visualization, but let’s not let the pendulum swing too far the other way. It’s true that pie charts are very often the wrong choice, and the majority of the time a y-axis that starts at zero is a really good choice.
I’m not a big fan of the word “best” in “best practices”, which seems to promise some optimal solution, but I do like the sentiment in this response by Vance Fitzgerald to my question on twitter about rules of thumb in data visualization:
— Vance Fitzgerald (@vancefitzgerald) November 11, 2015
I’m hopeful that the next phase of data visualization is one that embraces the gray of “it depends” and encourages open dialogue and constructive criticism. In order to get there, we’ll definitely have to shed dogma. Let’s absolutely do so, but let’s also carry forward the principles and rules of thumb that just make good sense, while being open to the possibility that breaking those rules might be a great idea in specific situations.
Wouldn’t this be a more mature approach? Wouldn’t it also be more welcoming, and more enjoyable?
Thanks for reading,
I believe in the creative power of play in any discipline. Data visualization is no different. I’ve acquired new skills and grown in ways I would have never predicted, all because I spent a little bit of time playing with a data set I found intriguing. Here are a three recent side projects of mine, and a useful tip that comes from each.
1. Play specific YouTube video segments when users click on corresponding marks
I’m pretty sure most of you were aware that it was (the real) “Back to the Future Day” this past October 21st. To celebrate the occasion, as any data-loving 80s kid would want to do, I created a dashboard using what I call dimension line charts to show all 13 trips the DeLorean time machine took over the course of the trilogy.
The dashboard gives the reader the ablity to watch the movie clip associated with each DeLorean trip by clicking on the corresponding arrows. To add this feature, I first had to find a video on YouTube that includes each time travel clip back-to-back. Then, I had to grab the embed URL for the video:
Finally, I had to add the embed URL with the following parameters to each row in the spreadsheet I created:
Where XXX and YYY are the start and stop times in seconds, respectively. The “&autoplay=1″ parameter means the user doesn’t have to click the arrow and then click play. Clicking the arrow automatically starts the video clip.
Great Scott, that’s a cool tip!
2. Use the TODAY() function to have a countdown clock in your dashboard update every day
The genesis for this next project came from eyeo. I had the chance to present at the eyeo conference last year, and the day of my presentation happened to be both a Friday the 13th and a full moon. Spooky, right? I wondered how often these two events coincided, so I found a calendar listing of each one, and figured out all the instances in which they occurred or will occur on the exact same day:
The helpful tip that comes from this side project is the use of the function TODAY(). To create the “Days Till Next” table on the far right hand side of the dashboard, I first created the following three calculated fields, the first to pull in the date value for “today”, the second to compute the number of days between today and each given event, and the third to null out events that have already happened in the past:
What’s great about this technique is that every day you load the dashboard, Tableau Public server will update all of these calculated fields, including the day that TODAY() maps to, and give you a brand new countdown. Who says Tableau Public dashboards don’t update automatically?
3. Invert the y-axis to stack marks downward — if you dare!
The third side-project was undertaken on the 166th anniversary of Edgar Allen Poe’s mysterious death. It was born out of a simple question: How many works of literature did the revered writer and poet produce over the course of his life? The following dashboard stacks 150 red boxes, one beneath the other, over the course of two and a half decades to visualize his prolific career:
WARNING: This is a dangerous technique! Just ask Christine Chan, who’s “Gun Deaths in Florida” viz, which used a similar technique, drew the widespread and harsh ire of the internet. People called her graphic “misleading” and “deceptive” for making something that was getting worse look to many like it was getting better.
Fair enough. I feel that in this case, the stacked boxes in the Poe viz isn’t as likely to be misinterpreted as a line chart which slopes “downward” from one value to a greater value. We’re just stacking boxes here, and it’s pretty clear that 1845 has “more” boxes than 1844, not less. By why do it at all? Why risk misleading with the inverted y-axis? It’s for dramatic effect – it makes it appear like blood is falling down onto Poe’s head. Dramatic effect is a double-edged sword, though, so tread with caution.
And to open the Pandora’s Box on this technique, simply right click on the axis you want to reverse (it can be the x-axis or the y-axis), and check the “Reversed” box:
Thanks, I hope you enjoy these side projects as much as I did. I hope you find the tips helpful (and that you don’t burn yourself on them!). What side projects have you learned from lately?
I had the pleasure of reading Cole Nussbaumer Knaflic’s recently released book Storytelling with Data over the past week. I highly recommend it to anyone who uses charts and graphs to convey a data-driven message to an audience – that is to say, basically everyone.
In brief: Cole shows what clear and well-designed visualizations look like, and explains why they’re effective. She also gives sound advice on practices to avoid in most cases, such as pie charts, 3D views and dual axes. She stops a good exit short of Dogmaville, though, explaining that you should be able to give a good explanation why you’re using a challenging chart type if you decide to go that route.
Well beyond merely choosing chart types, the value of this book is that you will learn how to eliminate clutter, focus attention on what matters, and de-emphasize everything else. The pages are filled with high quality before and after images that bring the subject to full color and show you what “good” looks like.
She doesn’t deal with “How” from a tools perspective – her techniques and principles can be put into practice using pretty much any tool from Excel to Tableau to D3. And she doesn’t talk about “data dashboards” that are characterized by multiple charts and graphs placed side-by-side. The book deals entirely with individual visualizations and how they can be designed, annotated, and shown in sequence to tell a story and build to a coherent conclusion.
I especially enjoyed chapters 7 and 8, where Cole gleans lessons from theater, cinema and fiction and then shows how they can be applied to crafting a story with data, including determining flow and storyboarding. Chapter 8 concludes with a great example that I have attempted to recreate here using the Tableau Story Points feature. Note that this interactive graphic is entirely adapted from Cole’s work, not mine, and if you want to understand the principles behind this specific example, I encourage you to read the book:
Excerpted with permission of the publisher, Wiley, from Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic. Copyright © 2015 by Cole Nussbaumer Knaflic. All rights reserved. This book is available at all booksellers.
The book concludes with case studies that deal with special topics like survey data, animating visualizations, slopegraphs, alternatives to pies, and strategies to “avoid the spaghetti graph.” In each of these cases, what I really appreciated about the book is that Cole doesn’t merely provide platitudes, she shows with clear visuals what works well and what doesn’t work as well. The book reminded me of Naomi Robbin’s Creating More Effective Graphs in that regard.
If you’ve also read the book, leave a comment below and tell us what you think. I’ve added this book to my Recommended Books page. Note that Cole also has a blog that you can follow if you find her instruction helpful.
I’m a huge fan of the Pew Research Center. They consistently publish interesting statistical insights about society in well-designed visual form.
Recently I saw this chart tweeted by Conrad Hackett that Pew published about a year ago showing which news outlets Americans of different political ideologies prefer as their main source of news about government and politics:
Pluses and Deltas
Pluses: It’s a clean design and easy to see the relative proportions for each particular ideology. There is no clutter in this chart. Obviously it’s an interesting data story that is very relevant to the news in the United States today.
Deltas: I believe there’s an opportunity to improve the ease with which a reader can track the popularity of the various news outlets across the different ideology group columns. Right now it’s tough to see, for example, how the popularity of CNN, or Fox News compares across the ideological groups. You have to scan the bars and read each label to find them. To make this comparison easier, we can use color to link the outlets across the columns, and we can arrange the rows so as to make the relative popularity of particular outlets immediately apparent. Here are three different chart redesigns:
First, we can simply add color to the bars, a hover action to highlight the bars, and a URL action to open the outlet source when a reader clicks on the bars:
Second, we can use the rows differently, so that instead of arranging them by rank, we give each outlet it’s own row. This has the added benefit of showing us how many outlets there are in total, and which outlets aren’t in the top 5 for each group:
Lastly, we can keep the table format and switch the primary encoding from bar length to cell saturation:
I’m curious to know what you think, so here is an informal, unscientific poll of readers of this site:
This is the 3rd in a blog post series called “Avoiding Data Pitfalls” (1st, 2nd) that I’ll conclude with this blog post, as I’m turning the rest of the content into a book that I hope to publish next year.
There’s a pitfall that’s quite easy to fall into when creating dashboards with multiple charts and graphs: using color in ways that confuse people. There are many ways to confuse with color, including the oft-maligned red-green encoding that color blind viewers can’t decipher. That’s just one of many, though, and I’d like to use this blog post to illustrate three additional versions of the “confusing color” pitfall that I see people like myself fall into quite often.
Then, I’ll wrap it up by talking about the design goal that I aspire to achieve any time I create a dashboard with multiple views.
Color Pitfall #1. Using the same color hue for two different variables
The example for the first type of this common pitfall comes from a New York City Marathon dashboard (click the “Participation Trends” option at the top) that was created using Qlik. Qlik is a competitor to the company I work for, Tableau, but let me be clear that the product Qlik makes isn’t to blame for the mistake, here. I see this error made with any and every data dashboard product, including Tableau.
Without further ado, here is the portion of the dashboard that I propose could be improved upon by using a different color scheme:
Following my tenet to provide critique with humility, here are some pluses (things I like) and deltas (things I would change) about this dashboard:
Pluses: I like the use of color in the histogram that shows the clear cut-off points, where finishers cross the line in droves immediately before the turning of the hour, especially the 4th hour of the race. This shows how goal-setting can affect the performance of a population, and it’s fascinating.
Deltas: Notice that the same green hue, though, applies to the increase in number of finishers from Italy, Netherlands, people who finished between 4 and 5 hours, and Switzerland. Likewise, the same yellow is used to encode the increase in the number of finishers from Germany, Mexico, and people who finished between 5 and 6 hours after starting. Similarly, red has multiple meanings. Of course there is no actual relation between these particular groups, though it may seem like there is at first glance.
To avoid this confusion, I propose using entirely different color schemes for the histogram and the treemap (and not repeating any colors within the treemap itself), or, better yet, not putting these two charts next to each other at all, as they tell completely different stories.
Color Pitfall #2. Using the same color saturation for different magnitudes of the same variable
Similarly, I’ve made the mistake of using the same color saturation to effectively create two conflicting color legends for the exact same dashboard. Consider this trivial (population?) map that I created with data about mileage of California roads by county to illustrate the point:
Notice that there are two different sequential color legends on the dashboard that use the exact same turquoise color (R:0,G:102,B:99). In the choropleth, the fully saturated turquoise color maps to a specific county (Los Angeles County) with 21,747 total miles of roads. In the bar chart, the full turquoise color saturation maps to a specific road type (Local roads) with a total of 108,283 miles for the entire state. Just eyeing the dashboard in passing, the viewer may connect Los Angeles County with Local roads, and think they are connected. Or, the reader may look at the wrong color legend (if both are in fact included) and be misled about how many miles of road the county or type actually include.
From a software UI perspective, this error was easy to make because all I had to do was drag the “Miles” data field to the Tableau Color shelf in the map Sheet, and also drag it to the Color shelf in the bar chart Sheet. These two Sheets have totally different aggregation types, but I can easily force the same color encoding if I’m not paying attention.
How to Avoid this 2nd Type of Color Pitfall
Notice that the color encoding on the bar chart is actually redundant. We already know the relative proportions of the miles of different road types by the lengths of their corresponding bars, which is quite effective all by itself. Why also include Miles on the color shelf, especially considering the fact that the color would conflict with the choropleth map, where color is totally necessary?
My colleague Dash Davidson came up with a good solution: remove color from the bars altogether and just leave an outline around them:
Color Pitfall #3. Using too many color encodings on one dashboard
It’s very common to use too many color schemes on a dashboard, especially with big corporate dashboards where the various stakeholders call for everything but the kitchen sink to be added to the view.
Here’s a dashboard I create to illustrate the point – my first dashboard that uses the Sales SuperStore sample dashboard that comes with Tableau Desktop:
In this dashboard we see not just one red-green color encodings but two, and they have different extremes for the exact same measure (Profit). We also see red and green used in the scatterplot encoding, but now they refer to different regions instead of different profit levels. Finally, we have another bar chart that uses no color scheme, but each bar is blue – the same color as the Central region in the scatterplot.
You get the point. This isn’t what we want to create. I think I broke all of the rules in creating this one.
My Design Aspiration: One (and only one) color encoding per dashboard
This goal isn’t always possible, but as much as possible, I try to include one and only one color scheme on every dashboard I create. The reason is that I find it takes me a lot longer to figure out what’s going on in someone else’s dashboard when they’ve used more than one. It’s that simple.
This means I often have to make a tough choice – which is the variable (quantitative or categorical) that will be blessed with the one and only one color encoding on the dashboard? It’ll become the variable that receives the most attention, so it should be the one that is most related to the primary task the user will perform when using the dashboard.
For example, if the dashboard was created for a sales meeting in which the directors of each US sales region talk about what’s working well and what’s not working well in their respective regions, then the “Region” attribute could very well take the honored place of prominence:
I hope this was helpful! Do you have any pet peeves or recommendations when it comes to using colors on data dashboards?
Thanks for reading,
I have a few thoughts I’d like to put out there, in the spirit of contributing to the ongoing dialogue within the field of data visualization. I’m still relatively new to this field, having participated as a practitioner, consultant and teacher for about ten years. That decade has left me with more questions than answers, and I find more opportunities than ever to expand my knowledge and skill set, and more people than ever with unique perspectives to learn from.
There are a few things, though, that I feel particularly passionate about. The three concepts I describe below amount to “tenets” that I’d like to humbly propose others in the field consider adopting. I definitely didn’t receive them in the form of a carved stone tablet from on high. It’s just stuff I think is true and important.
1. There are no black and white rules
I don’t believe that we can ever declare that a particular visualization type or design decision either “works” or “doesn’t work“. This binary approach is very tempting, I’ll admit. We get to feel confident that we’re avoiding some huge mistake, and we get to feel better about ourselves when we see someone else breaking that particular rule. I started off in this field with that mindset.
The more I’ve seen and experienced, though, the more I prefer a sliding gray scale of effectiveness over the black-and-white “works” / “doesn’t work” paradigm. It’s true that some choices work better than others, but it’s highly dependent on the objective, audience and context. This paradigm makes it harder to decide what to do and what not to do, but I believe this approach embraces the complexity inherent in the task of communicating with other hearts and minds.
Sometimes the most effective choice in a particular situation might surprise us. Consider two analogous examples: chess and writing. Data visualization is like chess in that both involve a huge number of alternative “moves”. Garry Kasparov decided to sacrifice his queen early in a game against Vladimir Kramnik in 1994. He went on to win that game in decisive fashion. Data visualization is like writing in that both involve communicating complex thoughts and emotions to an audience. Cormac McCarthy decided to eschew virtually all punctuation in his 2009 novel The Road. He won the Pulitzer Prize for that novel. Would I recommend either of those decisions to a novice? No, but I wouldn’t eliminate them from the set of all possible solutions, either. This diagram in Tamara Munzner’s Visualization Analysis & Design illustrates why:
If we start with a larger consideration space, it’s more likely to contain a good solution. On the other hand, labeling certain visualization types as “bad” and eliminating them from the set of possible solutions paints us into a corner. Why do that?
For example, consider the word cloud. Most would argue that it’s not terribly useful. Some have even argued that it’s downright harmful. There’s a good reason for that, and in certain situations it is harmful. It’s difficult to make precise comparisons using this chart type, without a doubt. And using a word cloud to analyze or describe blocks of text, such as a political debate, is often misleading as the words are considered entirely out of context. Fair enough. Let’s banish all word clouds, then, right? Let’s malign any software product that makes it possible to create one, right?
I wouldn’t go so far. Word clouds have a valid use, as well, even if it is rare. What if we had a few brief moments during a presentation to impress upon a large room of people, including some sitting way in the back, that there are only a handful of most commonly used passwords, and they are pretty ridiculous. Wouldn’t a word cloud suffice? Would you choose a bar chart, a treemap or a packed bubble over a word cloud, in this scenario? You decide:
I admit it – I’d likely choose the word cloud in the scenario I described. The passwords jump right off the screen at my audience, even for the folks in the back. It doesn’t matter to me whether they can tell that ‘password’ is used 1.23 times more frequently than ‘123456’. That level of precision isn’t required for the task I need them to carry out. The other chart types all suffer from the fact that only a fraction of the words fit in the view. The audience can’t scan the full list at a glance to get a general sense of what’s contained in it – names, numbers, sports, batman.
If what you take from this example is that Ben thinks word clouds are awesome, you’ve completely missed my meaning. In most instances, word clouds aren’t very good at all, just like sacrificing one’s queen or omitting quotation marks entirely from a novel. But every now and then, they fit the need pretty well. We could probably come up with other scenarios where we would choose one of the other three chart types instead. Choosing a particular chart type depends on many factors. That’s a good thing, and frankly, I love that about data visualization.
2. Critique needs to be given with humility
Since there are so many variables in play, and since we hardly ever know the objective, the audience, or the full context of a particular project, we need to be humble when providing a critique of someone else’s data visualization. All we see is a single snapshot of the visual. Was this created as part of a larger presentation or write-up? Did it also include a verbal component when delivered? What knowledge, skills and attitudes did the intended audience members possess? What are the tasks that needed to be carried out associated with the visualization? What level of precision was necessary to carry out those tasks?
These questions, and many more, really matter. If you’re the kind of person who scoffs at the very mention of word clouds, your critique of my example above would be swift and harsh. And it would largely be misguided.
Open dialogue is necessary in data visualization, and healthy debate should be encouraged. When engaging in dialogue and debate, though, I try to remind myself that I don’t know all the details. Seeking to understand some of these details is an important first step. Then providing a few “pluses” and a few “deltas” almost always works. What works well (pluses), and what ideas do I have to make the visualization work better (deltas)? It’s really not that hard.
3. Freedom to innovate is necessary for growth
Lastly, I enjoy that there are so many creative and talented people in this space who are trying new things. I believe that freedom to innovate is necessary for any field to thrive, and I try to do what I can to make sure such a freedom persists in data visualization.
Making blanket statements about certain visualization types, design choices, tools, or even individuals or groups in this space isn’t helpful, and tends to reduce the overall spirit of freedom to innovate. I don’t have data to back that up, by the way, it’s just the way I feel. You may agree or disagree.
And “innovation” doesn’t just involve creating new chart types. It can also include using existing chart types in new and creative ways. Or applying current techniques to new and interesting data sets. Or combining data visualization with other forms of expression, visual or otherwise. As long as we can have a respectful and considerate dialogue about what works well and what could be done to improve on the innovation, I say bring it on.
Adding the winning ideas to the known solution space is good for us all.
I hope these tenets make sense to you. Let me know if you agree, if you’d change anything about my list, or if you’d add any tenets of your own.
First, apologies for the blog post drought! It was that kind of summer. It’s good to be back, though, and I hope you’ve been well.
Scatterplots are my favorite visualization type, hands down. From my very first interactive data graphic about The Great One to the most recent visualization below on major league pitchers, I’ve learned a great deal from these Cartesian classics over the years. In this post I’ll show you how to make them even better than the standard ones in Tableau.
Recently, Shine Pulikathara published a scatterplot of NFL player heights and weights that included two marginal histograms – one for each axis. I tweeted that I liked it, and Lynn Cherny replied that it’s pretty common to see this kind of thing in R:
@DataRemixed those are pretty common in R plots
— Lynn Cherny (@arnicas) September 14, 2015
She’s right, and it turns out that it’s also a common convention with other statistical graphing platforms, like Matlab and Plotly. It’s called a Scatterplot with Marginal Histograms. While Tableau has scatterplots and histograms as standard chart types, it doesn’t automatically combine them for you into a single view. The goods news, though, is that it’s fairly easy to combine them using a dashboard with three sheets. There’s only one small trick to make the charts interact the way you want, which I’ll cover below. If you want to follow along, download 2015pitchingstats.xlsx.
First, here is the finished version, showing pitchers “skill” (Earned Run Average, or ERA) and “luck” (Runs Scored by their team, or RS) so far in the 2015 season:
Now, let’s consider the four easy steps to create a scatterplot with marginal histograms:
Step 1: Create the Three Sheets
This part is fairly straightforward – create a scatterplot and two histograms as three separate sheets in the same workbook. To create the scatterplot, drag ERA to Columns, RS to Rows, W% to Color, Player to Label, and then add two Average reference lines, like this:
Next, to create the first histogram, create a new sheet, click on the Measure (say, ERA), click Show Me in the top right, and then choose Histogram. Do the same in another new sheet with RS, but click the Rotate icon in the top icon bar to flip the RS histogram 90°. Notice that two new data fields appear in the Measures area: “ERA (bin)” and “RS (bin)”. Right click to edit these fields and change the “Size of bins” to be 0.25 and hide the axes.
Step 2: Add the Histogram Bin Dimensions to the Scatterplot Chart Detail
Without this step, you won’t be able to get the sheets to interact together in the dashboard. Go back to the scatterplot sheet you created in Step 1 and drag both “ERA (bin)” and “RS (bin)” to Detail. You should now see these two fields listed in the Marks card area:
Step 3: Add the Three Sheets to a Dashboard
Next, create a new dashboard and add the three sheets you created in Step 1. Aligning the histograms with the scatterplot is the one messy part of this method. Add blanks to the left and right of the ERA histogram, and above and below the RS histogram. Drag the blanks until the extreme bars of the histogram align with the extreme points of the scatterplot:
Step 4: Create Two Highlight Actions:
The last step is to get the sheets to interact with each other. There are lots of ways they could potentially interact, but here’s what I’d like to see happen:
- When I hover my mouse cursor over any of the histogram bars, the corresponding circles on the scatterplot highlight
- When I hover my mouse cursor over any of the scatterplot circles, the corresponding histogram bars highlight
To do this, create two new dashboard actions by clicking Dashboard > Actions > Add Action > Highlight, and fill out the dialog boxes as follows:
That’s it! For finishing touches, I added a title, lead-in paragraph, data source and last accessed note, four area annotations to define the four quadrants, and two mark annotations to call out points of interest. I also edited the two Average reference lines to uncheck “Show recalculated line for highlighted or selected data points”. This was strictly a matter of preference, and you may not decide to modify the reference lines in that way.
Here are a couple other variations that don’t involve the binning concept inherent in histograms, and therefore don’t required Step 2 above:
Scatterplot with Marginal Box-and-Whisker-Plots
Scatterplot with Marginal Hash Lines
Thanks for reading! I hope you found this helpful. Let me know if you have any further tips by leaving a comment. Also, I’m curious, which of the three variations – marginal historgrams, box plots, or hash lines – do you prefer?
I spend a lot of my free time with my family in nature, and it occurred to me recently that there is something particularly captivating about data visualizations that resemble nature. It makes sense if you think about it. As a species, we’re awed and inspired by majestic landscapes, and we’re drawn in by the intricate patterns we see in the world around us.
It’s the same with data visualizations. We use words like “enlightening,” “stunning,” and “beautiful” to describe the really good ones. Some people tire of the use of these adjectives, but I think there’s a good reason they have found their application here. They describe our inner experience.
To illustrate the point, I chose five visualizations published to the new Tableau Public “Greatest Hits” gallery and found five corresponding images from nature.
Here is the same content posted to Slideshare:
I’m not suggesting we all go out and start making visualizations that look like sand dunes or tadpoles, as appealing they might look. And relatively few visualizations will work in “petri dish” form, though one about rapidly growing companies just might be in that small set. I’m just sharing my observation that I tend to stop and look closely at patterns that are familiar.
And we can take an important cue from nature as well. Many of nature’s patterns are both mesmerizing and incredibly informative, like the tree rings that encode important data about the life of a tree. If you can inspire as well as inform, then why wouldn’t you do both? Nature does.
Thanks for stopping by,
William Zinsser, 92, died last week. Zinsser, author of On Writing Well, the classic guide to writing nonfiction, has been an inspiration to writers and aspiring writers since he first published his manual in 1976. Douglas Martin of the New York Times has written an excellent obituary on Zinsser.
1.5 million copies of On Writing Well have been sold to people all around the world who care about “getting the words right”, as Ernest Hemingway put it. I first read On Writing Well before launching this website four years ago, and I referred to it countless times while writing my first book, Communicating Data With Tableau (O’Reilly, 2014). Zinsser’s book taught me to respect sentences and words more than I had before. Or, to put it another way, it made me realize that my writing sucked. It was a harsh realization at the time, but I needed to know that upfront. I’d like to think that my writing sucks a little less thanks to Zinsser.
As I read On Writing Well it struck me that his advice for communicating well with words applies directly to the craft of communicating visually with data. His seven principles in Part I – The Transaction, Simplicity, Clutter, Style, The Audience, Words, and Usage – could be written about visualizing data as well.
Let’s call it On Visualizing Data Well:
1. The Transaction
Zinsser opened his classic book by teaching that “the product that any writer has to sell is not the subject being written about, but who he or she is.” (emphasis mine). The transaction, then, is a personal one – the reader is drawn in by the “enthusiasm of the writer for his field,” and the two most important qualities that result are “humanity and warmth.”
The same is true when someone examines the product created by a data visualizer. The humanity of the visualizer should shine though:
- What does the person creating the visualization think about the topic?
- Why does he or she care about this topic?
- How does he or she feel about it?
- Do you know something more about them, not just their topic?
Take another look at one of the most talked about visualizations of 2014, Periscopic’s animated U.S. Gun Deaths. Regardless of what you think about the design, form or aesthetic of the final product, you can’t help but feel the emotions of those who created it. Is there any doubt about what they’re trying to say, and more importantly, why they’re saying it? Kim Rees and her team brought their own humanity to the transaction:
Zinsser taught that “the secret of good writing is to strip every sentence to its cleanest components,” and that there are a “thousand and one adulterants that weaken the strength of a sentence.” Here’s an example he gives of “the clotted language of everyday American commerce:”
“The airline pilot who announces that he is presently anticipating experiencing considerable precipitation wouldn’t think of saying it may rain.”
It’s easy to laugh at this bloated phrase because we see it all the time, and we even fall prey to it ourselves. The same is true when we communicate with data. The lesson is that we shouldn’t overcomplicate the message.
This lesson is often misunderstood to mean that we should dumb down the message, or only choose simplistic messages in the first place. This interpretation is wrong. Just as a writer sometimes seeks to articulate a profound thought, we sometimes seek to show relationships that are complex. That’s okay, and we shouldn’t shrink from that challenge in the name of simplicity. But if there’s a clear way to show it, then we should show it clearly. In the words of Albert Einstein, “everything should be made as simple as possible, but not simpler.”
Zinsser wrote “writing improves in direct ratio to the number of things we keep out of it that shouldn’t be there.” He opens his third chapter with a funny example from the annals of U.S. history:
“Consider what President Nixon’s aide John Dean accomplished in just one day of testimony on television during the Watergate hearings. The next day everyone in America was saying ‘at this point in time’ instead of ‘now’.”
His admonition is to “examine every word you put on paper.” When working with his students at Yale, Zinsser would “put brackets around every component in a piece of writing that wasn’t doing useful work.” Sound familiar? Edward Tufte’s notion of chartjunk is the same notion. Designers and artists celebrate the white space in their creations. Obviously we shouldn’t remove every pixel, just the ones that aren’t doing any work. The trick is knowing which is which.
Next Zinsser addresses the objection that reducing a writing product to its simplest form leaves no room for style. He concedes that “simplicity carried to an extreme might seem to point to a style little more sophisticated than ‘Dick likes Jane’ and ‘See Spot run’.”
In data viz, the corollary to these preschool sentences is the bar chart. Simple, easy to understand, but no flair. We’re in familiar territory. It’s the never-ending “clarity vs. beauty” debate. But it’s a false dichotomy. Clarity and beauty are not mutually exclusive. Of course we can achieve both. Information visualization design firm Accurat does it all the time. Here’s an example of their work:
How can we achieve both simplicity and style? Clarity and beauty? Zinsser’s advice for writers applies to us, too. There’s a reason his chapter on style follows the previous two. A singer with loads of personality who sings out of tune won’t sell records. A carpenter who adds bevels and carvings galore to a chair that doesn’t hold your weight won’t stay in business for long. “This is the problem of writers who set out deliberately to garnish their prose.” Zinsser uses the wood-working analogy to show us the way:
“Extending the metaphor of carpentry, it’s first necessary to be able to saw wood neatly and to drive nails. Later you can bevel the edges or add elegant finials, if that’s your taste. But you can never forget that you are practicing a craft that’s based on certain principles.”
To create a data visualization that is both clear and beautiful, we first must get the raw materials and basic proportions right. Only then we can add what Willard Cope Brinton calls “judicious embellishment of charts”. What’s judicious? Fortunately, as Zinsser puts it, “there is no style store”, and you’ll have to answer that for yourself. Your audience will also have an opinion on the matter.
5. The Audience
Speaking of audience, Zinsser addresses this critical element in the fifth chapter of his masterpiece. We often talk about “knowing your audience” in data viz, and user-centered design in product development. It’s a very popular topic. I even give similar platitudes in the first chapter of my own book.
But Zinsser gives what at first seems like shocking advice on this subject. He says:
“You are writing for yourself. Don’t try to visualize the great mass audience. There is no such audience – every reader is a different person.”
Only write for yourself and don’t even consider who will see your work? Really? He clarifies by differentiating between a mechanical act (“work hard to master the tools”) and a creative act (“the expressing of who you are”). If you lose someone through sloppy workmanship, then it’s your fault. If you lose someone because they don’t like what you have to say, don’t worry. “You are who you are, he is who he is, and either you’ll get along or you won’t.”
In other words, care about your audience’s ability to decipher your message, and get that part right, but don’t care about whether they’ll agree with you or like you. Just say what you need to say based on what you find in the data.
Zinsser’s sixth chapter, entitled Words, deals with avoiding “cheap words, made-up words and cliché that have become so pervasive that a writer can hardly help using them.” His advice: “You must fight these phrases or you’ll sound like every hack.”
Do we have clichés in the world of data viz? Yes, we do. We all seek to imitate others in some way. The cliché in any field is just the tacky or ineffective element that people continue to use in spite of the fact that it’s bad, just because others use it. Think Periodic Tables and Subway Maps.
At the Tapestry Conference in 2014, Martin Wattenberg and Fernanda Viegas gave a presentation on genres in data visualization. They explained how we often use a shared language that gives our readers shortcuts to understanding. While this can be good, we are in danger of getting stuck in these genres, which can become formulaic.
Here’s their full presentation:
Wattenberg and Viegas say the key is awareness. Awareness of the elements of the genre or genres we’re in, and those elements to which we really shouldn’t adhere, because they don’t work or are tacky. We have to care enough to examine each element and root it out if it’s cliché, regardless of what our peers might be doing. According to Zinsser, “the only way to avoid it is to care deeply about words.” We also have to care about the data.
Zinsser’s last principle is about determining whether what’s new should be “ushered in” as accepted practice, or whether it should be “thrown out on [its] ear.” For any field to be vibrant and thriving, and for it to be at all fun, it must be fluid and not static. Is data viz fluid, or is it static?
Just as there is “no king to establish the King’s English,” there is no anointed panel to accept or reject new methods or tools in data viz, at least not that I know of. We all get to cast a vote by what we use. One of the chief values that innovators like Accurat and Periscopic bring to the field of data viz is a fresh take on this business of communicating with data. We all get to observe one another’s work, and if we keep the principles in mind, the history books will determine what gets kept and what gets left behind.
I’m confident that we’ll get the pixels right.
Thanks for reading,