Sports Viz: The Top International Goal Scorers

2016 June 27
by Ben Jones

Hi all,

It has been a while since I posted here. This one’s just for fun.

There are two exciting men’s football (aka ‘soccer’) tournaments going on right now – Euro 2016 and Copa America. I was wondering which players in the history of the sport have the highest number of goals scored over the course of their international careers. The data was fairly easy to find on Wikipedia, and since I’m somewhat fond of scatterplots, I created this simple viz to help me understand who has been the most prolific at putting the ball into the back of the net:

Some notes about the project:

  • The viz features three filters – Confederation and Career Status selectors, as well as a Goals per match slider.
  • I annotated the player with the highest goal scoring rate (Poul Nielsen of Denmark) since the relative sizes of the flag shapes are much harder to compare than the x- and y-axis positions of the shapes.
  • I used a dual axis on the scatterplot to create a yellow border around the players who are still active. I also added a note in the bottom right explaining the meaning of the yellow outline.
  • In order to figure out if the player is still active, I created a calculated field called “Active?” that looks at the value in the field “Career End” and assigns a value of “Active” if it’s null, and a value of “Retired” if it’s not null.
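
For anyone who finds code easier to read than a calculated-field dialog, the logic of “Active?” can be sketched in Python. This is only an illustration: the function name career_status and the sample rows are made up, and the workbook itself does this in a Tableau calculated field, not in Python.

```python
from typing import Optional


def career_status(career_end: Optional[int]) -> str:
    """Mirror the "Active?" field: a missing Career End means still playing."""
    # A null (None) Career End is treated as "Active"; anything else is "Retired".
    return "Active" if career_end is None else "Retired"


# Hypothetical sample rows: (player label, Career End year or None)
players = [("Player A", None), ("Player B", 2006)]
for name, career_end in players:
    print(name, career_status(career_end))
```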

That’s about it. I enjoyed the process, and learned something new about a sport I love.

Thanks, I’d appreciate any feedback you might have, as always.

A First Look at Simple Chart-Builder Atlas

2016 May 12
by Ben Jones

Hi all,

Just a quick note in case you didn’t see the announcement that digital news publication Quartz is making their internal chart-making tool Atlas available to the public. I registered and was given access to use this web-based simple chart maker, and here’s what I created:

It only took me about 5 minutes to get the hang of the web-based UI and create this bar chart. I tried a couple more complicated data sets first, but ran up against a 12 column limit. I didn’t see how to sort the bars, so I went back to the Excel document, sorted the data table in the spreadsheet, and then re-copied and pasted it into the window so the bars were sorted how I wanted them – by decreasing french fry score.

You can also export to SVG, the code is open source, and the chart looks great on a mobile device without any programming or configuration on my part. It just works. No filtering, interactivity or advanced analytics, very limited formatting and customization options, and no dashboards. Single charts are what this tool allows you to build, and what it does, it does well. I can see lots of people who want to make simple charts and graphs using this tool to crank out publication and mobile-ready views.

Anyone else play with it? What do you think?


How to Embed Google Trends in a Tableau Dashboard

2016 April 27
by Ben Jones

This blog post shows how to marry two free online data tools: Tableau Public and Google Trends. Why? Because you might want to quickly check how certain categories in your data have fared in search over time relative to one another. {Disclaimer: this tutorial will only be valid as long as Google keeps its Google Trends URL scheme the same. I learned the hard way with my Google Maps + Tableau tutorial that you just can’t bank on Google leaving things the same for very long.}

It’s in response to a question posted by Alexandra Samuel on Twitter:

I had taken a crack at using Google Trends data in Tableau a year and a half ago with an Ebola scare viz, but it involved a very manual process of downloading the CSV out of Google Trends. Hardly the web data connector experience Alexandra was looking for. Fellow Tableau geek Eric Peterson did some digging and found out that Google does not have an API for Google Trends:

So end of the road? Are we out of luck if we want to automatically pull search interest data into Tableau until Google makes an API for Google Trends? Not entirely. Here’s my solution (using fast food survey data in honor of Tableau Public’s Food Viz Month theme), and you’ll find a tutorial on how I built it below:

How to Add Google Trends Data to a Tableau Dashboard

The trick to do this involves two key elements:

Step 1: Build a new Sheet

This part was totally straightforward. I created a basic Excel table out of YouGov survey results where respondents said which restaurant had the best burgers and which had the best fries. Then I dragged “Best beef burger” to the Columns shelf and “Best fries” to the Rows shelf, adding the fast food logos by dragging “Fast Food Chain” to Shape and making sure I had a folder with the logo png files inside my Documents/My Tableau Repository/Shapes folder. Easy.

Step 1: Create a new sheet

Step 2: Create a new dashboard and add a blank Web Page object

Once I had built my scatterplot, I created a new Dashboard, dragged the scatterplot sheet onto it, and then dragged a new Web Page object from the section on the left middle panel, leaving the Edit URL field in the resulting dialog box blank and clicking OK:

Step 2: Add a blank Web Page object to the Dashboard

Step 3: Create a new URL Dashboard Action

Now that the Web Page object is out on the dashboard, I can control which website gets shown inside the box by creating a URL Dashboard Action. To do so, click Dashboard > Action > Add Action > URL.

Step 4: Build the Google Trends URL in the URL Action dialog:

The trick is to copy and paste this URL into the Edit URL field:

where instead of DIMENSION_FIELD_NAME you would put the data field that contains the values you want to compare in search interest. In our case, it’s the field Fast Food Chain surrounded by angle brackets. Notice that I also changed the URL to run on Select instead of Menu, and I made sure to allow for multiple select by checking the box at the bottom:

Step 4: Build the URL Action
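
To make the substitution concrete, here is a rough Python sketch of what happens to the URL when several marks are selected at once. It assumes the selected values are URL-encoded and joined with commas in the query string. The base URL is a placeholder, not the real Google Trends address, and both build_trends_url and the two chain names are hypothetical.

```python
from urllib.parse import quote

# Placeholder only: the real Google Trends URL is not reproduced here.
BASE_URL = "https://example.com/trends?q="


def build_trends_url(selected_values, delimiter=","):
    """Fill the DIMENSION_FIELD_NAME slot with the selected values."""
    # URL-encode each value so spaces and punctuation survive the trip.
    encoded = [quote(value, safe="") for value in selected_values]
    return BASE_URL + delimiter.join(encoded)


# Hypothetical selection of two marks in the scatterplot
print(build_trends_url(["Burger Place", "Fry Shack"]))
```

Selecting a single mark produces a one-term URL, and each additional selected mark simply appends another delimited term, which is why allowing multiple select matters for side-by-side comparisons.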

That’s it! A word of caution: Google Trends seems to have a search quota, so you might hit your limit like I did when creating this dashboard and trying different things, hitting the trends site from the same IP address over and over. The good news is if you wait a couple hours, you should be good to go again. At least that’s how it was for me.

Obviously this isn’t quite as awesome as accessing the actual raw search trends data itself directly within a Tableau workbook, but I think it’s a pretty good half-way solution. One limitation is that you can’t control the colors of the timelines in the Web Page object, which makes it tough to coordinate your dashboard colors. That’s one reason I used logo images instead of colors to differentiate between the restaurants in the scatterplot.

I think this tip could be taken to a whole different level by embedding Google Trends maps, or searching within countries, etc. All you’d need to do is figure out how Google Trends controls these parameters by looking at the URL that it creates when you do a search.

Okay, thanks for reading. I hope this was helpful to you. Let me know if you find a way to improve on it!


The Design of Everyday Visualizations

2016 April 11

I’ve been educated and inspired recently by the best-selling design classic “The Design of Everyday Things” by UX guru Don Norman. You really have to read the entire book, which applies to all types of objects that people design – from chairs to doors to software to organizational structures. It provides thoughtful and practical principles that guide designers to design all of those things well. By “well” he means “products that fit the needs and capabilities of people.” (p.218)

As I read it, it occurred to me that data visualizations are “everyday things” now, too. Even richly interactive ones viewed on tablets and phones. That has only become the case in the past half-decade or so. Yes, examples can be traced back to the early days of the internet, but the recent explosion of data, software tools and programming libraries has caused their proliferation.

And I found that point after point, principle after principle in Norman’s book applied directly to data visualization. I’d like to call out five points that struck me as particularly relevant to recent discussions in the field of data visualization.

1. Good visualizations are discoverable and understandable

Norman starts his book describing two important characteristics of all designed products:

  • Discoverability: Is it possible to even figure out what actions are possible and where and how to perform them?
  • Understanding: What does it all mean? How is the product supposed to be used? What do all the different controls and settings mean?

He talks about common things that are often anything but discoverable and understandable, such as faucets, doors and stovetops. One of my favorite quotes in the book is about faucets:

If you want the faucet to be pushed, make it look as if it should be pushed. p.150

Regarding doors, Vox published a great video on a particularly poorly designed door on the 10th floor of the Vox Media office. The video references and even includes interview footage with Don Norman himself. And it’s funny. You should watch it.

It occurred to me that the typical stovetop design snafu has a direct translation into the world of data visualization. To explain, let’s start with the problem with stovetops. Ever turn on the wrong burner? Why? Because you’re stupid? No. Because there are often poor mappings between the controls and the burners. The burners are often arranged in a two-by-two grid and the controls are often in a straight line, like this:

What does that have to do with data visualization? We often use similar controls – radio buttons, combo boxes, sliders, etc – to filter and highlight the marks in the view. When there are multiple views in a visualization (a dashboard), there is a similar opportunity to provide clear, or natural, mappings.

Norman gives the following advice for mappings:

  • Best mapping: Controls are mounted directly on the item to be controlled.
  • Second-best mapping: Controls are as close as possible to the object to be controlled.
  • Third-best mapping: Controls are arranged in the same spatial configuration as the objects to be controlled.

Often the software default places the controls on the right hand side. Here’s my attempt to show these options on a generic data dashboard, where the four different views are labeled A, B, C and D, and the controls that change them are labeled according to the views they modify:

This is a relatively straightforward example, and the job of the designer of a more complex visualization is to make it similarly clear what can be done and how to do it. Designers use things like affordances, signifiers, constraints and mappings to make it obvious. Note that it takes a lot of effort to make the complex obvious.

2. Don’t blame people for getting confused or making errors

A fundamental principle that Norman drives home a number of times in the book is that human error usually isn’t the fault of humans, but rather of poorly designed systems. Here are two great quotes on the topic:

It is not possible to eliminate human error if it is thought of as a personal failure rather than as a sign of poor design of procedures or equipment. p.167

And again on the same page:

If the system lets you make the error, it is badly designed. And if the system induces you to make the error, it is really badly designed. When I turn on the wrong stove burner, it is not due to my lack of knowledge: it is due to poor mapping between controls and burners. p.167

Norman differentiates between two types of errors: slips and mistakes.

  • Slips are when you mean to do one thing, but you do another.
  • Mistakes are when you come up with the wrong goal or plan and then carry it out.

Both types of errors happen when people interact with data visualizations. In the world of mobile, slips are so common – maybe I meant to tap that small icon at the edge of my phone screen, but the phone and app recognized a tap of an adjacent icon instead.

Mistakes are also common. Maybe it made sense to me to filter to a subset of the data to get my answer, but in reality I was misleading myself by introducing a selection bias that wasn’t appropriate at all. If someone makes the wrong decision based on misinformation they took from your visualization, that’s your problem at least as much as it is theirs, if not more so.

How to make sure your readers avoid slips and mistakes? Build and test. Iterate. Watch people interact with your visualization. When they screw up, don’t blame them or step in and explain what they did wrong and why they should’ve known better. Write it down and go back to the drawing board. If the person who agreed to test your visualization made that error, don’t you think many more likely will? And you won’t be there to tell them all what they did wrong. Your only chance to fix the error is to prevent it.

3. Designing for pleasure and emotion is important

I’m a big believer in this principle. Norman states that “great designers make pleasurable experiences”:

Experience is critical for it determines how fondly people remember their interactions. Was the overall experience positive, or was it frustrating and confusing? p.10

How can an experience with a data visualization be pleasurable? In lots of ways. It can make it easy to understand something interesting or important about our world, it can employ good design techniques and artistic elements, it can surprise us with a clever or funny metaphor, or some combination of these and more.

What about emotion? The “e word” to which the analytical folks in our midst are allergic. Cognition gets a lot of play in the world of data visualization, but emotion does not. But these two horses of the chariot that is the human spirit are actually inextricably yoked:

Cognition and emotion cannot be separated. Cognitive thoughts lead to emotions: emotions drive cognitive thoughts. p.47

I also love the following quote:

Cognition attempts to make sense of the world: emotion assigns value…Cognition provides understanding: emotion provides value judgements. p.47

So let’s embrace emotions. Some data visualizations piss us off. Some crack us up. Some are just delightful to interact with. These elements of the experience should be part of the discourse in our field, and not ignored just because they don’t match the left-brained predisposition of the bulk of the so-called experts. If we take them into consideration, we’ll probably design better stuff.

4. Complexity is good. Confusion is bad.

There’s a trend in our field to move away from the big, complex dashboards of 2010 and toward “light-weight” and uber-simple individual graphs, and even GIFs. Why? A big part of the reason is that they work better on mobile. It’s true, and what we’ve learned in the past few years is that the complexity of those big dashboards isn’t always necessary.

This is a great development, and I’m all for it, but let’s just remember that there was often a great value to the rich interaction that is still possible on a larger screen. Instead of abandoning rich interactivity altogether, I believe we should be looking for new and innovative ways to give these advanced capabilities to readers on smaller devices. When those capabilities will help us achieve some goal, we’ll be good to go. We’re not there yet.

After all, it’s not the complexity of the detailed, filterable dashboard that’s the problem on the phone – it’s that we haven’t figured out how to give these capabilities to a reader using a phone yet, and the experience is confusing. Does this make sense to you?

I actually see this as a good thing. Our generation has the chance to figure this out for the generations to come. The growth of the numerical literacy of our population will be well worth the effort.

5. Absolute precision isn’t always necessary

I have to be honest. This one is my hot button. There’s a school of thought that says that the visualization type that gives the reader the ability to guess the true proportions of the thing visualized with the greatest accuracy is the only one that can be used. Some go so far as to declare it immoral to choose a visualization type that introduces any greater error than another (they all have some error).

I found this great visualization about visualizations in Tamara Munzner’s book Visualization Analysis and Design:


The problem with this line of reasoning is that absolute precision isn’t always necessary for the task at hand. Norman uses the example of converting temperature from Celsius to Fahrenheit. If all you need to do is figure out if you need to wear a sweater when you go outside, a shortcut approximate conversion equation is GOOD ENOUGH. It doesn’t matter whether it’s 52, 55, 55.8, or 55.806. In all four cases, you’re wearing a light sweater.
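
To see how little a shortcut costs, here is a small sketch comparing the exact conversion with a common rule of thumb (double the Celsius value and add 30). The rule of thumb is assumed here for illustration and is not necessarily the exact shortcut quoted in the book; the sample temperatures are arbitrary.

```python
def exact_fahrenheit(celsius: float) -> float:
    """Exact conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def rough_fahrenheit(celsius: float) -> float:
    """Rule-of-thumb conversion (an assumption here): double it and add 30."""
    return celsius * 2 + 30


for celsius in (5, 13, 20):
    exact = exact_fahrenheit(celsius)
    rough = rough_fahrenheit(celsius)
    # Report the gap so the "good enough" claim can be checked by eye.
    print(f"{celsius} C -> exact {exact:.1f} F, rough {rough:.0f} F, "
          f"off by {abs(exact - rough):.1f}")
```

Over everyday outdoor temperatures the two answers stay within a few degrees of each other, which is well inside sweater-or-no-sweater tolerance.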

And let me repeat the point: there are errors associated with every visualization type – we aren’t machines and perfect decoders of pixels or ink. Sometimes it’s okay that a general understanding is achieved.

And for goodness sake, if absolute precision is required, then use labels, or just show a table of exact values.

Wrapping it up

I hope this was helpful for you! I love doing this kind of thing – pulling lessons from other amazing writings and seeing how they apply to data visualization, which I see as the “catch-all” discipline. It’s part numeric, part editorial, part graphic. To do it well, we need to embrace the principles of good design. I’ve tried to outline a few here from a true expert who we should all be familiar with. If you do read Norman’s book, you’ll find that there are many more.

Many thanks to my colleague Jewel Loree for pointing me to this book way back when. I finally got around to reading it, and am glad I did.



Gun Homicides vs. Gun Suicides in the US

2016 January 28
by Ben Jones

Nelson Davis, Matt Chambers and Alex Duke have formed the Reviz Project which you can read more about here. Their first challenge was to visualize the relationship between gun homicides and gun suicides in the United States. Data is sourced from the CDC.

I took a quick pass at visualizing the ratio between suicides and homicides for each state and over time using a scatterplot (I’m a scatterplot junkie) and a timeline. Hover over each state circle in the scatterplot to filter the timeline below to show the trend for a chosen state, and use the slider at the bottom of the timeline to explore the relationship between these two variables for a particular year in the scatterplot:

The dashboards they created to visualize this same data are quite elaborate (you can find them on their blog). While this month’s data story is very sobering, it’s always fascinating to me how different people, starting with the same raw data, and – in the case of Nelson, Matt, Alex and me – the exact same tool (Tableau), can come up with very different results, and very different insights.

That’s why data visualization has such a strong social component to it.

Thanks for reading,

To Optimize or to Satisfice in Data Visualization?

2016 January 12
by Ben Jones

Q: In data visualization, is there a single “best” way to visualize data in a particular scenario and for a particular audience, or are there multiple “good enough” ways?

That’s the debate that has resurfaced on Stephen Few’s and Cole Nussbaumer’s blogs today.

  • In summary, Few says “Is there a best solution in a given situation? You bet there is.”
  • In contrast, Cole says “For me, though, it is possible to have multiple varying visuals that may be equally effective”

Could Both be Right?

This is going to sound strange, but I think both are right, and there is room for both approaches in the field of data visualization. Let me explain.

Lucky for us, really smart people have been studying how to choose between a variety of alternatives for over a century now. Decision-making of this sort is the realm of Operations Research (also called “operational research”, “management science” and “decision science”). Another way of asking the lead-in question is:

Q: When choosing how to show data to a particular audience, should I keep looking until I find a single optimum solution, or should I stop as soon as I find one of many that achieves some minimum level of acceptability (also called the “acceptability threshold” or “aspiration level”)?

The former approach is called optimization, and the latter was given the name “satisficing” (a combination of the words satisfy and suffice) by Nobel laureate Herbert A. Simon in 1956.

So which approach should we take? Should we Optimize or Satisfice when visualizing data?

I believe there is room for both approaches. Which approach we take depends on three factors:

  1. Whether or not the decision problem is tractable
  2. Whether or not all of the information is available
  3. Whether or not we have time and resources to get the necessary information
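
The difference between the two strategies can be sketched in a few lines of Python. This is a toy illustration, not a recipe: the candidate chart types, their scores, and the 0.75 threshold are all invented, and the payoff is assumed to collapse into a single number.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def optimize(options: Iterable[T], payoff: Callable[[T], float]) -> T:
    """Score every option and return the single best one."""
    return max(options, key=payoff)


def satisfice(options: Iterable[T], payoff: Callable[[T], float],
              threshold: float) -> Optional[T]:
    """Return the first option that clears the acceptability threshold."""
    for option in options:
        if payoff(option) >= threshold:
            return option  # stop searching as soon as one is good enough
    return None  # nothing met the aspiration level


# Hypothetical scores for three candidate chart types, in the order considered
scores = {"pie chart": 0.55, "bar chart": 0.80, "dot plot": 0.85}


def score(name: str) -> float:
    return scores[name]


print(optimize(scores, score))          # has to examine every candidate
print(satisfice(scores, score, 0.75))   # stops at the first acceptable one
```

Optimizing needs a score for every alternative, which is exactly where the three factors above bite. Satisficing only needs scores until something clears the bar, so its answer depends on the order in which alternatives are considered.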

But What is the “Payoff Function” for Data Visualizations?

This is a critical question, and where I think some of the debate stems. Part of the challenge in ranking alternative solutions to a data visualization problem is determining what variables go into the payoff function, and their relative weight or importance. The payoff function is how we compare alternatives. Which choice is better? Why is it better? How much better?

Few says that “we can judge the merits of a data visualization by its ability to make the information as easy to understand as possible.” By stating this, he seems to me to be proposing a particular payoff function: increased comprehensibility = increased payoff.

But is comprehensibility the only variable that matters (did our audience accurately and precisely understand the relative proportions?) or should other variables be factored in as well, such as attention (did our audience take notice?), impact (did they care?), aesthetics (did they find the visuals appealing?), memorability (did they remember the medium and/or the message some time into the future?) and behavior (did they take some desired action as a result?).
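
One way to make a multi-variable payoff function concrete is a weighted sum. The sketch below is purely illustrative: the weights, the 0 to 10 scoring scale, and the two candidate charts are all hypothetical, and a real payoff function need not be linear at all.

```python
# Hypothetical weights for the variables named above; they sum to 1.
WEIGHTS = {
    "comprehensibility": 0.40,
    "attention": 0.15,
    "impact": 0.15,
    "aesthetics": 0.10,
    "memorability": 0.10,
    "behavior": 0.10,
}


def payoff(scores: dict) -> float:
    """Weighted sum of 0-10 scores; missing variables count as zero."""
    return sum(weight * scores.get(name, 0.0)
               for name, weight in WEIGHTS.items())


# Two made-up candidates: one precise but dull, one less precise but engaging
precise_chart = {"comprehensibility": 9, "attention": 3, "impact": 3,
                 "aesthetics": 4, "memorability": 2, "behavior": 3}
engaging_chart = {"comprehensibility": 7, "attention": 8, "impact": 7,
                  "aesthetics": 8, "memorability": 8, "behavior": 6}

print(round(payoff(precise_chart), 2), round(payoff(engaging_chart), 2))
```

Setting the comprehensibility weight to 1 and every other weight to 0 reduces this to the single-variable payoff function described above, so the disagreement can be read as a disagreement about weights.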

Here’s a visual that shows how I tend to think about measuring payoff, or success, of a particular solution with hypothetical scores (and yes, I’ve been accused of over-thinking things many times before):


It’s pretty easy to conceive of situations, and I’d venture to say that most of us have experienced this first-hand, where a particular visualization type may have afforded increased precision of comparison, but that extra precision wasn’t necessary for the task at hand, and the visualization was inferior in some other respect that doomed our efforts to failure. Comprehensibility may be the single most important factor in data visualization, but I don’t agree that it’s the only factor we could potentially be concerned with. Not every data visualization scenario requires ultimate precision, just as engineers don’t specify the same tight tolerances for a $15 scooter as they do for a $450M space shuttle. Also, visualization types can make one type of comparison easier (say, part-to-whole) but another comparison more difficult (say, part-to-part).

Trade-Offs Abound

What seems clear, then, is that if we want to optimize for all of these variables (and likely others) for our particular scenario and audience, then we’ll need to do a lot of work, and it will take a lot of time. If the audience is narrowly defined (say, the board of directors of a specific nonprofit organization), then we simply can’t test all of the variables (such as behavior – what will they do?) ahead of time. We have to forge ahead with imperfect information, and use something called bounded rationality – the idea that decision-making involves inherent limitations in our knowledge, and we’ll have to pick something that is ‘good enough’.

And if we get the data at 9:30am and the meeting is at 4pm on the same day? Running a battery of tests often isn’t practical.

But what if we feel that optimization is critical in a particular case? We can start by simplifying things for ourselves, focusing on just one or two input variables, making some key assumptions about who our audience will be, what their state of mind will be when we present to them, and how their reactions will be similar to or different from the reactions of a test audience. We reduce the degrees of freedom and optimize a much simpler equation. I’m all for knowing which chart types are more comprehensible than others. In a pinch, this is really good information to have at our disposal.

There’s Room for Both Approaches

Simon noted in his Nobel laureate speech that “decision makers can satisfice either by finding optimum solutions for a simplified world, or by finding satisfactory solutions for a more realistic world. Neither approach, in general, dominates the other, and both have continued to co-exist in the world of management science.”

I believe both should co-exist in the world of data visualization, too. We’ll all be better off if people continue to test and find optimum visualizations for simplified and controlled scenarios in the lab, and we’ll be better off if people continue to forge ahead and create ‘good enough’ visualizations in the real world, taking into account a broader set of criteria and embracing the unknowns and messy uncertainties of communicating with other thinking and feeling human minds.

Thanks for reading my $0.02. I’d like to hear your thoughts.

When Memorability Matters: Another Practitioner’s View

2015 December 10
by Ben Jones

The following comments are in response to Stephen Few’s recent newsletter entitled “Information Visualization Research as Pseudo-Science” in which he critiqued an academic paper by Borkin et al. entitled “Beyond Memorability: Visualization Recognition and Recall”. I’m not an academic researcher, so I will leave it to others in the field to respond to Few’s specific criticisms of the paper’s methods. My goal in this article is to respond to opinions Few voiced about memorability in data visualization.

I’d like to start by asking a few questions:

  • Does it matter whether a data visualization is memorable or not?
  • Should we, as data visualization practitioners, care about memorability?
  • Should we design our visualizations so that those who view them are more likely to remember them at a later point in time?
  • Is memorability a worthwhile area of study for those studying data visualization in academia?

In my opinion, and in my experience, the answer to each of these questions is ‘Yes’.

In Stephen Few’s recent newsletter entitled “Information Visualization Research as Pseudo-Science”, though, he put forward a differing opinion:

“Visualizations don’t need to be designed for memorability— they need to be designed for comprehension. For most visualizations, the comprehension that they provide need only last until the decision that it informs is made. Usually, that is only a matter of seconds.” – Stephen Few (emphasis his)

This statement helped me understand why Few and I disagree about memorability: we disagree about how data visualizations are used by groups of people. Simply put, I don’t believe data visualizations are “usually” followed by decisions “only a matter of seconds” later. That may be how a robot or a computer algorithm would approach decision-making, but it’s just not how groups of humans in organizations go about it.

How do groups of humans usually work with data visualizations, then? Well, analysts prepare dense packets for pre-reading materials, directors and VPs attend review meetings where they look at lots and LOTS of data and charts, sometimes they take copious notes, sometimes they zone out and check their smart phones, then they break for lunch, check their email, reconvene and consider different topics, only to have the final decision made at a totally different planning meeting or off-site weeks later.

Sound familiar? That’s a whole lot messier than question -> visualization -> decision in seconds. And that’s only one reason why memorability matters.

In my experience, the memorability of the overall message (of which the visualizations are a critical element) matters most when:

  1. Decisions won’t be made immediately
  2. The audience doesn’t care deeply about the topic
  3. The environment is already saturated in data and visualizations

To illustrate these three conditions, let me relate a personal story from my experience working with data and groups of decision makers. The specific details of the account have been altered to protect the innocent.

A Practitioner Wins Thanks to Memorability

One time I had the unenviable task of presenting the results of the launch of a product that was, shall we say, less than “top-of-mind” to the executives at a Fortune 500 company. Think “razor” of the razor – razor blade model. Sales should just be a pull-through, so they didn’t pay much attention to it at all.

But what we were finding was that the relative neglect of this high-touch product was causing a lot of dissatisfaction, and our lack of attention to the details of the product offering was causing us to lose customers.

In preparing for the presentation, I created plenty of nice, Tufte-compliant charts and graphs, like this one (a generalized mock up), to show how the recently-launched product was doing in the marketplace:


A comprehensible but not particularly memorable chart

Do you notice the problem in the chart? That’s right, we didn’t launch a green SKU in Configuration B.

Why not? Tooling investment.

Who cares? Customers did. A lot of them. The nature of the product was such that customers couldn’t select between A & B. There were factors that pre-determined that for them.

Now I was scheduled to be the fifth presenter in a very long review meeting where many other topics would be discussed, and as I mentioned, this product just didn’t matter to the executives. My charts were going to get glossed over. If the executives gave me 10 seconds of attention on each chart, I’d have considered myself lucky. The way the situation was shaping up, I felt pretty sure that this product line’s issues weren’t going to be addressed as a result of my presentation.

So instead, I showed charts like this, with actual photographs of actual customers and their actual quotes:


The same chart made more memorable by the addition of a human’s face and their own words

The result was palpable.

They leaned in. They looked at the faces in the pictures. Actual customers. People that looked like their sons, their daughters, their mothers. They chuckled at the funny social media handles. They cared. For the first time in a long time, they actually cared about the razor. And they cared about the fact that customers just weren’t loving it.

A few weeks later, I received an email that the go-ahead had been given to resolve a number of problems with this product line, including the missing green SKU in Configuration B. The VP thanked me for showing the “human side” of the data in my presentation.

When the time came to make the decision, they opted to fund a product they didn’t use to care about, thanks to charts they couldn’t forget.

Memorable or Comprehensible, or Both?

Stephen Few made the statement that comprehensibility matters, but memorability doesn’t when it comes to designing data visualizations. Well the original charts in my real-life example above were definitely comprehensible. I changed them because they weren’t particularly memorable.

My original charts were in the bottom right quadrant of the 4-blocker below, and all I did was push them up to the top-right. Sure, sometimes, it’s not necessary to do so. Sometimes, though, it’s make-or-break:


Note that for scenarios where the audience members already deeply care about the data, comprehensibility itself will result in memorability. Adding photos of beautiful, smiling faces just isn’t necessary.

But let’s be honest. Having an audience of 100% of the key decision makers that wait with bated breath for our next bland chart that results in a blank check being given right there on the spot just isn’t normal. It would be nice, sure, but how many times have you actually been in that situation? So many times you absolutely need them to remember your message. Having charts that draw them in and stay in their brains just isn’t a bad idea.

Sometimes There’s Just No Decision

So far I’ve written about data visualizations in the context of human decision-making. But many data visualizations don’t inform decisions at all. Decision support is but one of many possible purposes. Data visualizations can be created to merely inform, to educate, and yes, even to entertain. In those cases, design for memorability can be the difference between having someone share your work with others, and having them forget they ever saw it.

Few made the following comment about adding images to visualizations:

If I incorporate an image of a kitten into a data visualization, I can guarantee that a test subject would remember seeing that kitten if it is shown to her again a few minutes later. But how is that useful? Unless the visualization’s message is that kittens are cute and fun, nothing of consequence has been achieved. – Stephen Few

He answers the question himself quite well: images are useful if the visualization’s message is enhanced by the presence of the images.

Take my Edgar Allan Poe timeline for example:

Does the image of Poe add any value at all? How about the image of his signature? Are these components nothing more than “chartjunk” (Few mentioned to me in an email that he would not call the image of Poe “chartjunk” based on his 2011 writings on the subject), or do they actually perform a function?

I submit that they perform a vital function. The visualization shows the life works of one man as blocks stacked together in the years they were written. Works that were written on ink and paper by his own hand.

There’s no decision here. The visualization is simply intended to educate you. And it’s my opinion that your education takes on a whole different meaning – a whole different feeling – when you see Poe’s face and an artifact of his own penmanship.

And let’s be honest, the following version is pretty damn boring, you’d probably ignore it if you saw it in your Twitter feed, and it’s not nearly as memorable, is it?

I’d like to conclude by quoting from Stephen Few’s critique one final time:

The greatest tragedy of this research is that what makes a visualization memorable is actually of no consequence. – Stephen Few

I hope I’ve made it clear in this blog post why I think that memorability can actually be of great consequence in data visualization. But did you notice that in my comments above I used phrases like “in my experience” quite often, and that all I really did was relate an anecdote and state my opinion? My opinion does not amount to codified knowledge, and my experiences do not amount to rigorous research.

And that’s exactly why I would appreciate further attempts by academics to study what makes charts more or less memorable. I’m sure this task isn’t easy. Visualizations are but one piece of an overall message that can be delivered in myriad ways to a variety of audiences. For those who are studying this topic, do know that there are practitioners out there who are hoping that the insight you glean into this topic can help us all.


For Fun: Happy Fibonacci Day!

2015 November 23
by Ben Jones

It seems there’s a day for everything, right? National Cashew Day (you guessed it, that’s today), World Philosophy Day (nope, sorry – last Thursday). Heck, you can even register your own National Day of [fill in the blank].

So thanks to MIT’s Twitter account, I became aware that today is Fibonacci Day. Makes sense – November 23rd is 11/23, and 1, 1, 2, 3 are the first four numbers in the famous Fibonacci Sequence.

So, to earn my Math Nerd Card for 2015, I created the following dashboard that visualizes the first 100 numbers in the Fibonacci Sequence, starting with 1 instead of 0, as I’m led to understand is the more modern convention:
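For anyone who would rather generate the numbers than look them up, here is a minimal Python sketch of the sequence. It is not how the dashboard's data was prepared (the post doesn't say), just one way to produce the same 100 values under the "start with 1" convention mentioned above. The function name `fibonacci` is my own.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1, 2, 3."""
    numbers = []
    a, b = 1, 1  # start with 1 rather than 0, matching the convention above
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b  # each new term is the sum of the previous two
    return numbers


first_100 = fibonacci(100)
print(first_100[:4])   # the "11/23" that gives Fibonacci Day its date
print(first_100[-1])   # the 100th number in the sequence
```

Python integers grow as needed, so the 100th term comes out exact, with no overflow or rounding to worry about.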

Math + Data Nerds Unite: know of any other good math vizzes out there? Leave a comment!

Thanks for humoring me,

The Backlash Against Data Dogmatism

2015 November 21
by Ben Jones

Have you noticed it as well? The tide is turning against dogmatism in data visualization, as witnessed by the increasing number of voices speaking out against a rigid approach and closed-mindedness regarding practices that are often ridiculed in knee-jerk fashion. It’s about time.

To which voices am I referring, and which data visualization practices are they defending?

1. Charts with non-zero y-axes

Can you tell Vox’s Johnny Harris and Matthew Yglesias have had it with readers taking pot shots about their choice of y-axis starting points? Their well-reasoned video entitled “Shut up about the y-axis. It shouldn’t always start at zero” says it all:

Harris and Yglesias show that the choice of axis starting point depends on the context, the unit of measure, and on the comparison being made.

2. Artistic approaches to data visualization

In Andy Cotgreave’s recent ComputerWorld article “Why Do We Visualise Data?”, Andy argues that not all data visualizations need to be burdened with the requirement of imparting ultimate precision of comparison. It depends on the purpose.

“The purpose of a visualization will also determine the extent to which you should inform effectively…Sometimes it’s more important to make someone engage with the overall message rather than the minutiae.”

Andy uses the example of Stefanie Posavec’s “Air Transformed”: a wearable data visualization necklace showing air quality in Sheffield, UK:


Should this project really be ridiculed as “ineffective” and horribly contrary to “best practices”, with no place for it at all in the field of data visualization? Or could it be that this form of expression engages human beings in a way that a rigorous report complete with bar charts and compliant zero y-axis timelines of air quality in Sheffield could hardly do? Maybe both the report and the necklace are valid, each in their own way.

Which should you create? It depends on your purpose and your audience.

3. Pie charts

Oh, the poor, maligned pie chart. The chart type that gets pushed around and bullied on the data viz playground more than any other. Randal Olson of /r/dataisbeautiful ran a Twitter poll asking “Do you think pie charts should be banned from #dataviz?”. Scientific or not, nearly 2 in 5 responded affirmatively:

That’s amazing if you stop and think about it. Almost 40% of respondents, likely mostly data viz enthusiasts who follow Olson, think that pie charts should never, ever, ever be used. Hilariously, Andy Kirk of Visualising Data asked whether we should also run a poll about whether those people should be banned, and Irvin Almonte’s response was sheer genius:

Again: It depends. Say it with me: IT DEPENDS.

Pie charts, in certain instances, can actually be more effective than bar charts at showing specific part-to-whole comparisons. And if the part-to-whole relationship is far more important to your message than comparing uber-accurately between categories, and if there are a very small number of slices, go ahead, give thought to using a pie chart. Don’t be intimidated by pie chart haters. There, I said it.

4. Word Clouds

I entered the fray last week with my blog post “My 3 Basic Tenets of Data Visualization” in which I argued that rules of thumb, not black-and-white rules, should prevail, along with a spirit of humility and openness to exploration and innovation in data viz.

I also did the unthinkable: I defended the Word Cloud. The poor, lowly pal of the pie chart, united on the playground in mutual fear of the roving data-dogma bully. My point is that if you only had a very short amount of time to impress upon a large room of people the most commonly used passwords, which of these four visualization types would you choose?

The word cloud sacrifices precision for completeness (all of the passwords actually appear on the screen in only the word cloud) and readability (the most commonly used passwords almost shout out at the reader). Is that a reasonable trade-off to make? Maybe. It depends.

Since that blog post, “What are your most used words on Facebook” has gone viral, and we’ve been inundated with over 16 million word clouds as of the writing of this blog post. Of course one can only hope this app is not a gigantic phishing scam, but do you think a bar chart version of your most commonly used words, or a concise and thorough text analytics report, would have also gone viral? Maybe, but probably not. And I’m not saying “going viral” is an end that justifies all means, but in this case, ultimate precision of comparison is probably not needed anyway. It actually worked a little too well.

Let’s not let the pendulum swing too far, though.

This casting-off of suffocating restraint and a fearful spirit of ridicule is a REALLY GOOD THING in data visualization, but let’s not let the pendulum swing too far the other way. It’s true that pie charts are very often the wrong choice, and the majority of the time a y-axis that starts at zero is a really good choice.

I’m not a big fan of the word “best” in “best practices”, which seems to promise some optimal solution, but I do like the sentiment in this response by Vance Fitzgerald to my question on Twitter about rules of thumb in data visualization:

I’m hopeful that the next phase of data visualization is one that embraces the gray of “it depends” and encourages open dialogue and constructive criticism. In order to get there, we’ll definitely have to shed dogma. Let’s absolutely do so, but let’s also carry forward the principles and rules of thumb that just make good sense, while being open to the possibility that breaking those rules might be a great idea in specific situations.

Wouldn’t this be a more mature approach? Wouldn’t it also be more welcoming, and more enjoyable?

Thanks for reading,

3 Fun Vizzes, 2 Useful Tableau Tips and 1 Dangerous One

2015 November 16
by Ben Jones

I believe in the creative power of play in any discipline. Data visualization is no different. I’ve acquired new skills and grown in ways I would have never predicted, all because I spent a little bit of time playing with a data set I found intriguing. Here are three recent side projects of mine, and a useful tip that comes from each.

1. Play specific YouTube video segments when users click on corresponding marks

I’m pretty sure most of you were aware that it was (the real) “Back to the Future Day” this past October 21st. To celebrate the occasion, as any data-loving 80s kid would want to do, I created a dashboard using what I call dimension line charts to show all 13 trips the DeLorean time machine took over the course of the trilogy.

The dashboard gives the reader the ability to watch the movie clip associated with each DeLorean trip by clicking on the corresponding arrows. To add this feature, I first had to find a video on YouTube that includes each time travel clip back-to-back. Then, I had to grab the embed URL for the video:


Finally, I had to add the embed URL with the following parameters to each row in the spreadsheet I created:

Where XXX and YYY are the start and stop times in seconds, respectively. The “&autoplay=1” parameter means the user doesn’t have to click the arrow and then click play. Clicking the arrow automatically starts the video clip.
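If you have more than a handful of clips, building those URLs by hand gets tedious. Here is a minimal Python sketch of how the per-row URL could be assembled. The exact URL pattern from the original screenshot isn't reproduced here, so this assumes the `start`, `end`, and `autoplay` parameters documented for YouTube's embedded player. The function name, the `VIDEO_ID` placeholder, and the example times are all made up for illustration.

```python
from urllib.parse import urlencode


def clip_embed_url(video_id, start_seconds, end_seconds, autoplay=True):
    """Build a YouTube embed URL that plays one segment of a video.

    Assumes the embedded player's `start`, `end` and `autoplay` parameters;
    `video_id` is whatever follows /embed/ in the video's embed URL.
    """
    params = {"start": start_seconds, "end": end_seconds}
    if autoplay:
        params["autoplay"] = 1  # start playing as soon as the clip loads
    return f"https://www.youtube.com/embed/{video_id}?{urlencode(params)}"


# Hypothetical clip: 65 seconds to 92 seconds of a placeholder video.
example_url = clip_embed_url("VIDEO_ID", 65, 92)
print(example_url)
```

Each resulting URL would go into its own row of the spreadsheet, next to the trip it belongs to, so that a URL action on the dashboard can open the matching clip.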

Great Scott, that’s a cool tip!

2. Use the TODAY() function to have a countdown clock in your dashboard update every day

The genesis for this next project came from eyeo. I had the chance to present at the eyeo conference last year, and the day of my presentation happened to be both a Friday the 13th and a full moon. Spooky, right? I wondered how often these two events coincided, so I found a calendar listing of each one, and figured out all the instances in which they occurred or will occur on the exact same day:

The helpful tip that comes from this side project is the use of the function TODAY(). To create the “Days Till Next” table on the far right hand side of the dashboard, I first created the following three calculated fields, the first to pull in the date value for “today”, the second to compute the number of days between today and each given event, and the third to null out events that have already happened in the past:


Next, I created a new Sheet and placed MIN(DaysTillNext) on the Text shelf, like so:

What’s great about this technique is that every day you load the dashboard, Tableau Public server will update all of these calculated fields, including the day that TODAY() maps to, and give you a brand new countdown. Who says Tableau Public dashboards don’t update automatically?
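The logic behind those three calculated fields and the MIN() aggregation can be sketched outside of Tableau as well. The Python below is an illustration of the same idea, not a transcription of the original fields (which were shown in a screenshot). The function names and the event dates are placeholders, and I'm assuming an event that falls on today itself still counts, with 0 days to go.

```python
from datetime import date


def days_till(event_date, today):
    """Days from `today` to `event_date`, or None if the event is already past."""
    delta = (event_date - today).days
    return delta if delta >= 0 else None  # "null out" events in the past


def days_till_next(event_dates, today=None):
    """Smallest non-null countdown, like MIN(DaysTillNext) on the Text shelf."""
    today = today or date.today()  # plays the role of TODAY()
    remaining = [d for d in (days_till(e, today) for e in event_dates) if d is not None]
    return min(remaining) if remaining else None


# Placeholder dates, NOT real Friday-the-13th full moons.
events = [date(2014, 6, 13), date(2015, 12, 1), date(2016, 1, 1)]
print(days_till_next(events, today=date(2015, 11, 16)))
```

Because `today` defaults to the current date, calling `days_till_next(events)` on different days gives a different countdown each time, which is the same effect the dashboard gets from TODAY().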

3. Invert the y-axis to stack marks downward — if you dare!

The third side project was undertaken on the 166th anniversary of Edgar Allan Poe’s mysterious death. It was born out of a simple question: How many works of literature did the revered writer and poet produce over the course of his life? The following dashboard stacks 150 red boxes, one beneath the other, over the course of two and a half decades to visualize his prolific career:

WARNING: This is a dangerous technique! Just ask Christine Chan, whose “Gun Deaths in Florida” viz, which used a similar technique, drew widespread and harsh ire from the internet. People called her graphic “misleading” and “deceptive” for making something that was getting worse look to many like it was getting better.

Fair enough. I feel that in this case, the stacked boxes in the Poe viz aren’t as likely to be misinterpreted as a line chart which slopes “downward” from one value to a greater value. We’re just stacking boxes here, and it’s pretty clear that 1845 has “more” boxes than 1844, not fewer. But why do it at all? Why risk misleading with the inverted y-axis? It’s for dramatic effect – it makes it appear like blood is falling down onto Poe’s head. Dramatic effect is a double-edged sword, though, so tread with caution.

And to open the Pandora’s Box on this technique, simply right click on the axis you want to reverse (it can be the x-axis or the y-axis), and check the “Reversed” box:


Thanks, I hope you enjoy these side projects as much as I did. I hope you find the tips helpful (and that you don’t burn yourself on them!). What side projects have you learned from lately?