Earlier this week I had the pleasure and honor of presenting after Giorgia Lupi of Accurat at Data Visualization New York. My presentation focused on how to use Tableau as a data discovery tool, and luckily for me, the amount of data about New York is as abundant as everything else about the city. There was no shortage of material, from garbage to graffiti to rat sightings and electric consumption. New York hiccups, and it gets recorded.
Sharing data on the web with Tableau Public is both my job and my hobby, but this presentation allowed me to demonstrate how quickly Tableau allows users to find insights in data. Data discovery is a very important part of the overall process, which I conceptualized as a horse race track:
I made the analogy that using Tableau is like riding Secretariat – you get the distinct advantage of being able to race around the track at a rapid rate, transitioning between the phases and quickly identifying patterns, outliers and trends in your data.
I also made a somewhat philosophical point that data is only one type of input in the overall learning process. Using data has its benefits and limitations. A benefit is that you can obtain valuable “explicit knowledge” – who, what, when and where? A limitation is that it’s often difficult to answer “why?” and “how?” using only data. Consider riding a bike: what’s a better way to learn, reading about it or doing it? And consider New York: no matter how many charts you see about the city, nothing replaces the unique experience of walking its streets and riding its subways. Tacit knowledge. Often the best outcome of data discovery is that you know what questions to ask in the analog world.
Here is a diagram showing the overall learning process, and how data fits in as a specific type of input:
As I mentioned, there was a wealth of data to explore and visualize about New York. I explored a number of those data sets, and here are a few of the projects I recreated during the 1 hour time slot I was given (focus was on learning, not fit & finish).
Click to open an interactive version:
2. “Know where” – The Bridges of NY & NJ (get the data here):
It was an amazing and unique experience for me. I had a lot of fun presenting (not shown here is the Homer Simpson bologna viz I created in response to Accurat’s project “A Slice for Everyone”), and I met a number of fellow visualization enthusiasts, including Naomi Robbins. Naomi was gracious enough to sign a copy of her newly reprinted Creating More Effective Graphs, which I am currently reading and hope to review soon.
Thanks for stopping by,
I spent the past few days with some colleagues in San Antonio at the Investigative Reporters and Editors (IRE) Conference training journalists to use Tableau and meeting others interested in data visualization and data journalism.
I also had the opportunity to participate in a panel discussion with Ryan Murphy of the Texas Tribune, Steve Thompson of the Dallas Morning News and Matt Waite of the University of Nebraska-Lincoln.
My core message is that there are different tools for different situations. I’m a big believer that static charts made with Excel and R have their place, and so do intricate and elegant coded visualization masterpieces. I also see Tableau as a unique tool that affords much of the interactivity with a user interface that is easy to learn. A bowl of porridge that’s often “just right”.
Here are my slides from the presentation. As always, these are my nascent thoughts, and I’d love to get your take:
Here are hyperlinks to the visualizations and websites included in the presentation:
As always, thanks for stopping by,
There are a number of interesting data visualization contests going on right now that I thought I’d bring your attention to. There are contests for people with different interests: volunteering, healthcare, online transparency and neuroscience. Prizes are fairly generous this time around – trips to conferences, cash, products and software. Here goes:
1. Tableau’s Civic Data Viz Contest
- Challenge: “use Tableau Public to visualize data about your community” (global)
- Data: links to various sources provided here
- Winner gets: 3-day trip to TCC13 in Washington D.C. (crowd favorite wins $500)
- Deadline: June 24, 2013
- Challenge: “We challenge you to visualize the removal requests in Google’s Transparency Report”
- Data: Google Transparency Report
- Winner gets: $3,250 (2nd place gets $1,250, 3rd place gets $500)
- Deadline: June 27, 2013, 11:59 pm EDT
- Challenge: “turn the raw data of ‘civic health’ into useful applications and visualizations that have direct impact on public decision-making” (U.S.)
- Data: files available for download here
- Winner gets: “2013 Prizes will be announced throughout the Challenge” (check the Prizes site here)
- Deadline: July 28, 2013, 11:59 pm EDT
- Challenge: “help identify spatial and temporal gene expression patterns in the developing mouse brain”
- Data: Download here
- Winner gets: Free full registration for the VIZ 2013 Conference
- Deadline: July 31, 2013
5. Visualizing.org’s Visualizing Hospital Price Data
- Challenge: “We challenge you to visualize the hospital price data to create greater transparency in the healthcare field”
- Data: Medicare Provider Charge Data for Inpatient and Outpatient
- Winner gets: $9,000 (interactive) and $6,000 (static) – see website for full list of prizes totaling $30,000
- Deadline: August 25, 2013, 11:59 pm EDT
Best of luck! Let me know if I missed any.
(For this post, I owe a word of thanks to Andrew Beers – VP of Product Development at Tableau, for the raw data, and Mike Klaczynski – Data Analyst on the Tableau Public team, for showing me this method).
At the Seattle Hacks/Hackers event last night, we built an interactive data dashboard that allows the reader to explore bridges in the state of Washington, where a bridge crossing the Skagit River recently collapsed into the water after being struck by a truck carrying an oversize load.
What’s notable about this dashboard is that you can click on any of the 2,489 circles on the map and bring up an embedded Google satellite image of the bridge within the dashboard itself. I didn’t have to take a screen shot of each satellite image – that would be way too much fun. Instead, I used a little-known feature in Tableau Public – embedded web pages (similar to the Embedding YouTube post from a few weeks ago).
How to embed a Google satellite image in a Tableau Public visualization:
The first group of 5 steps shows you how to create a url for each bridge, and the second group of 5 steps shows you how to add a box to your dashboard to pull up the bridges.
I. Create the URLs
1. First, notice that the data file contains Latitude (“LAT”) & Longitude (“LON”) for each bridge.
2. A Google Maps search for a particular Latitude & Longitude (say, 48.445781 and -122.341108) yields a link url like this:
3. The url can be simplified a little bit as follows:
4. Breaking down the elements of the url, we can see that after the latitude & longitude, there are three parameters in the url:
- “q=48.445781,-122.341108” – these are your coordinates. Note that if you have an address field instead of Lat/Long, you can put an address after “q=” as well
- &z=17 – this specifies the zoom level. Higher numbers zoom in, lower numbers zoom out
- &t=h – this specifies the type of map. (t=m is a map, t=h is a satellite view)
- &output=embed – this is a key parameter that makes sure the website you embed in your viz doesn’t include the entire site – just the map itself
5. You could then generalize the url to:
You can see that the actual numbers for Latitude and Longitude have been replaced with field names <LAT> and <LON>
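As a rough sketch of how those parameters fit together, the same URL can be assembled in Python. The base address used here is an assumption made for illustration, since the link itself is not reproduced above, and the function name `satellite_embed_url` is hypothetical; only the `q`, `z`, `t` and `output` parameters come from the breakdown in step 4.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: this base address is a stand-in for illustration only.
ASSUMED_BASE = "https://maps.google.com/maps"


def satellite_embed_url(lat, lon, zoom=17, map_type="h", base=ASSUMED_BASE):
    """Build an embeddable map URL from the parameters described in step 4."""
    params = {
        "q": f"{lat},{lon}",   # coordinates (an address string could go here instead)
        "z": zoom,             # zoom level: higher numbers zoom in
        "t": map_type,         # "m" for a map, "h" for a satellite view
        "output": "embed",     # return just the map, not the whole site
    }
    # safe="," keeps the comma between latitude and longitude unescaped
    return f"{base}?{urlencode(params, safe=',')}"


url = satellite_embed_url(48.445781, -122.341108)
print(url)

# Reading the parameters back out confirms what each piece of the URL carries
query = parse_qs(urlsplit(url).query)
print(query["z"], query["t"], query["output"])
```

Changing `map_type` to `"m"` or lowering `zoom` is a quick way to see what each parameter does before wiring the URL into a dashboard.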
The next group of steps walks you through how to add a box to your dashboard that pulls up this embedded satellite image when a user clicks on a particular circle.
II. Add dynamic Satellite Images to your Dashboard
1. First, in the dashboard tab, drag a Web Page onto your dashboard from the left-center panel (just leave the “Edit URL” dialog box blank and click “OK” for now):
2. From the Dashboard file menu, click Actions and click the “Add Action >” button and choose “URL…”
3. In the Add URL Action dialog box, select whatever sheet you have created that includes the fields LAT and LON, and choose what event you’d like to trigger navigation to the new image (Hover, Select, or Menu). In this case, I’ve selected Map as my Source Sheet and Select as my trigger event, but you could trigger the action from a table or other type of sheet. Here’s what the dialog box looks like:
4. Now comes the magic. Copy and paste the generalized url above to the URL field of the dialog box, and replace <LAT> and <LON> with the corresponding field names in your data source by clicking the small arrow to the right of the URL text entry field:
5. That’s it! Test it out by clicking on the map circles and see the satellite image change accordingly.
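For readers who like to see the mechanics, here is a small Python sketch of what the URL action is doing conceptually: taking one template and swapping the field placeholders for the values of whichever mark was clicked. This is not Tableau's implementation, just an illustration. The template's base address is an assumption, `fill_url` is a hypothetical helper, and the second bridge row is made up.

```python
# Assumption: the base address is a stand-in; only the <LAT>/<LON> placeholders
# and the z, t and output parameters come from the steps above.
TEMPLATE = "https://maps.google.com/maps?q=<LAT>,<LON>&z=17&t=h&output=embed"


def fill_url(template, row):
    """Replace each <FIELD> placeholder with that field's value for one row."""
    url = template
    for field, value in row.items():
        url = url.replace(f"<{field}>", str(value))
    return url


bridges = [
    {"LAT": 48.445781, "LON": -122.341108},  # coordinates used in the example above
    {"LAT": 47.5, "LON": -122.25},           # made-up row for illustration
]

# One URL per bridge, analogous to one click per circle on the map
urls = [fill_url(TEMPLATE, bridge) for bridge in bridges]
for u in urls:
    print(u)
```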
I can see this being useful for organizations that would like to include images of office locations or real estate assets in their dashboards. For data journalists, it’s about allowing readers to interact with the abstract and the real in the same graphic.
If you make a dashboard with a dynamic Google map, be sure to post the link in the comments field for all to see.
Added 6/10: Here are the slides from the event:
Being a Canadian (eh) living south of the border, I’ve watched the US political process as an outsider looking in for all of my adult life. It’s a fascinating system, with plenty of fine points and flaws, which just means it’s a human system.
I had the chance to visit and present data visualization using Tableau at the TechActivist conference this weekend. I learned a lot about how people deeply involved in the political machinery of this country think, relate to each other and approach their goals. There is no doubt that they all see data as a huge opportunity going forward.
To prepare for the conference, I was given Washington State election results from 2012. My Tableau colleagues Mike Klaczynski and Jewel Loree and I spent some time playing with the data and mashing it up with census data to see if we could find anything interesting in the results. We presented a number of findings, and created this voting results analytics dashboard at a county level.
Click to see an interactive version, and use the drop-down in the upper right to switch between Republican and Democratic perspectives:
Here are the slides I presented based on my cursory research into the subject of data visualization and US politics. I don’t claim to be an expert in politics, but I did find some interesting articles and visualizations that I felt compelled to share:
As always, feel free to leave comments, feedback, suggestions, etc. If you really want to get my attention, go get Tableau Public (it’s free), download the workbook (click “Download” in the bottom right corner of the dashboard) and remix the data to show it the way you’d like to see it.
Lastly, here’s a link to many other election day visualizations created by the aforementioned Mike Klaczynski.
Thanks for stopping by,
F. Scott Fitzgerald, author of The Great Gatsby and many other works of classic American literature, kept a fairly complete (though not always arithmetically accurate) ledger of the earnings he collected by title from the time he left the army until 1936, just a few years before his death. You can see the ledger at the University of South Carolina’s digital collections website here.
I was able to convert the record of the dollars he actually made to 2012 dollars using, appropriately, a website called “Westegg”. West Egg is the setting for the novel The Great Gatsby. I found that he made over $37K in 1931, or approximately $564K in today’s dollars. Not too bad. Of course the movie The Great Gatsby will likely net much, much more than Fitzgerald’s tally, but he wasn’t exactly a starving artist.
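The usual arithmetic behind this kind of conversion is a ratio of price indexes, and a minimal Python sketch looks like the one below. The index values are placeholders chosen to keep the math easy to follow, not real CPI figures and not the numbers the website uses, and `to_target_dollars` is a hypothetical helper.

```python
# Placeholder index values for illustration only (NOT real CPI data).
PRICE_INDEX = {1931: 10.0, 2012: 150.0}


def to_target_dollars(amount, year, target_year, index=PRICE_INDEX):
    """Scale an amount by the ratio of the target year's index to the source year's."""
    return amount * index[target_year] / index[year]


# With these placeholder values, each 1931 dollar counts as 15 dollars in 2012
converted = to_target_dollars(37_000, 1931, 2012)
print(f"${converted:,.0f}")
```

With a real index table covering every ledger year, the same function could be applied row by row to the raw data.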
Here are his earnings visualized in three tabs: one showing a history over time, another showing the ledger for each year, and finally a third showing the top titles in terms of income collected by Fitzgerald:
A few notes on the making of this interactive graphic:
- Converting the data from pdf to spreadsheet form was painstaking work. I tried a few methods – pdftoexcel.com, saving the pdf to txt and then importing to Excel. These methods really didn’t work out too well. In the end, it was copy-paste from pdf to Excel, rearrange the fields to a raw data table, and then double/triple check the figures.
- Cross-checking the tallies, I found that in many years Fitzgerald wasn’t as good at math as he was at writing fiction. No big surprise there, I suppose. It was funny to see that in one place, he actually blames his arithmetic errors on a bout of bad health.
- Categorizing the titles was a little tricky. For example, he used a category called “Books” in some years, and “From Books” in other years. I gave it my best shot to combine these where it seemed like it made sense, but his system would probably cause most accountants a good deal of heartburn. You will find “The Beautiful and Damned” in no less than four different categories – “Books”, “Movies”, “English Rights”, and “Miscellaneous”. This is the way Fitzgerald cataloged the income, so I tried to keep it as true to his record as possible.
Thanks for stopping by! If you’re going to see the movie this weekend, I hope it’s better than what the critics say,
Shan Carter, Kevin Quealy and Joe Ward of The New York Times recently published a thorough analysis of the rise of strikeouts in Major League Baseball. In it, they showed how the number of strikeouts per game has risen along with the number of pitchers per game using two line plots, one for each variable. It’s good stuff, you should read it. I especially like the grayed out dots for each team, which give a sense of the team-by-team variation without overwhelming the reader.
I found the summary table for average MLB game stats since 1871 here, and I wondered what this correlation, and other pairings of MLB stats, would look like if they were plotted as connected scatterplots. Connected scatterplots are a visualization form that has been featured at NYT recently (more about this form of visualization, including a number of examples, in Alberto Cairo’s blog post “In praise of connected scatterplots”).
Here’s what it looks like, along with a second method shown below it, the dual axis line plot:
Effort and Reward
I struggled with connected scatterplots at first. Maybe the engineer in me stubbornly resisted the notion of including time on anything other than the x-axis. But I found that after investing a small but not insignificant amount of time in orienting myself to the axes, the connected scatterplot actually became a fun chart to explore. To quote Andy Kirk, my effort was “ultimately rewarded with a worthy amount of insight gained.” (Kirk, Data Visualization – a successful design process, p26).
The connected scatterplot imparts a sense of travelling a pathway through a terrain that has twists and turns, loops and sudden rises and falls that encode how the two different variables changed together. It’s a roller coaster ride of sorts, and once you’ve on-boarded the cipher of the code, you’re out of the turnstiles and on your way.
The Other Method: Dual Axes
You have to admit, though, the dual axis line plots below the connected scatterplot do a fine job as well. In fact, they probably require the reader to invest less time upfront to begin to glean some insight (sorry, no experimental data on that claim). If my feeling is right, it probably has something to do with the fact that we’re more used to seeing changes over time shown from left-to-right. It’s still an abstract way to represent time, it’s just one we’re more familiar with.
Virtues and Vices
The dual axis method has some distinct advantages: if you open up the year slider to show the entire range from 1871 to 2012, you will see what I mean. The connected scatterplot becomes much more difficult to read, but the dual axis line plot does not require any additional effort. You can adjust the slider in the interactive version above, or here’s a screen shot:
Additionally, not all pairs of variables render well in the connected scatterplot format, even with the shorter time window of 1981-2012. If one variable basically contains a bunch of random noise, or doesn’t change much at all, the connected scatterplot will look very jumbled, and will be hard to read since all the points will just form clumps. For example, change variable 1 to “Avg Pitcher Age” and change variable 2 to “Batters Faced”. What you get isn’t an exciting journey, it’s a wild goose chase, and you can see why if you take a look at the dual axis plot, which immediately tells the story – two flat lines:
In conclusion, my opinion at this point is that the connected scatterplot is a special case visualization type for showing how two variables change together over time – if it works well, it really works well. If it doesn’t, ditch it for the more all-purpose (and admittedly more utilitarian) dual axis line plot. I guess to go along with the baseball theme, my advice would be to swing for the fences if the pitch is right, otherwise just make contact and get on base.
How I made the connected scatterplot in Tableau
This section will serve as a very brief how-to for making a connected scatterplot in Tableau Public. The key is dragging “Year” to the “Path” mark landing pad.
Here are the steps:
- Drag the first measure you want to use to the “Columns” shelf and the second to the “Rows” shelf
- Convert both from SUMs to Dimensions by clicking the down arrow on each pill and selecting “Dimension” (now you have a basic scatterplot)
- Change the Marks type from “Automatic” to “Line” and drag “Year” to the Path landing pad
- Also drag Year to Label
Of course I used Parameters to allow the reader to control the two variable types, and I also used a dual axis to format the data points, but the above steps do the trick.
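Outside of Tableau, the same idea can be sketched in a few lines of Python: a connected scatterplot is just a scatterplot whose points are joined in year order, which is what dragging “Year” to Path accomplishes. The rows and field names below are made up for illustration and are not taken from the MLB table, and `connected_path` is a hypothetical helper.

```python
def connected_path(rows, x_field, y_field, order_field="Year"):
    """Order the points by year and pair up neighbors into line segments."""
    ordered = sorted(rows, key=lambda r: r[order_field])
    # Each point carries its year so it can double as the label
    points = [(r[x_field], r[y_field], r[order_field]) for r in ordered]
    # Consecutive points form the segments of the path
    segments = list(zip(points, points[1:]))
    return points, segments


# Made-up values, deliberately listed out of order
rows = [
    {"Year": 2003, "Pitchers": 3.7, "Strikeouts": 6.3},
    {"Year": 2001, "Pitchers": 3.5, "Strikeouts": 6.7},
    {"Year": 2002, "Pitchers": 3.6, "Strikeouts": 6.5},
]

points, segments = connected_path(rows, "Pitchers", "Strikeouts")
for start, end in segments:
    print(f"{start[2]} -> {end[2]}: {start[:2]} to {end[:2]}")
```

The `points` list can be handed to any plotting library that draws lines through points in the order given.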
Here’s a screen shot of the final connected scatterplot sheet that I used:
Thanks for stopping by, and I’d love to know your thoughts on the virtues and/or vices of connected scatterplots,
I had the pleasure of presenting a data visualization workflow to the Boston Predictive Analytics Group with Tanya Cashorali last night (huge thanks to Bocoup for providing the meeting space and John Verostek for organizing the event).
The workflow we presented involves using a library called pitchRx in R to scrape pitch data from the PITCHf/x database, which Tanya covered (she also has a write-up on her website sportsdataviz.com), and then connecting Tableau Public to the data set to see what’s going on. We mined and visualized pitches by Jonathan Papelbon from the 2008 through the 2012 MLB season, or around 6,000 pitches in total.
Here’s the dashboard I put together for the group:
This is a first cut at what an exploratory (as opposed to explanatory) dashboard could look like, and I’m not quite sure what all the stories in the data are yet, but here are some tidbits that popped out to me:
- If you want a good chuckle, select just the pitches that resulted in balls and check out the gift Brandon Boggs was handed by the umpire on the 0-1 fastball of his at-bat during the 9th inning. That’s the nature of the game, I suppose.
- Also, filter to just strikes and ponder what Sean Rodriguez was thinking when he swung at the 0-2 pitch in the dirt during the 16th inning. Maybe I’d swing at just about anything in the 16th inning too, so I should be careful to criticize.
- Next, I was surprised to see that Papelbon actually threw more pitches to left handed batters (54% of all pitches) over the course of the past 5 seasons. Really, more lefties?
- Lastly, sliders are almost exclusively thrown to right handed batters (81% of all sliders were pitched to righties). That’s a good insight for the scouting report, I’d imagine. I’m guessing baseball geeks will be able to find a ton more here.
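Percentages like the last two are simple group shares, and they are easy to double-check outside the dashboard. Here is a minimal Python sketch under stated assumptions: the field names (`pitch_type`, `stand`) and the four sample pitches are hypothetical, not the actual columns or rows of the scraped data set, and `share_by` is a made-up helper.

```python
from collections import Counter


def share_by(pitches, field, where=None):
    """Return each value's share of the pitches, optionally after filtering."""
    subset = [p for p in pitches if where is None or where(p)]
    counts = Counter(p[field] for p in subset)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


# Hypothetical sample: "FF" = fastball, "SL" = slider; stand = batter's side
pitches = [
    {"pitch_type": "FF", "stand": "L"},
    {"pitch_type": "FF", "stand": "L"},
    {"pitch_type": "SL", "stand": "R"},
    {"pitch_type": "FF", "stand": "R"},
]

overall = share_by(pitches, "stand")
sliders = share_by(pitches, "stand", where=lambda p: p["pitch_type"] == "SL")
print(overall)
print(sliders)
```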
The real point here is that there’s room for multiple tools in every data worker’s toolkit. Tanya and I showed how you can combine different tools in a complementary way to get the best results. In this case, R does all the plumbing, and Tableau handles the fixtures and window dressing.
Thanks for stopping by, let me know if you have any feedback about the dashboard, or if you’d like to see the how-to.
…and the only prescription is more data.
Check out these regularly published data and data visualization features and roundups. You’ll feel better in no time.
If you know of other data pills to take, leave a comment!
Today (April 13th) marks the 50th birthday of the 13th World Chess champion, Garry Kasparov of Russia. Garry wrote a book called “How Life Imitates Chess” that I highly recommend – in it he gives a window into his upbringing and his professional life as a chess player, and how he not only became the youngest undisputed World Chess Champion in 1985 at age 22, but also maintained the world #1 ranking for 255 months. It’s a great read because he talks about how his approach to dominating the chess world can transfer into other arenas of life, including his present struggle as a political activist advocating for true democracy and human rights as chairman of the United Civil Front in Russia.
Here are some of my favorite quotes from the book:
“The virtue of innovation only rarely compensates for the vice of inadequacy.”
“We must all walk a fine line between flexibility and consistency. A strategist must have faith in his strategy and the courage to follow it through and still be open-minded enough to realize when a change of course is required.”
“Questioning yourself must become a habit, one strong enough to surmount the obstacles of overconfidence and dejection.”
When I read this book, I was struck by how similar playing chess is to visualizing data. Both are activities that present us with a myriad of options, strategies, and tactics – some more well-advised than others. There is a highly experiential aspect, where the more one participates in the activity, the more one has a sense of what will work well in a given situation – a way of narrowing the option space.
I wrote more about these similarities in a blog post called “How Data Visualization is Like Chess”. It was by far the most enjoyable post to write. One of my most viewed data visualizations is “The Best Chess Openings”, so I know I’m not the only data viz enthusiast who also likes chess.