There are lots of data analysis and visualization tools out there, but the most commonly used (and misused) has to be Excel.  It’s certainly not the best, but in terms of sheer number of users, I have a hard time imagining any other tool with more. I’ve found that many data discovery efforts include Excel at some point or another in the process.

If you are interested in getting better at Excel, I highly recommend you follow Jon Peltier, Jorge Camoes and Chandoo. These three guys are the Excel pros, and much of what I have learned over the past few years has come from them. I haven’t featured any Excel dashboards on this blog thus far, but thanks to a recent contest at Chandoo.org, I found myself cranking out the old pivot tables and vlookup formulas to see what I could do with a provided data set of 1,883 responses to a recent salary survey.

So how did it go?

Well I have to admit, after a solid year of using mostly Tableau to explore data sets, staying in Excel felt a little like speaking a foreign language. Many times I found myself struggling to do something that would have been second nature in Tableau. I think a huge part of that is the difference in the tools themselves (Tableau is designed specifically for data discovery via visualization), but certainly part of it had to be the user as well – I just haven’t been spending that much time in Excel doing this type of activity lately.

And what was the end result?

Here is my final submission. You can download the Excel file here (692KB):

 

This survey exploration worksheet shows:

  • Summary statistics across the top
  • Top 5 and bottom 5 country lists, and
  • A two-country comparison tool in the middle
  • Column charts showing mean salary along with +/- 1 standard deviation error bars at the bottom with
  • Pivot table “slicers” which function as global filters for everything in the worksheet

The last bullet point – “slicers” is the most powerful part of the worksheet, and only available (I believe) in Excel 2010. Slicers are cool because they allow the user of the worksheet to drill down to any combination of response attributes. They’re basically like “Quick Filters” in Tableau. An added bell (or whistle if you prefer) is that the country tables include a flag to the left of the country name that changes to match the corresponding country when the slicer filters are applied. “Ooo, fancy!” Hey, thanks!

To try it out for yourself, just download the worksheet, open it with Excel 2010, and click on any of the colored pills in the section to the lower right under the “Global Filters” header (you can hold down Ctrl to select multiple attributes and press Alt + C or click the little icon in the upper right of the pill grouping to clear the filter).

What did I learn?

You have to be careful because many of the countries only have a single survey respondent, making average “mean”ingless (sorry). This often happens with survey response data – people make all kinds of conclusions based on very small sample sizes. While I went ahead and ranked the countries based on mean anyhow, I included the all-important “n” column in the table to flag the user. I stopped short of doing any tests for statistical significance, but this could have been done as well. I also added the “region” field, which wasn’t included in the original data set at all. I was able to obtain these region categories from the UN stats website. It sounds easy enough, but believe me, it was a pain (‘Vietnam’ vs. ‘Viet Nam’ is one example of many mismatches that had to be manually adjusted).

How to build it:

  • Summary statistics – these cells are pointed to the grand total row of a simple pivot table. Nothing fancy here. I added it because it really highlights the “big picture” for the user, before they get into the details below.
  • Top & Bottom 5 lists – same thing for the tables themselves, the cells are pointed to two different pivot tables, the first of which is sorted based on descending average salary (so the best are at the top), and the second of which is sorted based on ascending average salary (so the worst are at the top).
  • Two-country comparison – this section makes use of two data validation cells that allow the user to pick two countries from a drop-down list in a separate worksheet tab. The list is generated from the country name column of yet another pivot table.
  • Updating flags – This was the trickiest part to do. Its hard to explain, so I will refer you to this post or this video. Basically, you create a second table in a different worksheet with the country names and the respective flag images fitted to the adjacent cells, name the cells (Ctrl+Shift+F3), create special formulas (Ctrl+F3) using the =INDIRECT() function, and map each image in the worksheet to the right formula.
  • The four column charts in the lower section each have their own pivot table (there must be a more efficient way to do this – suggestions are welcome!) and the error bars are added using the ‘Layout > Error Bars > More Error Bars Options’ path and selecting the custom error bars option and pointing it to a pivot table with standard deviations for each of the four attributes (region, experience, job and usage). More on error bars at Jon’s site here.
  • Slicers: a great tutorial that Chandoo created to teach about slicers can be found here. This part really wasn’t that difficult, you just have to know to include slicers using the ‘Analyze > Include Slicer’ menu path, and to connect each slicer to all of the pivot tables in the document so that they act as global filters.

Now it’s your turn

What’s your take? What would you do differently with this specific project? How much do you use Excel?

Thanks for stopping by, as always!

Ben