Those of use who work with data call ourselves many things these days. From the typical and perhaps mundane “data analyst” to the controversial and supposedly sexy “data scientist”, titles abound. Titles aside, though, let’s just call us all “data workers” for a moment – we’re all people who use quantitative information found in spreadsheets and databases to steer our businesses, our communities, and ourselves into the future. Some may use code and sophisticated algorithms to do that, others may use off-the-shelf software and fairly basic analytics techniques. Others use all of the above, and more. We’re living in exciting times of growth and development for many people.
But this post isn’t about titles, or anything like that. You may or may not like me using the term “data worker”, and I’m okay with that. I’m not married to it, and as long as you understand what’s intended by that description, then let’s go with it for now. Because this post is about the biggest challenges that data workers face today. What are these big challenges, and what are we doing about them? I got thinking about this after I saw a tweet from someone named Shane Morris this morning:
I think the biggest problem facing #DataViz today is that so few can think past the capabilities of their tool of choice. Back then their creativity was the only constraint.
— Shane Morris (@ShaneKnowsData) May 21, 2018
Now I don’t believe I’ve met Shane in person before, and I don’t disagree that he has identified a major issue in tool-based limitations to our creativity. But I did get to wondering whether or not this is the “biggest” challenge, and what the other “big” challenges there are, if I were to try to list them. So here’s what came to mind. I’d love to know your thoughts on these, and which other big challenges you have on your list.
What are the biggest challenges?
Challenge #1: Tool-based Limits and Silos
I’ll agree with Shane on the identification of a major challenge, and raise him one sub-challenge: it isn’t just that people are limited by what a particular tool let’s them do, it’s that often times their network is limited to others that use that same tool. So tools don’t just limit what we can create, they often times limit with whom we connect. I’ve been working for Tableau Software for over half a decade now, and Tableau has a very active and passionate community, both online and in person. These enthusiasts share effective practices, support each other in very practical ways, and push each other to improve. All of that is a very good thing, and I’ve benefitted from it in many ways over the years. We should keep doing that.
But there’s no need to make it exclusive. It’s human nature to gravitate to groups where we feel we belong, and in this sense the landscape of data workers is no different from OS users who cluster around Mac, PC or Linux, or sports fans who go to the same bars where people wearing the same jerseys will high five after certain plays but not others. We can’t get rid of that part of who we are, and I’m not saying we need to.
But I do believe that we’d be better off as a whole if software providers, conference planners and meetup group organizers did more to encourage and even facilitate connection to other similar groups. I don’t mean this in some sort of mushy “why don’t we all just get along” kind of way. I mean that much could be learned from conversations between people who solve similar problems with different tools. That has been my experience as co-chair of the Tapestry Data Storytelling Conference (of which more, soon). I’ve been able to meet very talented people who have spent time learning tools and methods that have different strengths and weaknesses as the ones I’ve learned.
There are other such connections happening more and more. For example, I admire what Wes McKinney and Hadley Wickham are doing joining the R and Python worlds with the new venture Ursa Labs, and feel that more initiatives and groups could be formed along these lines.
After all, what science fiction author envisioned a future where we are divided based on the languages we speak to computers?
Challenge #2: Lack of Widespread Data Literacy
If Challenge #1 deals with people in the data worker space, Challenge #2 deals with those who aren’t yet in that space. I think there are a whole lot of them. Those of us over, say, 30 grew up in a world that didn’t really have distinct “data” programs in colleges, and high schools taught us much more calculus than statistics or analytics. Add that to the fact that numerical competencies can be challenging to develop for many, and even tricky for experts to consistently get right, and you have a situation where the majority of people just don’t speak the language of data very well yet.
The “data illiterati” can be divided into those who aren’t aware of their illiteracy, and those who are aware of it. Those who are aware of it can be further divided into those who want to change, and those that don’t. Those that want to change either feel that they can or they feel that they can’t. Those that don’t want to change feel that it’s just not necessary. Based on my anecdotal experience, This last group is shrinking.
The good news is that the group that wants to learn data and feels that they can do it have a ton of alternatives available to them these days. Universities are coming out with data programs left and right, online sites like Udemy and Coursera let you learn via DIY, and tool companies have tutorials that are incredible as compared to only a few years ago.
But the thing that concerns me is the group that feels like they need to learn data, but they feel like they can’t for some reason. I think this is a huge group of people. Either they feel blocked by some perceived innate deficiency (“I’m just not good at math”), or they don’t know where to turn. I’m not sure the alternatives in the previous paragraph do the trick for them. So something more is needed to address this challenge. We’ll have to see what comes next.
Challenge #3: Poor Adherence to Standards of Data Ethics
This is a major issue in today’s data space, because it gets to the “why” behind what we’re doing. What good is it to have high competency and skill working with data if what we’re doing with those capabilities isn’t even ethical? Can we agree on what is and what isn’t ethical in this space? What’s stopping companies from doing what they want with our data in order to achieve their goals, unchecked?
A code of ethics for data workers is needed. Something similar to the Hypocratic Oath for doctors, or Asimov’s Three Laws of Robotics. Something that actually limits what we undertake, and how we go about our profession.
We don’t just need a bunch of fancy words that everyone agrees to but that in effect are meaningless. It’s easy for ethics to become that. We need something substantial that stands in the way of companies and governments using data in inappropriate ways. We need a “Three Laws of Data Ethics”, or something similar to that (Evernote took a stab back in 2011).
One other group I was a part of recently took a pass at this. In December 2017, the first Open Data Leadership Summit organized by the team at data.world brought together individuals from a variety of specialties and backgrounds, and we spent half of the event talking about this exact issue. What came out of that discussion was the Manifesto for Data Practices, and I recommend you take a look at it. Over 1,400 people have signed their name to this document to date – that’s a small but not insignificant number of people who feel that the values and principles laid out by this group would be helpful to adopt. Here are the principles:
Will this manifesto do the trick? Not all by itself. Acknowledging something as important isn’t the same as adhering to it in practice. So more is needed still. That’s why this remains a big challenge.
Challenge #4: Preservation and Conservation of Records and Insights
I’ll call this one the silent challenge, because I don’t think many are thinking about it. It’s highly important, but isn’t quite perceived as urgent by most. Important but not urgent challenges are the ones that can be the most difficult to solve, because there are so many other challenges that are screaming in our face to fix right now.
So what’s this challenge all about and why is it so big? Within our societies and our businesses, we’re amassing incredible amounts of data, and we’re leveraging that data for very powerful insights and publishing it in various ways and on various platforms. But will all of that survive for subsequent generations?
Everything we build eventually crumbles or gets replaced by something different. Will our data be around for others to see what life was like for us, or will our great-great grandchildren look back on the early 21st century with a whole lot of questions because the files got corrupted, the servers stopped working, the software was no longer supported, the media storage devices couldn’t be plugged in to anything, and there were no backups, no hard copies, no instructions left when the company got acquired or the standards changed or the lights went out.
Does that sound like a crazy thing to worry about, to you? If so, I really hope you’re right. Just don’t think about other technologies that have gone the way of the dinosaur, and how much information got lost when the transition left them behind. We can still buy floppy disk drives on Ebay, Atari 2600 consoles on OfferUp, and a whole host of dongles to convert from this type of connector to that, and there’s always the Wayback Machine to remind us about all those ugly GeoCities sites we made.
But even in the near term, data preservation can sometimes be a struggle. My father passed away a few years ago, and I was delighted to find some old cell phones in a recent move that I was sure contained voicemails from him. I had to dig deep to find some old-style iPhone connectors, and when I did I listened to each recording a few times. That was really nice for me.
But zoom ahead another 30 years, or 40, or 100. What are the odds I’ll be able to listen to those recordings? It’s the long term horizon that’s the concern. It’s more than a little ironic to think that our generation – the selfie generation, the one that posts pictures of every meal for the whole world to see, but prints out none of it – could be the one that future generations know the least about. If they know about us, it’s because people out there – the librarians and archivists – will have come up with redundant solutions to preserve and conserve.
So that’s what I have! 4 “Big Challenges” that I believe we face as data workers. Which of those four is the biggest? I don’t really know, and I don’t have a good way to rank them. Maybe you’re aware of a 5th or 6th that’s even bigger. What do you think?
Thanks for reading,