Those of use who work with data call ourselves many things these days. From the typical and perhaps mundane “data analyst” to the controversial and supposedly sexy “data scientist”, titles abound. Titles aside, though, let’s just call us all “data workers” for a moment – we’re all people who use quantitative information found in spreadsheets and databases to steer our businesses, our communities, and ourselves into the future. Some may use code and sophisticated algorithms to do that, others may use off-the-shelf software and fairly basic analytics techniques. Others use all of the above, and more. We’re living in exciting times of growth and development for many people.
But this post isn’t about titles, or anything like that. You may or may not like me using the term “data worker”, and I’m okay with that. I’m not married to it, and as long as you understand what’s intended by that description, then let’s go with it for now. Because this post is about the biggest challenges that data workers face today. What are these big challenges, and what are we doing about them? I got thinking about this after I saw a tweet from someone named Shane Morris this morning:
I think the biggest problem facing #DataViz today is that so few can think past the capabilities of their tool of choice. Back then their creativity was the only constraint.
— Shane Morris (@ShaneKnowsData) May 21, 2018
Now I don’t believe I’ve met Shane in person before, and I don’t disagree that he has identified a major issue in tool-based limitations to our creativity. But I did get to wondering whether or not this is the “biggest” challenge, and what the other “big” challenges there are, if I were to try to list them. So here’s what came to mind. I’d love to know your thoughts on these, and which other big challenges you have on your list.
What are the biggest challenges?
Challenge #1: Tool-based Limits and Silos
I’ll agree with Shane on the identification of a major challenge, and raise him one sub-challenge: it isn’t just that people are limited by what a particular tool let’s them do, it’s that often times their network is limited to others that use that same tool. So tools don’t just limit what we can create, they often times limit with whom we connect. I’ve been working for Tableau Software for over half a decade now, and Tableau has a very active and passionate community, both online and in person. These enthusiasts share effective practices, support each other in very practical ways, and push each other to improve. All of that is a very good thing, and I’ve benefitted from it in many ways over the years. We should keep doing that.
But there’s no need to make it exclusive. It’s human nature to gravitate to groups where we feel we belong, and in this sense the landscape of data workers is no different from OS users who cluster around Mac, PC or Linux, or sports fans who go to the same bars where people wearing the same jerseys will high five after certain plays but not others. We can’t get rid of that part of who we are, and I’m not saying we need to.
But I do believe that we’d be better off as a whole if software providers, conference planners and meetup group organizers did more to encourage and even facilitate connection to other similar groups. I don’t mean this in some sort of mushy “why don’t we all just get along” kind of way. I mean that much could be learned from conversations between people who solve similar problems with different tools. That has been my experience as co-chair of the Tapestry Data Storytelling Conference (of which more, soon). I’ve been able to meet very talented people who have spent time learning tools and methods that have different strengths and weaknesses as the ones I’ve learned.
There are other such connections happening more and more. For example, I admire what Wes McKinney and Hadley Wickham are doing joining the R and Python worlds with the new venture Ursa Labs, and feel that more initiatives and groups could be formed along these lines.
After all, what science fiction author envisioned a future where we are divided based on the languages we speak to computers?
Challenge #2: Lack of Widespread Data Literacy
If Challenge #1 deals with people in the data worker space, Challenge #2 deals with those who aren’t yet in that space. I think there are a whole lot of them. Those of us over, say, 30 grew up in a world that didn’t really have distinct “data” programs in colleges, and high schools taught us much more calculus than statistics or analytics. Add that to the fact that numerical competencies can be challenging to develop for many, and even tricky for experts to consistently get right, and you have a situation where the majority of people just don’t speak the language of data very well yet.
The “data illiterati” can be divided into those who aren’t aware of their illiteracy, and those who are aware of it. Those who are aware of it can be further divided into those who want to change, and those that don’t. Those that want to change either feel that they can or they feel that they can’t. Those that don’t want to change feel that it’s just not necessary. Based on my anecdotal experience, This last group is shrinking.
The good news is that the group that wants to learn data and feels that they can do it have a ton of alternatives available to them these days. Universities are coming out with data programs left and right, online sites like Udemy and Coursera let you learn via DIY, and tool companies have tutorials that are incredible as compared to only a few years ago.
But the thing that concerns me is the group that feels like they need to learn data, but they feel like they can’t for some reason. I think this is a huge group of people. Either they feel blocked by some perceived innate deficiency (“I’m just not good at math”), or they don’t know where to turn. I’m not sure the alternatives in the previous paragraph do the trick for them. So something more is needed to address this challenge. We’ll have to see what comes next.
Challenge #3: Poor Adherence to Standards of Data Ethics
You can’t read the news today without coming across something about data privacy rights, companies misuse of personal data, and major breaches of data security on a daily basis. From Strava’s “god view” that revealed sensitive locations to Facebook’s Cambridge Analytica scandal to Europe’s recently enacted GDPR legislation, we are in a situation where every organization has a data privacy policy, but few of us feel that our personal data is actually safe.
This is a major issue in today’s data space, because it gets to the “why” behind what we’re doing. What good is it to have high competency and skill working with data if what we’re doing with those capabilities isn’t even ethical? Can we agree on what is and what isn’t ethical in this space? What’s stopping companies from doing what they want with our data in order to achieve their goals, unchecked?
A code of ethics for data workers is needed. Something similar to the Hypocratic Oath for doctors, or Asimov’s Three Laws of Robotics. Something that actually limits what we undertake, and how we go about our profession.
We don’t just need a bunch of fancy words that everyone agrees to but that in effect are meaningless. It’s easy for ethics to become that. We need something substantial that stands in the way of companies and governments using data in inappropriate ways. We need a “Three Laws of Data Ethics”, or something similar to that (Evernote took a stab back in 2011).
One other group I was a part of recently took a pass at this. In December 2017, the first Open Data Leadership Summit organized by the team at data.world brought together individuals from a variety of specialties and backgrounds, and we spent half of the event talking about this exact issue. What came out of that discussion was the Manifesto for Data Practices, and I recommend you take a look at it. Over 1,400 people have signed their name to this document to date – that’s a small but not insignificant number of people who feel that the values and principles laid out by this group would be helpful to adopt. Here are the principles:
Will this manifesto do the trick? Not all by itself. Acknowledging something as important isn’t the same as adhering to it in practice. So more is needed still. That’s why this remains a big challenge.
Challenge #4: Preservation and Conservation of Records and Insights
I’ll call this one the silent challenge, because I don’t think many are thinking about it. It’s highly important, but isn’t quite perceived as urgent by most. Important but not urgent challenges are the ones that can be the most difficult to solve, because there are so many other challenges that are screaming in our face to fix right now.
So what’s this challenge all about and why is it so big? Within our societies and our businesses, we’re amassing incredible amounts of data, and we’re leveraging that data for very powerful insights and publishing it in various ways and on various platforms. But will all of that survive for subsequent generations?
Everything we build eventually crumbles or gets replaced by something different. Will our data be around for others to see what life was like for us, or will our great-great grandchildren look back on the early 21st century with a whole lot of questions because the files got corrupted, the servers stopped working, the software was no longer supported, the media storage devices couldn’t be plugged in to anything, and there were no backups, no hard copies, no instructions left when the company got acquired or the standards changed or the lights went out.
Does that sound like a crazy thing to worry about, to you? If so, I really hope you’re right. Just don’t think about other technologies that have gone the way of the dinosaur, and how much information got lost when the transition left them behind. We can still buy floppy disk drives on Ebay, Atari 2600 consoles on OfferUp, and a whole host of dongles to convert from this type of connector to that, and there’s always the Wayback Machine to remind us about all those ugly GeoCities sites we made.
But even in the near term, data preservation can sometimes be a struggle. My father passed away a few years ago, and I was delighted to find some old cell phones in a recent move that I was sure contained voicemails from him. I had to dig deep to find some old-style iPhone connectors, and when I did I listened to each recording a few times. That was really nice for me.
But zoom ahead another 30 years, or 40, or 100. What are the odds I’ll be able to listen to those recordings? It’s the long term horizon that’s the concern. It’s more than a little ironic to think that our generation – the selfie generation, the one that posts pictures of every meal for the whole world to see, but prints out none of it – could be the one that future generations know the least about. If they know about us, it’s because people out there – the librarians and archivists – will have come up with redundant solutions to preserve and conserve.
So that’s what I have! 4 “Big Challenges” that I believe we face as data workers. Which of those four is the biggest? I don’t really know, and I don’t have a good way to rank them. Maybe you’re aware of a 5th or 6th that’s even bigger. What do you think?
Thanks for reading,
Ben
Definitely agree with many of your points above and have written my own views on them as well.
Code of Ethics is an interesting one, as it requires a shift in other practices as well. Currently, many jobs treat data workers as technicians rather than as practice professionals. When we look at other professions (that also have a Code of Ethics) we recognize a few things:
1. The professional has a body of knowledge through learning and experience that is not easily replicated elsewhere and has a profound influence on someone else’s life, liberty, or well-being. Doctors directly influence health. Lawyers can handle the gamut of life, liberty, and well-being. We as data workers can also have the same effects, often without realizing it. One scary part, we are not always given all the information or included in the discussions around use. In some respects, we may be operating blind-folded.
2. Practice professions often have an identified name and create bodies to regulate and support them. Taking an oath without backing of a greater organization means you risk facing an ethical dilemma to which there is no guidance, no support, and no decision-making tools.
3. Training programs take into account the role a practice professional plays and provides decision making tools. I’m not sure how many data workers get inoculated to this, as it usually falls more in the liberal arts (social sciences) area.
We’re definitely evolving into a practice profession, as seen by discussions, education paths, etc. We need a name and a representing body to have any weight and to make the types of changes I think we want.
Item 1 – Tool based limits and silos
Ben jones appears to be purposely twisting words here and change it to availability of community when the challenge is exactly how the phrase reads – Tool based limits. Tableau is hard to learn and master. Instead of calling this out, he is twisting that to say community and points out that Tableau has a large vibrant community. Who cares! If the tool is easy to use, I dont necessarily need a vibrant community. I only use the tool to process data – I am not a data groupie! Get it?
Yep I get it. Not the first time I’ve been called a company shill. I must admit though, it did catch me off guard this time, since it came in a post about how tools are limiting us and how we’re finding ourselves in tool-based community silos with walls we should break down. But best of luck to you. It sounds like you’re just trying to get a job done, and finding it a frustrating experience. I’ve been there. Let me know if you ever need help with anything. I promise I won’t make you wear a data rockstar t-shirt.
Challenge 4 is the most interesting of the group, especially insights. Learning from our mistakes/successes are rarely considered even though the data is available, albeit frequently hidden, to answer this.
An added connected challenge is quantifying whether the data/vis is useful. We assume that it is, especially if there is a high gee whiz factor. But I don’t see examples of a work flow that helps me frame what I am look for and whether I found it. And more importantly, how does it make my data world richer and make me a better data explorer? Ironic that we are deep in data yet the data we need, metrics on usefulness, is missing.