(Featured image: iStock/stefanamer)
Do you need a dedicated “data curator”, or is that a task that should be shared by existing roles within the business and IT?
This question was prompted by an article I read at CIO entitled “6 data analytics trends that will dominate 2018”, written by Thor Olavsrud and published back in March of this year. I first stumbled on this article last week, and the third trend caught my eye: “Rise of the data curator?” It was worded as a question, and I didn’t have a firm answer, so I tossed it out there to the Twitterverse:
1st we have the "data translator" & now the "data curator" as go-between roles. What say you – a critical piece to a BI team, or needless specialization when current roles should just do these tasks well? Pretty sure I know what @adamemccann will say. https://t.co/RmhFDJETez pic.twitter.com/lo76EfdKwm
— Ben Jones (@DataRemixed) November 20, 2018
It was an honest question, but you can see by my wording that I was a little skeptical at first. Back in February, a separate conversation on Twitter involving Adam McCann, Ken Flerlage, Josh Jackson and me centered on whether there’s a need for a dedicated “analytics translator”. That conversation was triggered by an article published in the Harvard Business Review that same month. Sentiments were mixed – should a company ask or train its analysts and data scientists to learn the needs of the business, or should it hire specialists who act as a go-between? The discussion about data curation poses a similar question, only farther upstream in the process that converts data into a decision.
What is a Data Curator?
Before we go any further, let’s stop to define what data curation is. Wikipedia provides the following:
Data curation is the organization and integration of data collected from various sources. It involves annotation, publication and presentation of the data such that the value of the data is maintained over time, and the data remains available for reuse and preservation.
In the CIO article, the data curator is described as someone who…
…sits between data consumers (analysts and data scientists who use tools like Tableau and Python to answer important questions with data) and data engineers (the people who move and transform data between systems using scripting languages, Spark, Hive, and MapReduce). To be successful, data curators must understand the meaning of the data as well as the technologies that are applied to the data.
Sort of like this – my attempt at a (simplistic) diagram:
Drawing from my own experience
I haven’t had the chance to lead a business analytics team since my days at Medtronic half a decade ago, but while I was there, a critical component of my team’s success was having very close relationships with the IT specialists who created the data sets themselves. Often an analytics project would start with a question to IT about whether a particular type of data was available or not, or how close we could get with existing data sources to approximate an answer to a relevant business question.
Often, there’s a need to “push” information to analysts about new or interesting data sets instead of waiting for such a “pull” to come based on a business need. Here at Tableau, our Marketing department holds periodic training sessions to help people within the department understand what high-value data sources are available, how to access them, what exactly they contain, and examples of insights gleaned from them. These are highly attended and appreciated.
It occurred to me that an analogy exists in the world of open data, in which I have been immersed over the past 5 years as marketing head of the Tableau Public platform. Curation is critical in that space, too. We love regularly updated repositories of data like Data is Plural by BuzzFeed data editor Jeremy Singer-Vine and Awesome Public DataSets on GitHub because they clue us in to fascinating and relevant (and sometimes quirky) data sets on a weekly basis. While I was overseeing the Tableau Public website, the Sample Data page was one of the highest generators of organic traffic to the site. If you think about it, part of the reason the Makeover Monday project is so wildly popular is that Andy Kriebel and Eva Murray work very hard each week to provide a steady stream of data sets for participants to use to practice and hone their skills. That’s a huge shortcut for thousands of people. Imagine the human-hours saved because participants don’t have to hunt for practice data sets themselves.
So I believe there’s no doubt that data curation – finding, surfacing, annotating, even sometimes cleaning and blending fascinating data sets and serving them up for broad consumption – is a critical task for private as well as public data discovery. But the question remains – does it warrant a dedicated role?
Actual data curators weigh in
It was really interesting to hear responses from actual data curators on the thread I started with that original question. Kelly Gilbert, Wendy Brotherton, and Hayley (who owns the OG @datacurator handle) all replied that their current role basically meets the job description of a data curator.
Kelly had this to say about it:
This is really my title at this stage of my career. I spend most of my time wrangling and combining data and QA’ing with business owners to create “authoritative” datasources for the enterprise. #dataninja
— Wendy M. Brotherton (@100datascience) November 21, 2018
She also related that she actually works with an entire team of data curators. The reason for such a team? According to Kelly, “We want LOB analysts to be able to focus on generating insights rather than assembling, maintaining, and finding data.”
Adam McCann also weighed in on “Team Dedicated Role”:
I actually like the data curator concept way more than a data translator. I was just discussing this with some engineers. Do u centralize data engineers or embed them w/ analytics. I think centralization is better BUT w/ a curator as intermediary who treats analytics as customer
— Adam E McCann (@adamemccann) November 20, 2018
The other side of the debate – “Team Just Part of the Job” – was represented by Jim Van Sistine and Jason Forrest, among others:
I can see it as a niche role in the short term as companies work to raise the overall data literacy across their org. It's hard to build competency in everything at once. But I put it more in the "it's just part of the job" category.
— Jim Van Sistine (@jimvansistine) November 20, 2018
Isn’t that just a “BI” or Analyst role? Basically someone needs to understand what the data means, but I wouldn’t see that as new or even curatorial
— JasonForrest (@Jasonforrestftw) November 20, 2018
If you follow the entire thread, there are a number of interesting opinions – some speak about how this role was transformative for their business, others ask where to find one so they can make a hire, and others raise concerns about how this role could actually pose problems.
Daniel Zvinca raised a particularly interesting challenge to the role by expressing his preference to be as close to the raw data as possible when conducting analysis:
Just trying to understand the role, not sure if is needed or not. I can tell you this. As a direct responsible for the analysis output, I would very much prefer to have raw data than, maybe, adjusted data.
— Daniel Zvinca (@danz_68) November 21, 2018
Polling a handful of folks on social media
Seeking to wrap up the conversation, I ran an informal and not-at-all-scientific social media poll to find out what people think about this role from a more quantitative perspective. Here are the results – it seems like those in my social network who took time to respond mostly don’t have this role on their team, but they think it would be a helpful thing:
Do you have a "data curator" where you work? Someone who sits between data engineers and analysts, whose full-time job is to source, organize & accelerate high value datasets?
— Ben Jones (@DataRemixed) November 21, 2018
Like many things, whether or not a company decides to hire dedicated data curators might depend on a number of factors – how large the teams are, how complicated the data sets are, how critical a role data plays in the business, how well-versed existing team members are and how mature analytics processes are. And it might not be a yes or no answer – perhaps there’s a need for a dedicated curator or curators at certain stages of the maturity model, but not at others. It’ll be interesting to see what the future holds for this role, and whether or not we end up seeing “The Rise of the Data Curator”.
What’s fairly certain to me after the conversation is that the tasks associated with this role need to be done well by someone. Thanks to all who chimed in and gave me a more complete perspective.