From a naturalistic point of view, and in much the same way anthropologists would observe a tribe in the Amazon rainforest, let’s look at what people with the job title ‘data scientist’ actually do. Working with a team of data scientists, I have some first-hand experience of what it’s like to be a member of this particular tribe.
We use a combination of methods from a variety of different fields – including but not limited to statistics, machine learning, applied mathematics and databases. Generally, we are a very pragmatic bunch.
Data scientists realise there are many ways to solve a business problem. There are also many ways to give insight into a question that has been asked. As the statistician George E P Box once wrote: “All models are wrong but some are useful.”
Data science is the pursuit of translating a question into something that could be answered using data, then applying a variety of techniques to see what happens. We are at the frontier, exploring what value we can generate from data for businesses and people.
For those of you in the know, this definition of data science may not sound terribly different from what a statistician, mathematician or computer scientist does. It’s tempting, then, to think that data science is not a field in its own right. But let’s consider this influential definition by Josh Wills, a data scientist at Cloudera: ‘A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician’.
Josh’s definition makes it clear that your average statistician is not a data scientist, and neither is your average software engineer. I’ve hired many data scientists and I’ve gone through hundreds of CVs from people aspiring to become data scientists. Knowing statistics and knowing how to code are both essential to being a data scientist, but even that is not the whole picture. A data scientist also needs to be a good storyteller.
Storytelling with data often means creating beautiful visualisations that capture the main messages the data conveys. Sometimes it’s not about the visualisation at all but about finding the right metaphor that will make an idea understandable to a wider audience of stakeholders. As a result, data scientists need an unusual combination of skills to succeed.
It makes sense to speak of a field named data science. It may not refer to an entirely novel set of activities but it does allude to a relatively consistent set of activities that it is useful to name. Fascinatingly, from a sociological point of view, it is because we give it a name that the field exists.
Note: this post originally appeared in the print edition of Enterprise 360.