Why automation won’t replace data scientists yet

A recent report from Gartner predicts that 40% of data science tasks will become automated by 2020. Since the data science skills gap has been a concerning topic over the past few years, this news comes as a relief to some.

It was Gartner who said, back in 2012, that by 2020 there would be a shortage of 100,000 data scientists. The new report has prompted some to wonder what exactly the future of jobs in data science is and what the field will look like in the coming years. To see where data science is heading, it’s important to understand first which parts of data science can and will be automated, and which tasks won’t be automated in the coming years.

Data science tasks that will be automated

According to the report, certain tasks are set to be automated in the coming years: “With data science continuing to emerge as a powerful differentiator across industries, almost every data and analytics software platform vendor is now focused on making simplification a top goal through the automation of various tasks, such as data integration and model building.”

Simplification is key for data scientists. Automating mundane and repetitive tasks frees up employee time to work on more complex algorithms.

Data integration, for example, combines data from multiple sources and provides a unified view of the data as a whole. This process can and should be automated so that trusted data from multiple sources is pulled together quickly and a skilled data scientist can analyse the results.
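As a rough illustration of what automated data integration means in practice, the sketch below merges records from two sources into one unified record per customer. The source names, field names, and values are all hypothetical, and real integration tools handle far more (schema mapping, conflict resolution, data quality checks) than this minimal example does.

```python
# Hypothetical records from two separate systems; every name and value here
# is made up for illustration.
crm_rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
billing_rows = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 3, "total_spend": 75.5},
]


def integrate(key, *sources):
    """Merge rows from several sources into one record per key value."""
    unified = {}
    for rows in sources:
        for row in rows:
            # Create the record on first sight, then fold in the new fields.
            unified.setdefault(row[key], {}).update(row)
    # Return records in key order so the result is deterministic.
    return [unified[k] for k in sorted(unified)]


unified_view = integrate("customer_id", crm_rows, billing_rows)
for record in unified_view:
    print(record)
```

Customers that appear in only one source still get a record, just with fewer fields; deciding what to do about those gaps is left to the analyst.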

Model building involves collecting data, analysing and searching for patterns, and using data to make predictions. There are already tools that can automate model building – machines can collect data and point out patterns. Furthermore, these tools are becoming smarter, in that they are learning what type of patterns to detect.
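One simple form of automated model building is to fit several candidate models and keep whichever predicts held-out data best. The sketch below assumes two toy candidates (a constant mean and a straight line) and a small invented data set; it is not how any particular vendor’s tool works, only an illustration of the idea.

```python
from statistics import fmean


def fit_mean(xs, ys):
    """Candidate 1: always predict the average of the training targets."""
    average = fmean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def auto_select(candidates, train, valid):
    """Fit every candidate on train; return the one with lowest validation error."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    best = min(fitted, key=lambda name: mse(fitted[name], *valid))
    return best, fitted[best]


# Invented data that roughly follows y = 2x + 1.
train = ([0, 1, 2, 3, 4], [1.0, 3.1, 4.9, 7.2, 9.0])
valid = ([5, 6], [11.1, 12.8])

best_name, best_model = auto_select({"mean": fit_mean, "line": fit_line}, train, valid)
print(best_name)
```

The selection loop is mechanical, which is why it automates well; choosing which candidates and which data to feed it is still a human decision.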

Machine learning and automation are already impacting data integration and model building, helping data scientists complete jobs faster and more effectively. A machine is not prone to the slips humans make on repetitive work, so for tasks such as these, automation is vital.

Data science tasks that can’t be automated yet

Artificial intelligence (AI) can only go so far. Right now the technology is not quite there to automate the majority of data science tasks.

Data wrangling, also known as data munging, is the process of manually converting “raw” data into another form that is more easily consumed. It takes good judgement from a human – a capacity AI tools don’t quite have yet.
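To make the role of judgement concrete, here is a small wrangling sketch that turns messy text fields into typed values. The sample rows and the cleaning rules are assumptions chosen for illustration; each rule (which strings count as missing, whether 01/04/2017 means 1 April or 4 January) is exactly the kind of decision a person has to make about a specific data set.

```python
from datetime import datetime

# Hypothetical raw export; the values are invented.
RAW = [
    {"date": "2017-03-01", "revenue": "1,200"},
    {"date": "01/04/2017", "revenue": "N/A"},
    {"date": "2017-05-01", "revenue": " 950 "},
]

# Judgement call: which placeholder strings should count as "no value".
MISSING = {"", "n/a", "na", "null", "-"}


def parse_date(text):
    # Judgement call: slash-separated dates are read as day/month/year.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None  # unrecognised format


def parse_number(text):
    cleaned = text.strip().replace(",", "")
    if cleaned.lower() in MISSING:
        return None
    return float(cleaned)


def wrangle(rows):
    """Convert raw string fields into dates and numbers."""
    return [
        {"date": parse_date(row["date"]), "revenue": parse_number(row["revenue"])}
        for row in rows
    ]


tidy = wrangle(RAW)
for row in tidy:
    print(row)
```

Once the rules are written down the conversion itself runs automatically; the part that resists automation is deciding on the rules.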

Data interpretation and visualisation will not become fully automated, in the sense that people will always be needed to walk executives through the data so that they understand it. Only then can leaders make data-driven decisions for the good of the company.

Aspects of data visualisation may become automated in the future. Since data is being produced at an ever faster rate, the human workforce simply cannot keep up with the demand. Low-level pieces of data visualisation can be automated, but there will always be a human intelligence factor needed to interpret and explain the data itself. Humans are also still needed to write the various AI agents that may soon take over the mundane data science tasks.

The future of data science in the age of automation

So will data science tasks become automated? Certain aspects of low-level data science can and should be automated. Collecting and combining data take valuable time away from trained experts, but many tools already help to automate all or parts of these tasks.

However, AI tools do not yet have human curiosity or the desire to create and validate experiments. That part of data science will most likely not be automated in our lifetime, simply because the technology has a long way to go.

Data scientists who are advancing in their field should not fear unemployment in the near future. Data scientists are typically programmers, mathematicians, and thought leaders all wrapped up in one, so no matter where the industry goes, it is unlikely someone with those qualifications will be on the job hunt for too long. More importantly, humans will still be needed to understand and collaborate with other humans for data science projects to be successful. This collaboration is key to transforming data into actionable insight for decision-making.

Data science will scale thanks to automation tools, and data scientists will be able to work more efficiently and effectively. But human intelligence is still very much needed in this field, so though automation will and should help, it cannot completely take over.