What is data science consulting?

Capgemini

2022-04-25

Introduction to the series: “Data Science Consulting”.

Dominik Deja – Data Science Manager, Capgemini Invent

Every now and then someone asks me what my profession is. The typical reply “I work as a data science consultant” is usually too vague for people who have never heard that “Data scientist is the sexiest job of the 21st century”. Therefore, the goal of this short post series (of which this is the first post) is to show what data science consulting is and how it works. The ambition is to make it interesting both for people who wish to better understand what data science is about and what data science consultants do, and for people who are considering the data science consulting path and would like to prepare better for the upcoming challenges.

In this series, data science consulting will be portrayed according to how each author sees and understands it (I invited my colleagues to share their views as well), and the authors will never dare to suggest that their point of view is the only valid one. I firmly believe that for this topic, there is no point in enforcing one right point of view. Quite the contrary, I see a lot of value in sharing one’s experience so that others can build on it and grow faster themselves. I hope you will find it helpful.

Let’s start with the first part – data science – and leave consulting aside for the moment. My definition is:

Data science is an interdisciplinary field encompassing inventing, developing, and applying scientific methods, processes, algorithms, and systems focused on using data to allow for automated, data-driven insights generation and decision making.

Having this definition, let’s put it to the test:

Alice is working on 3GB spreadsheet tables, running pivots and index-matches, wrangling, munging, mapping, and “massaging” the data – is she doing data science? Since it is very manual and usually poorly reproducible or even non-reproducible work – most probably not.
What about Bob, who created an Excel-based solver for optimizing the production of spoons in the factory he works for? Well, as he produced a tool enabling automated, data-driven decisions – yes, I would say that he is doing data science. It is still simple and rough, as it is hard to properly maintain spreadsheet-based tools, but it is nevertheless data science.
Cecilia is excited about providing beautiful-looking dashboards to her senior director. Is her work data science? Well, it depends on whom you ask. As for me – does it allow for automated, data-driven decisions and insights? To a certain extent yes, so I have no problem with calling it data science, even though some more orthodox data scientists might raise their brows and snort a little.
Danish fitting a linear regression model on 15 examples? Data science for sure – even though the dataset is small, he implemented an algorithm allowing for data-driven decisions.
Estera is inventing new algorithms for speech recognition – definitely data science.
Frank used statistical techniques for outlier analysis to detect potential fraudsters while running a one-time financial audit? Data science (even though Frank might not call himself a data scientist!).
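To make the Danish example a bit more tangible, here is a minimal sketch of what fitting a linear regression on 15 examples could look like. Everything in it is an assumption made for illustration: the data (weekly advertising spend and units sold) is invented, and the helper names `fit_line` and `predict` are hypothetical rather than taken from any library. It uses plain ordinary least squares and only the Python standard library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new value of x."""
    return intercept + slope * x


# Hypothetical data: 15 weeks of advertising spend (x) and units sold (y).
# The "noise" is a fixed alternating pattern so the example is reproducible.
spend = list(range(1, 16))
noise = [0.5 if x % 2 else -0.5 for x in spend]
units = [2 * x + 3 + n for x, n in zip(spend, noise)]

slope, intercept = fit_line(spend, units)
forecast = predict(16, slope, intercept)

print(f"fitted line: units = {intercept:.2f} + {slope:.2f} * spend")
print(f"forecast for spend = 16: {forecast:.2f}")
```

Small as it is, the fitted line can be re-run whenever new data arrives and turns a spend figure into a forecast without manual work, which is the "automated, data-driven" part of the definition.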

So again, but with less jargon. Data science is when someone is trying to invent or implement algorithms which, based on available data, will learn how to do somebody else’s work, and eventually will do it.

Data science is heavily business-focused and very goal-oriented. It does not pay tribute to the lineage and provenance of the techniques that it uses – no matter whether a given concept originated from statistics, linguistics, topology, or specific domain knowledge – if it is useful and allows for automated, data-driven decision making or insights generation, then data scientists will happily jump onto the impact train and use it.

What about all the other terms used in business and media? One can find a multitude of them: MLOps, artificial intelligence, deep learning, data science, advanced analytics, data analytics, machine learning, big data, data mining, exploratory data analysis, statistical learning. While each of those terms conveys a certain, distinct meaning, in the business world they are often used interchangeably, with their frequency varying not necessarily with their meanings but with fads and individual or corporate preferences.

Thanks to Google Trends (see the graph above), one can easily check that “data mining” is a term which used to be popular but is not used anymore. “Artificial intelligence” used to be popular, then got abandoned in the late 2000s and recently became popular again. For a short moment, we got excited about “big data”, but then it became boring, so now we’re exploring “data science”.

Still, some terms are more than buzzwords, so here’s my perspective on how to differentiate those terms:

MLOps (Machine Learning Operations) is a set of practices for designing, implementing, deploying, and maintaining machine learning algorithms in production. It exists because, no matter how cool machine learning models can be, whole projects will eventually fail if the models are not deployed properly. And since deployment was usually done by DevOps engineers and not data scientists, who preferred to focus on research and development, MLOps emerged to fill this gap.
Artificial Intelligence (AI) aims at developing machines which will be on par with (or stronger than) human intelligence, and everything that comes as a side product and can be leveraged by business is proudly called AI. Scientists, especially the old-school ones, love to discuss for hours whether it is even possible to define what intelligence itself truly is. Recently, with visible progress in the development of autonomous cars, natural language processing applications, and computers playing video games, the term has gained more popularity.
Machine Learning (ML) is a field studying algorithms which learn how to execute specific tasks, based on the data they are provided with.
Deep Learning (DL) is a part of ML which focuses on artificial neural networks.
Advanced Analytics (AA), Data Analytics (DA), and Big Data are business-world synonyms for Data Science (with Big Data usually, but not always, referring to work on large datasets).
Data Mining is a nowadays rather obsolete term describing the process of extracting insights from data.
Exploratory Data Analysis (EDA) is an analysis of data aimed at getting a better understanding of the data’s characteristics.
Statistical Learning is a term used by those who would like to analyze ML from a statistical theory point of view. Also, it is in the title of one of my favorite books on ML, The Elements of Statistical Learning.

How does data science relate to consulting? Overall, consulting can be defined as the practice of helping organizations to improve their performance. Since we have already described data science as a constant search for processes to automate with machines capable of learning from data and then supporting humans in insights generation and decision making, consulting is naturally drawn to leveraging data science. Therefore, we can finally define:

Data Science Consulting is a practice of helping organizations to improve their performance by applying scientific methods, processes, algorithms, and systems focused on using data to allow for automated, data-driven insights generation and decision making.

Again, this is quite abstract, so we will explain it with more down-to-earth examples. But since I have already exceeded what is considered an appropriate length for a blog post, let’s get back to it in the next post, where we’ll go deeper into the business side of data science consulting and introduce the generalist role.