Haven’t we debunked all the data science myths yet?
The answer is no; the myths just keep on coming. On the Data Science Mixer podcast, I always ask our guests the same “Alternative Hypothesis” question: “What’s one thing that people think is true about data science or about being a data scientist that you have found to be incorrect?”
We’ve shared two roundups of guests’ favorite myths already, but there are always new responses to this question! Let’s check out what our latest three guests chose to highlight from their extensive and diverse experiences in data science.
I still stumble upon this problem all the time: people tend to believe that it is the data that matters. And it is not the data that matters. It's like the founder of the TED conferences, Richard Saul Wurman, likes to say: "People care a lot about big data. But what we really need is not big data, it’s big understanding." So that is different.

Data sometimes tends to become a goal on its own. In philosophy, people talk about instrumental goals and final or ultimate goals. Data and the tools I use to manage, to handle, to explore, to analyze, to visualize the data: that's not an ultimate goal. That's an instrumental goal. It's an instrument. It's machinery. It's a device that we use, a tool that we use to achieve something higher: clear communication or understanding. So what matters in data is not the data; it's to find an interpretation of the data that is rigorous, that corresponds to reality, and at the same time, that may be useful.
Everybody knows what the term data science means.
It’s a myth that everybody knows what the term data science means, and that it's actually the same as machine learning and AI. When I'm talking to technocrats and policymakers, I always have four slides at the beginning explaining all of those things, so that by the time I’m discussing the rest of the material, they understand what it is.
I encourage people to ask [what data science is]. And then I'll give a definition that I believe is closer to what I do so that they can understand it in our interactions. When I finished my Ph.D., I talked to old mentors, and one of them was my data science manager [from my internship] at Newton. I said, "What advice would you give me? I'm planning to go back to South Africa. I want to do data science. I want to build my own teams." He said, "Language. When you're talking to people who are not data scientists, the definitions must be the same. You can't do a lot of things for months or years and then find out that you actually did not have a common understanding, and all of that work was a waste."
It’s the concept that the models are just there for you to plug and play, and you just call three lines of code, and here's the result. It's almost like a recipe, where the actual quality depends on the ingredients.
Obviously your recipes are important, but there's a concept that data science is all about the models, or at least that the models are already perfect and they're ready. But it's really about the whole process.