By Brandon Cosley, FastDataScience.AI
Data scientists are in demand; there are no two ways about it. The jobs pay well, there are plenty of openings available, and the industry only appears to be growing in this post-pandemic digital world. It should come as no surprise, then, that data science students are also a growing segment of the world labor force. But learning data science is not easy. In fact, it is hard, and it is hard for several good reasons:
1. Data science as a profession blends a lot of different sub-specialties that are professions in their own right, such as data engineering, programming, statistics, and data visualization.
2. The industry and associated tools and technologies are evolving rapidly, making it difficult to know where to focus one’s studies.
3. There is a gap between the data science taught in educational settings (universities, digital tutorials) and the data science used in enterprises.
4. With the vast breadth of knowledge required, it is easy to lose confidence in one’s ability to effectively communicate the value of one’s education to a prospective employer.
I remember my own experience trying to go from a data-savvy academic researcher to an industry data science professional. I exposed myself to all the tutorials, blogs, and MOOCs that I could. I immersed myself in industry news and trends. I filled my bucket to the brim and found that the more I learned, the more I realized I didn’t know. I was stressed, lacked confidence in the skills I had, and felt like an imposter going to data science interviews, hoping I wasn’t going to be met with a “gotcha” because I hadn’t spent enough time on loss functions.
I overwhelmed myself with data science education in the hope that my breadth of exposure would lead me to my purpose, and a better paycheck. What I didn’t realize at the time was that I had put the cart before the horse. I was so eager to learn that I spent all my time learning lots of “things” without ever stopping to ask myself: how do all of these “things” come together to solve real problems?
Allow me to let you in on an obvious secret: most businesses don’t care about data science “things.” Most businesses only care about whether those things can solve business problems. So herein lies the rub: trying to learn all the tools of data science so your resume can be filled with an ever-expanding list of “things” (Python, R, regression, random forest, Naïve Bayes, Markov chains, support vector machines, k-means clustering, XGBoost, convolutional neural nets, natural language processing, blah, blah, blah) is futile.
These “things” will not lead you to your purpose because your purpose is only defined by where you feel valued. You will feel valued where you allow your ever-evolving knowledge of data science to be applied to solve problems. Being able to communicate how you leveraged some of the tools of data science to solve a problem will take you much further in business than simply listing all the algorithms you have been exposed to in one class or another.
So how should I approach learning data science?
In short, find a purpose first. What do you care about? Where do your passions lie? What problems do you want to solve? Once you have a list, pick something, and consider how your data science knowledge might be applied to solve a problem related to that interest.
The Benefits of Data Science with a Purpose
By finding your purpose first, you will approach your data science education with context, and the tools you seek to learn will feel less overwhelming because fewer of them will make sense to apply.
Knowledge, passion, and understanding of your problem will also open up your creativity. Creative problem solving is seeing how our understanding of two or more disparate fields can be combined in novel ways. If we only ever learn data science in the context of “canned” data sets and dispassionately assigned problems, we lose the ability to bridge our depth of knowledge from multiple fields.
By finding your purpose first, you will quickly learn that there are many different data science solutions to the same problem. In other words, it is rare that there is a single right or wrong answer in data science and far more common that business problems can be solved in a myriad of ways. Are some solutions better than others? Sure. But this doesn’t mean that the less optimal ones are wrong; they are just not as good. With enough money and time, there is always a “better” solution, so it is best not to get too caught up in that spiral. Instead, focus on how the knowledge you have can bring more value than what was there before, or add to existing solutions by revealing new insights not apparent in others.
By finding your purpose first, you will work through problems that most data science courses do not teach but that enterprise data scientists face every day. Take, for example, the simple problem of finding the right data. Most data science courses don’t teach you the value of data discovery, but in the enterprise, data scientists are often charged with discovering and blending new data sets to further realize the value of both the data collected and the data scientist hired to make use of it. Learning data science with a purpose first will force you to look for ways to acquire the data most relevant to your problem. It will require you to access, wrangle, and engineer that data so that it is suitable for training machine learning models.
Finally, by finding your purpose first, you will know how to communicate the value of the solutions you build.
What was my purpose and how did it change my education?
My purpose was social justice. I wanted to use the tools and skills of data science to generate insights that expose injustice, that provide solutions for positive social change, and that help us to realize the implications of human bias.
In my first project, I wanted to help third-shift workers make safer parking decisions by identifying pockets of vehicle crime. I had to locate local public police reporting data and blend it with other data sources such as Census data. Using what data science knowledge I had, I knew I could build a model to predict the likelihood that a car would experience a vehicle crime (e.g., theft, vandalism) based on features of the surrounding location. This project led me to learn basic data wrangling, how to derive some geospatial features, how to test different classification models (random forest, logistic regression, and Naïve Bayes) for accuracy, basic visualization using Tableau Public, and how to set up a pipeline to refresh the dashboard every time the police data refreshed.
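The blending step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code from the original project: the tract IDs, field names, and numbers are invented, and the `blend` helper is hypothetical. It counts incidents per area, joins those counts onto Census-style features that share the same key, and derives a simple rate that a classifier could later use as a feature or label.

```python
from collections import Counter

# Made-up records standing in for public police reports (assumption).
incidents = [
    {"tract": "A", "type": "theft"},
    {"tract": "A", "type": "vandalism"},
    {"tract": "B", "type": "theft"},
    {"tract": "A", "type": "theft"},
]

# Made-up Census-style features keyed by the same tract ID (assumption).
census = {
    "A": {"population": 1500, "median_income": 42000},
    "B": {"population": 3000, "median_income": 58000},
    "C": {"population": 2000, "median_income": 51000},
}


def blend(incidents, census):
    """Join incident counts onto census rows and derive a per-1,000 rate."""
    counts = Counter(record["tract"] for record in incidents)
    rows = []
    for tract, features in sorted(census.items()):
        n = counts.get(tract, 0)  # tracts with no reports get a count of 0
        rows.append({
            "tract": tract,
            **features,
            "incidents": n,
            "incidents_per_1000": 1000 * n / features["population"],
        })
    return rows


rows = blend(incidents, census)
for row in rows:
    print(row)
```

Keeping every Census tract in the output, including those with no reported incidents, matters here: a model that predicts risk needs examples of low-crime locations as well as high-crime ones.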
Were there other problems I could have gone after? Of course. Were there other tools I could have used to solve this specific problem? Most definitely. Did I come up with the best solution, or even the only solution on the market? Not a chance, but my solution was better than what was there, which was nothing.
Not only did I learn the specific tools mentioned above, but I also gained more intuition on the process of data science. I was able to more clearly articulate why I would want to use specific classification models with specific data types over others. And most importantly, I was able to speak passionately about how these tools allowed me to make informed decisions by combining hundreds of data points.
Now, when faced with a new purpose and asked whether there is a data science solution to the problems associated with that purpose, I no longer feel a lack of confidence in what I don’t know. I use that purpose to apply what I do know, explain my approach, and identify something new to learn with the confidence of knowing that I can.
Bio: Brandon Cosley has over 15 years of experience in data science. He received his PhD in research psychology with an emphasis on quantitative methods and has worked as director of data science and AI for a major healthcare company, cofounded multiple data science startups, and is an active thought leader in the space. In 2021 Brandon launched FastDataScience.AI, a growing resource to support data science education. Join the conversation in a growing Facebook community at Think Data Science.
Original. Reposted with permission.