Data science lives at the intersection of several fields, including programming, mathematics, statistics, machine learning, computer science, software development, traditional research and domain knowledge.
The work is problem-focused and goal-oriented. It requires testing different approaches and discovering what works. So, it’s science, yes, but it also requires a little bit of art.
What’s exciting about this field is that more data becomes available every day, and companies and organizations are looking for ways to harness it. Why? Because the potential insights are nearly endless.
New data can help companies better understand their audience or give them a unique perspective on trends so they can be the first to design an effective solution to an emerging problem.
Most importantly, companies understand that access to data isn’t the only ingredient for success. They need someone who can analyze large (and growing) databases and extract insights from them. This is the role of the data scientist.
So what are data scientists expected to do in their jobs? Here are the four key job tasks.
Job Task #1: Cleaning Data
Data in the real world is messy. It arrives as images, free text and more. It is often inconsistent and sometimes comes in the wrong file format. It’s a jumble.
A big part of a data scientist’s job is to clean, organize and process the data so it’s ready for analysis. You have to be the person who knows how to fix all the problems and make the data consistent.
This work might involve taking data from multiple sources and combining it effectively. For example, you might have data about your customers, organization and industry stored in separate databases. If you combine them, the organization could learn something new and gain a competitive advantage.
Data scientists will ask, “How do I take all this different data, clean it up and make it useful?”
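To make that question concrete, here is a minimal sketch of what cleaning and combining two sources can look like. It assumes two small, made-up sources: a CRM export and a billing export that identify customers with inconsistently formatted IDs. The field names (`customer_id`, `cust`, `total_spend`) and helper functions are hypothetical, chosen only for illustration. Real projects would usually reach for a library such as pandas, but plain Python keeps the steps visible.

```python
from datetime import datetime

# Hypothetical raw records: note the stray spaces, mixed case,
# two date formats, a thousands separator and a "n/a" value.
crm_rows = [
    {"customer_id": " C001 ", "name": "alice SMITH", "signup": "2023-01-15"},
    {"customer_id": "c002", "name": "Bob Jones", "signup": "15/02/2023"},
    {"customer_id": "C003", "name": "", "signup": "2023-03-01"},
]
billing_rows = [
    {"cust": "C001", "total_spend": "1,250.00"},
    {"cust": "C002", "total_spend": "300"},
    {"cust": "C004", "total_spend": "n/a"},
]


def normalize_id(raw):
    """Make IDs comparable across sources: trim whitespace, upper-case."""
    return raw.strip().upper()


def parse_date(raw):
    """Try each known date format; return None if none match."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(raw):
    """Strip thousands separators; treat unparseable values as missing."""
    try:
        return float(raw.replace(",", ""))
    except ValueError:
        return None


def clean_crm(rows):
    """Return CRM records keyed by normalized customer ID."""
    cleaned = {}
    for row in rows:
        cid = normalize_id(row["customer_id"])
        cleaned[cid] = {
            "customer_id": cid,
            "name": row["name"].strip().title() or None,  # empty name -> missing
            "signup": parse_date(row["signup"]),
        }
    return cleaned


def combine(crm, billing):
    """Left join: keep every CRM customer, attach spend where available."""
    spend = {normalize_id(r["cust"]): parse_amount(r["total_spend"]) for r in billing}
    return [dict(rec, total_spend=spend.get(cid)) for cid, rec in sorted(crm.items())]


combined = combine(clean_crm(crm_rows), billing_rows)
for record in combined:
    print(record)
```

The join keeps every CRM customer and marks missing spend as `None` rather than guessing a value, so billing-only IDs such as `C004` are dropped. Whether that is the right choice depends on the question you are trying to answer.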
Job Task #2: Noticing Patterns and Visualizing Insights
The best data scientists are able to polish the data and shape it into something useful. Because real-world data contains a lot of noise, they are excellent at separating the wheat from the chaff.
Good data scientists have an eye for patterns, especially subtle ones. They can apply a variety of analysis techniques, ranging from basic statistics to more complex models, and many are comfortable using powerful machine learning methods that can find even subtler connections in the data.
Finally, they can visualize the data, knowing that the wrong visualization technique can lead to misunderstanding and confusion.
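As a small taste of the “basic statistics” end of that range, the sketch below runs two common first-pass checks on made-up numbers: a Pearson correlation to test whether two measurements move together, and a z-score rule to flag values that look like noise or anomalies. The data, the function names and the threshold of 2.0 are all assumptions for illustration, not a recommended standard.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def zscore_outliers(values, threshold=2.0):
    """Indices of values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical weekly figures: advertising spend vs. sales.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 128, 150, 160, 190]
r = pearson_r(ad_spend, sales)
print(f"correlation: {r:.3f}")

# Hypothetical daily site visits with one suspicious spike.
daily_visits = [20, 21, 19, 22, 20, 95, 21, 20]
outliers = zscore_outliers(daily_visits)
print("outlier positions:", outliers)
```

A strong correlation is a pattern worth investigating, not proof of cause. Also note that a single extreme value inflates the standard deviation it is measured against, which is why analysts often switch to more robust measures (such as the median) once they suspect outliers.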
Job Task #3: Communicating Discoveries
What sets a data scientist apart from a technical person who is simply good at analysis is the ability to communicate.
Companies have found themselves hard-pressed to find people who are excellent in both areas.
Communication might mean giving a presentation or report so decision-makers can act on the new insights. Maybe you created a tool that helps predict something about customer behavior, and you want to help other people in the organization use it.
Job Task #4: Supporting Automation Efforts
Data scientists are often tasked with automating a variety of business activities.
There could be an automation project that makes it possible for incoming data to be processed and cleaned instantly. Automation can also help with data exploration, a preliminary step in any analysis. In fact, automation can support every step of the data science lifecycle.
It could also mean simplifying a repetitive task that colleagues are doing in spreadsheets. If that task is time-intensive, there is a big incentive to automate it and improve the workflow.
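For example, imagine colleagues who total up sales by region in a spreadsheet every week. The sketch below automates that kind of chore. It assumes a hypothetical CSV export with `region` and `amount` columns; the sample text is embedded so the example runs on its own, whereas in practice you would open the real file with `open(path, newline="")` and pass its contents in.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weekly export; the "West" row is missing its amount.
SAMPLE_EXPORT = """region,amount
North,120.50
South,80
North,30.25
West,
South,19.75
"""


def summarize_by_region(csv_text):
    """Total the amount column per region, counting rows with no amount."""
    totals = defaultdict(float)
    skipped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = row["amount"].strip()
        if not amount:
            skipped += 1  # report missing values instead of silently treating them as 0
            continue
        totals[row["region"].strip()] += float(amount)
    return {region: round(total, 2) for region, total in totals.items()}, skipped


totals, skipped = summarize_by_region(SAMPLE_EXPORT)
print(totals)
print(f"rows skipped for missing amounts: {skipped}")
```

Once a step like this is a function, it can be scheduled or chained with the cleaning steps that feed it, which is where the time savings for the team start to add up.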
What About Other Data Professionals?
As you can see, data scientists clearly wear many hats. And, at this point, you might be thinking, “When do the data engineers and data analysts start helping?”
These roles are distinct, yes, but for many companies the boundary between them gets extremely fuzzy. As a result, data scientists often wind up doing a little bit of the analyst role and engineer role as a part of their job.
So, if you’re starting out in the field, it’ll be to your advantage to have a variety of skills, even those more commonly attributed to other data roles. Domain knowledge of your industry will also serve you well.
Checklist of Skills You’ll Need Getting Started in a Data Science Career
An important note: in the data field, it’s critical to learn by doing. Theoretical knowledge on its own won’t take you far in such a practical field. You’re going to need to practice, practice, practice.
The following checklist outlines the basic skills and highlights some advanced skills to consider building for the future to help you stand out as a data professional.
- Math: calculus, statistics, linear algebra
(Advanced Skills: Bayesian methods, sampling techniques)
- Programming: Python, R
(Advanced Skills: C++, Java, Git, CI tools)
- Data Manipulation: databases, cleaning data, combining data sets, handling semi-structured data, handling missing data
- Machine Learning: theory, data preparation, scikit-learn
(Advanced Skills: natural language processing, time series, neural networks)
- Communication: presentation skills, knowing your audience, data visualization skills
(Advanced Skills: making technical concepts understandable for non-technical professionals)
- BONUS SKILL | Big Data: working with large data sets, unstructured data, distributed computing, cloud computing
Build and Refine These Skills at TDI
The Data Science Fellowship from The Data Incubator is an immersive, hands-on experience for those with a passion for data.
There are two versions of the program: a full-time, 8-week program, and a part-time, 20-week program. Both programs cover the same information, are taught by the same instructors and provide the same hands-on data training.