The benefits of understanding data science are clear:
- An UNDERSTANDING of data science makes you more valuable to employers (as more jobs are automated, data science is an important skill to have)
- An UNDERSTANDING of data science allows you to appear like an expert without being one (many use the jargon but don’t know what it all means)
- An UNDERSTANDING of data science will make you the leader on your team (you’ll be the one who best understands this complicated topic)
The question is “How can I start learning data science TODAY without going back to school?”
The Full Frontal Assault
There are three skills every data scientist needs:
- Software engineering
- Stats & math
- Data communication
The way many courses are taught is to go deep on one of these at a time. I call that “the full frontal assault”.
Much of the training you’ll encounter uses the full frontal assault, and it is less than optimal.
Is there a better way to gain all three skillsets at once?
The Flanking Technique – The Easiest Way to Understand Data Science
Data science is a complicated topic. A recent review of blogs in the first three pages of Google results for “learning data science” uncovered more than 50 unique topics. Many of them are entire college majors.
While my background is in software development, when I got into data science I had to learn a lot:
- How to build distributed systems
- How to select which machine learning algorithms to test, and then how to use them
- How to implement machine learning to analyze data
- How to report the insights I discovered to technical and non-technical people alike
Each of these is a full-time job and its own discipline. I had one job, and I needed to do all the things I just listed.
Over the years I refined a technique for rapidly learning new skills. I call it “The Flanking Technique”.
In military tactics, a flanking maneuver is a movement where you surround the enemy on multiple sides. So instead of simply attacking them head on, you surround and overwhelm them.
What does this have to do with learning data science?!
The same idea can be applied.
Attacking the topic of data science head on is a losing proposition. Why? Because you need to gain three skillsets, not just one.
So instead of doing things one-at-a-time, use “The Flanking Technique” to gain all three at once.
The Flanking Technique is much more powerful, and it’s an easier way to win. Instead of approaching data science from only one side, you surround it and tackle it from all sides.
That said, you may be wondering, “How do I use the Flanking Technique?”
The Simple Step-By-Step Process To Using “The Flanking Technique” To Learn Data Science
There are three simple steps to using “The Flanking Technique”:
Step 1: Figure out which entry point into data science you want to use.
Step 2: Understand what you need to fill in the gaps of your knowledge.
Step 3: Work on a project to gain all three skills.
Let’s go over each step in detail.
Step 1: Figure Out Your Entry Point
Each of the three data science skillsets is an entry point:
- Stats and Math
- Software Engineering
- Data Communication
If you have a degree in any of these three areas, that’s your entry point.
If you are brand new to data science though, I suggest starting with software engineering, specifically learning Python.
The reason is that everything you do as a data scientist is going to require programming skills. Sure, you could start digging into statistics, but to implement a machine learning model, more than likely you’re going to use Python.
So if you’re brand new, start by learning Python.
With that said, what do you need to know for each of these three areas? Let’s take a look.
Step 2: Understand Each Path So You Can Fill In The Gaps
There are a number of topics you need to be familiar with for each skill set. Look at what’s required for each so you can determine where the gaps in your knowledge are and start filling them in.
Software Engineering
To train and use machine learning models, and work on the systems that use them, you need software engineering skills.
Specifically you’ll need to be able to do the following:
- Build systems that can gather, process and store data.
- Know one or more programming languages.
- Apply your knowledge and skills to (large) datasets.
- Be familiar with distributed computing and the technologies involved.
To start out I suggest the following path:
- Start learning Python
- Read up on Amazon Web Services, specifically EC2 and S3
- Familiarize yourself with MySQL, a relational database
- Check out using Apache Spark for processing data
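To see how the first bullet above (gather, process and store data) fits together, here is a minimal sketch in plain Python. It assumes a tiny made-up CSV as the “gathered” input (in practice that might be a file pulled from S3), and it uses Python’s built-in SQLite module as a stand-in for MySQL so the example runs without a database server. The function names gather, process and store are hypothetical, chosen only to mirror the three stages.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, made up for illustration (could come from S3 in practice)
RAW_CSV = """order_id,region,amount
1,east,19.99
2,west,5.00
3,east,
4,west,12.50
"""


def gather(text):
    """Gather: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def process(rows):
    """Process: drop rows with a missing amount and convert column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["region"], float(row["amount"])))
    return cleaned


def store(records, conn):
    """Store: write the cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


# SQLite (in memory) stands in for MySQL here; the SQL is similar for this use
conn = sqlite3.connect(":memory:")
store(process(gather(RAW_CSV)), conn)

# Query the stored data, the same way you would against MySQL
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
):
    print(region, round(total, 2))
```

Swapping SQLite for MySQL, or the in-memory string for a file on S3, changes the connection and loading code but not the gather, process, store shape.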
Stats & Math
All of the machine learning models you’ll implement are math under the hood. While you may not be writing your own algorithms, you definitely need to understand what’s going on. A lot of what’s going on is statistics.
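To make “math under the hood” concrete, here is a minimal sketch of the least-squares formula behind simple linear regression, written in plain Python with no libraries. The data points are made up so the answer is easy to check by hand, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up points lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Libraries like scikit-learn do the same kind of computation (with more numerical care) when you call a regression model’s fit method, which is why knowing the formula helps you read their results.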
For stats & math you’ll need the following skills:
- Understand statistics, math, and modeling rules.
- Understand existing algorithms and create new ones.
- Be able to design and measure experiments.
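As one example of measuring an experiment, here is a minimal sketch of a two-proportion z-test, a common way to check whether version B of something (a page, an email) converts better than version A. It uses only Python’s math module. The conversion counts are hypothetical, and the helper name two_proportion_z_test is made up for illustration.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups share one true rate
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical experiment: 200 of 2,000 users converted on A, 250 of 2,000 on B
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value (a common threshold is 0.05) suggests the difference is unlikely to be pure chance. Designing the experiment well, for example deciding the sample size before you start, matters as much as the test itself.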
Depending on how deep into machine learning you want to go, you may not need to be a math wizard. With that said, I suggest this path:
- Use Jupyter Notebook
- Learn about Pandas, the Python library (not the fuzzy cute bear)
- Become familiar with scikit-learn, the de facto machine learning library for Python
- If you want to go big on your data, check out Spark MLlib.
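Here is a minimal sketch of what the Pandas plus scikit-learn part of that path looks like in practice, assuming scikit-learn 0.23 or newer (for as_frame=True) and Pandas are installed. It uses the small iris dataset that ships with scikit-learn, and logistic regression is just one reasonable first model to try, not the only choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# as_frame=True returns the features as a Pandas DataFrame
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Pandas makes quick exploration easy, e.g. summary statistics for each column
print(X.describe().round(2))

# Hold out 25% of the rows so the model is scored on data it never saw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# max_iter is raised so the solver has room to converge on unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Run it cell by cell in a Jupyter Notebook and you get to look at the data and the results as you go, which is a big part of why that tool is first on the list.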
Data Communication
The final piece of the puzzle is being able to communicate your results to others. You’ll need to be able to:
- Communicate the solution verbally.
- Communicate the solution with charty goodness (lots of charts!).
- Relate the solution to the business problem you’re trying to solve.
- Recommend actions to be taken using the solution you’ve created.
If you aren’t familiar with presenting your solutions and suggestions in front of an audience, I suggest starting with charty goodness.
The best open source Python tools I’ve found here are Jupyter Notebook, Pandas and Matplotlib. With this trifecta you can present your findings, charts and all, as a web page or PDF report. Pandas especially makes it very easy to create a chart with a single line of code.
After this, start boning up on your public speaking skills. Find a meetup you can present at, or hold a brown-bag lunch for your team. Both of these venues are typically filled with very supportive people who want to see you succeed. Definitely start there.
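Here is a minimal sketch of that single-line chart, assuming Pandas and Matplotlib are installed. The monthly sales numbers are made up for illustration. The first two lines select a non-interactive Matplotlib backend so the script can save an image without a display; in a Jupyter Notebook you would normally drop them and the chart shows up inline.

```python
import matplotlib

matplotlib.use("Agg")  # render to image files without needing a display

import pandas as pd

# Hypothetical monthly figures, made up for illustration
df = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"], "sales": [120, 135, 128, 160]}
)

# The single line that draws the chart; it returns a Matplotlib Axes
ax = df.plot(x="month", y="sales", kind="bar", legend=False, title="Monthly sales")

# Optional polish, then save the image for a report or web page
ax.set_ylabel("Units sold")
ax.figure.tight_layout()
ax.figure.savefig("monthly_sales.png")
```

Because df.plot hands back a regular Matplotlib Axes, anything Matplotlib can do (labels, colors, annotations) is still available when the one-liner isn’t quite enough.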
Step 3: Work on a Project To Gain All Three Skillsets
Once you know your entry point and the skills you need to get, the next step is working on a project that will help you gain all three skills at once.
The best part of working on a project is that you can use it as part of your data science portfolio.
While this could literally be anything, Kaggle (a data science competition website) has a number of “getting started” projects that can help get your creative juices flowing:
- Predict survival on the Titanic
- Predict sale prices using regression
- Learn computer vision fundamentals with the famous MNIST data
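To show how even a tiny first pass at a project like the Titanic one touches all three skills, here is a minimal sketch in plain Python. The rows are made up for illustration and are NOT the real Kaggle data; only the column names PassengerId, Survived and Sex mirror Kaggle’s Titanic training file. The helper survival_rate_by is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only, NOT the real Kaggle Titanic data.
# Column names mirror the Kaggle Titanic training file.
SAMPLE = """PassengerId,Survived,Sex
1,0,male
2,1,female
3,1,female
4,0,male
5,1,male
6,0,female
"""


def survival_rate_by(rows, column):
    """Stats: fraction of passengers with Survived == 1 for each value of column."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, total]
    for row in rows:
        counts[row[column]][0] += int(row["Survived"])
        counts[row[column]][1] += 1
    return {value: survived / total for value, (survived, total) in counts.items()}


# Software engineering: load and parse the data
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Stats & math: summarize it
rates = survival_rate_by(rows, "Sex")

# Data communication: turn the numbers into something a reader can act on
for value, rate in sorted(rates.items()):
    print(f"{value}: {rate:.0%} survived")
```

From here, the same project naturally grows on every side at once: a real dataset and a database (engineering), a proper model (stats), and charts plus a short write-up (communication).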
Now I have a question for you…
While this is a powerful technique, using the comprehensive case study approach in my premium training makes learning data science much easier.
But don’t let that hold you back…
For now, THIS WEEK, I want you to figure out ONE project you’d like to use the Flanking Technique on.
Just one thing.
Will you predict survival on the Titanic? Will you predict the number of product sales for a grocery store? Will you create a topic model for blog posts you’ve scraped off the Internet?
Reveal what you plan to do in the comments below.
And if you have any questions about “The Flanking Technique”, feel free to leave them in the comments as well.