When I got started in IT during the dotcom boom I immediately faced a problem – I couldn’t get a job without experience, and I couldn’t get experience without a job.
If you’re entering the rapidly growing data science industry today, you face the same problem.
Employers want to see your data science portfolio, but you don’t have one because you don’t have experience.
Thankfully this chicken-and-egg problem is easy to overcome.
Yes, you can build a data science portfolio without job experience.
5 Steps for Building Your Data Science Portfolio
So now you know you can gain experience without getting a job, and you can show that experience by building a data science portfolio.
But how do you do it in such a way that it can actually help you land a job?
I’ll walk you through it right now.
Step 1: Pick a Project
The first thing you need to do is pick a project to use for your portfolio. Ultimately you want to have a few under your belt, but pick one to get started.
Now don’t do something trivial like take the Iris dataset and perform a simple analysis.
Start with a “dirty” dataset that you need to do some work on to make it ready for analysis. This is more reflective of the real world, and will show employers you can do more than just work with a pristine dataset.
And speaking of employers, have your project solve an actual business problem. This will make it more relatable and demonstrate that you are thinking in terms of their needs rather than your own.
If you have trouble coming up with project ideas, here are a few to get the creative juices flowing. Each of these is a prediction problem:
- Future housing prices in a neighborhood
- Sales for an e-commerce website
- The probability that a driver will file an insurance claim in the next year
- Whether a tumor is cancerous or not
- Whether a piece of equipment meets quality standards or not
Step 2: Create an Account on GitHub
Keeping all of your work in a source control system is super important for two reasons:
- If anything happens to your computer, your work is backed up
- Others can access the work you’ve done, and build on it if they like
The best website I know for this is GitHub.
One of the awesome things about GitHub is that you can upload a Jupyter Notebook you’ve run and show all of the results in a web browser. This makes it super easy for employers to view your entire analysis.
Set up your project on GitHub with these instructions:
- Go to GitHub.com and create a free account
- Create a new public repository and give it a descriptive name that reflects the project
- Clone the repository onto your local computer using a Git client. I recommend GitKraken. It runs on all operating systems, looks awesome, and is super easy to use.
- Start on step 3 below…
Step 3: Build the Project
This is where you roll up your sleeves and get to work writing some code.
Since Python is one of the most popular languages for machine learning, I suggest using it for your project.
With your project idea in hand and knowing the business problem you want to solve, the first thing you need to do is find a dataset.
Next, fire up a Jupyter Notebook and perform your analysis. The steps you want to be sure to cover are:
- Import the data
- Wrangle the data – cleaning it, normalizing it, and whatever else you need to do to get it ready for analysis
- Perform feature engineering – this is a fancy term for deciding which features are best to use, and removing/adding features as necessary
- Perform the analysis – creating one machine learning model is good, but trying multiple models with different parameters shows you’re trying to find the best solution. Use a tool like scikit-learn’s GridSearchCV to add a little panache to your project.
- Write up recommendations based on your results
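To make those steps a little more concrete, here’s a minimal sketch of what they might look like in a notebook. It is only an illustration, not a recipe: the data is synthetic and generated on the spot, the column names (age, income, region, filed_claim) are made up for an insurance-claim style problem, and the random forest and parameter grid are just one reasonable choice among many. It assumes you have pandas, NumPy, and scikit-learn installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Import the data. ASSUMPTION: this is a synthetic stand-in table so the
# sketch runs on its own. In a real project you would load your own
# dataset here, for example with pd.read_csv on your file.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n).astype(float),
    "income": rng.normal(50_000, 15_000, size=n),
    "region": rng.choice(["north", "south", "east"], size=n),
})
# Made-up target: younger drivers are more likely to file a claim.
df["filed_claim"] = (df["age"] + rng.normal(0, 10, size=n) < 40).astype(int)
# Make the data "dirty" by blanking out 20 income values.
df.loc[rng.choice(n, size=20, replace=False), "income"] = np.nan

# Wrangle the data: fill the missing incomes with the median income.
df["income"] = df["income"].fillna(df["income"].median())

# Feature engineering: one-hot encode the categorical region column.
X = pd.get_dummies(df.drop(columns="filed_claim"), columns=["region"], dtype=float)
y = df["filed_claim"]

# Hold out a test set so the final score is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Perform the analysis: try several parameter combinations, each scored
# with 5-fold cross-validation on the training rows.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", round(search.score(X_test, y_test), 3))
```

In a real portfolio project, each of these chunks would sit in its own notebook cell with a short note explaining what you did and why, and the recommendations would follow from the results you actually get.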
Along with all the steps above, I recommend adding formatting to your notebook, such as headings and explanatory notes.
Here’s an example analysis of the Titanic dataset I did for the Machine Learning with Big Data class I teach. Note the headings and additional notes liberally spread throughout the project.
Step 4: Upload Your Work to GitHub
Once you have your project coded up and have double-checked everything (spelling, grammar, and that your analysis makes sense), check it into GitHub.
Once it’s checked in, go to your GitHub repository and open the notebook. Be sure it looks like you expect it to, and that nothing got messed up once it got into GitHub.
If anything did get messed up, fix it locally and then check those changes in.
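One thing that can go wrong is checking in a notebook with cells that were never run, so their results don’t show up in the browser. Here’s a small sketch of a check you could run before checking in. It assumes the notebook is saved in the standard Jupyter JSON format (version 4), and the helper name unexecuted_code_cells and the demo notebook are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def unexecuted_code_cells(notebook_path):
    """Return the indexes of non-empty code cells that have no execution count."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are never executed
        # A cell's source may be stored as one string or as a list of lines.
        source = "".join(cell.get("source", []))
        if source.strip() and cell.get("execution_count") is None:
            missing.append(index)
    return missing


# Demo with a tiny made-up notebook written to a temporary folder.
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My project"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 1, "outputs": []},
        {"cell_type": "code", "metadata": {}, "source": ["2 + 2"],
         "execution_count": None, "outputs": []},
    ],
}
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "demo.ipynb"
    path.write_text(json.dumps(demo), encoding="utf-8")
    not_run = unexecuted_code_cells(path)

print("Code cells that were never run:", not_run)
```

To use it on your own work, pass the path to your notebook file instead of the demo. An empty list means every code cell has been run at least once. It does not tell you whether the results are right, so you still need to look the notebook over on GitHub.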
Step 5: Link to Your Project on Your Social Media Accounts
Now that you’ve got your portfolio project up, it’s time to promote the hell out of it!
At a minimum, add a link to your project to the summary section of your LinkedIn account. My friend Brandon Rose does this at the bottom of his LinkedIn summary.
If you have a blog, write up a summary of the project and add a link to your project.
Post a link to the project on Twitter.
Tell all your friends about it on Facebook.
Basically link to it everywhere you live online so others can find it.
Here’s What You Need To Do Next
Now you have a decision to make. Are you going to take action and test this advice right now, or will you procrastinate? The ball is in your court, my friend.
If you’re an action taker – and I’m sure you are as you’re reading this post – here’s what I want you to do next:
- I want you to brainstorm a few projects you’d be interested in doing.
- Then, if you plan to build your data science portfolio using the steps above, leave a comment and tell me about it.
- Finally, if you haven’t subscribed to my data science newsletter, scroll down a little bit, fill out the form, and subscribe today.