Python Starter Skills Roadmap
This document is designed to guide users with no background in Python through several core topics that are commonly used in The Data Mine.
By the time this document is completed, you will have:
-
Skills to work with a dataset in Python.
-
A knowledge of basic analytics operations in Python.
-
Ability to create simple visualizations in Python.
-
Starting points for building knowledge in building predictive models.
How long will the roadmap take to complete?
-
Learning all the aspects of Python can be a lifelong process.
-
However, the topics below can be worked through in 10-12 hours.
-
This assumes no background in Python. Users who are more familiar will go much faster.
-
If this takes you longer than 10-12 hours to understand, that’s not bad! Everyone learns at their own pace and has their own familiarity with these concepts. It’s about taking time to understand rather than racing through to get started. |
This roadmap assumes that you have a Python environment available to you. If you are a member of The Data Mine, make sure you follow the Anvil setup roadmap and can launch a Jupyter Notebook session. You can also run Python locally if you have an environment setup. |
Step 1 - What is Python
Python is one of many programming languages that allow users to build powerful processes and applications in code.
Python can be used to build data automation pipelines, web applications, and many other things. The focus of this guide is to introduce how many teams in The Data Mine utilize Python to understand their data and build predictive models.
One of the most important skills that you can learn related to Python is how to research solutions yourself. Often, you’ll have a general idea of what you what you want to do but you’ll need to build that logic in code. An initial tip is just to add the keyword The other is to get familiar with some of the links below. |
-
-
PyPi is a very popular package repository for Python. It often has documentation related to how a package functions. It can also show beneficial information, such as when a package was last updated.
-
-
-
These pages can look intimidating, but they are worth it. Each page will look different for the function, but they almost always have some core pieces of documentation:
-
The
parameters
that a function can take as input. -
An example of the package or function being used.
-
Returns
sections that specific the output that you’ll get when running a function.
-
-
-
-
A message board that allows developers to share code questions and find solutions.
-
-
Python Learning Resources (Codingbat)
-
This site is one of many resources that can help users to learn basic Python.
-
Searching for
learning Python from scratch
can be a great place to get started.
-
What is a Python package? Python packages are ways to group together helpful sets of reusable code in Python. A real-world example would be add-ons for an app like Discord. By using these add-ons you can leverage different functionality in the app. This is the idea of packages. By installing and importing the packages, you can leverage different helpful functionality in Python. |
Step 2 - How to write code
Writing code, you’ll hear many people talk about their development environment. That’s the place where you’ll write, execute, and debug code. In this roadmap, your development environment will probably be a Jupyter Notebook session on Anvil.
Once you have a place to run your code, the next step is to start understanding how to write your code. Normally called code syntax.
Python can be very picky about certain things that will cause code errors. Work through the topics below to learn more about Python syntax, running the code in your Jupyter Notebook.
If you have any questions, ask your TA or email datamine-help@purdue.edu. We are happy to help! |
-
-
Most people know indentation as the thing that happens when you press
tab
on your keyboard. Python uses it to know which lines of code to group together when running.
-
-
-
Variables are one way that Python stores information to use later. The different types of variables are important to understand when you’re working with data.
-
-
-
Printing is how you display text in Python. It’s critical to use when fixing errors and understanding code functionality.
-
OK, you started understanding the different parts of Python. Now, give yourself a break. If you haven’t coded before, this can be a whole new way of thinking. Play around, try different things, and be sure to give yourself some time to think through what you’ve learned. |
Step 3 - Loops and Logic
In Step 2
we learned about variables and how Python requires code to be written. Now is the cool part. Learning about how to incorporate logic and looping to your code.
This is an important bit to understand. Make sure to give yourself time to work through the topics. Ask lots of questions about what you don’t understand.
-
-
These types of statements allow you to add layers to your code. Logical operators are the building blocks of many other Pythonic operations. They allow you to ask things like "is this condition true?"
-
A metaphor would be if you were working at an amusement park and had to make sure people are a certain height to go on a ride. Logic allows you to ask the code, is the person being measured over 4' tall. The logical operator will respond with something like
True
if they are orFalse
if they are not. These responses can be used in the control flow that we discuss below.
-
-
-
The next layer of complexity. These types of statements allow you to do things like set steps for the logical operators that you learned about. Or loop through values and act on items.
-
Keeping the theme park metaphor, control flows are the pieces that tell the code what to do.
if
the person is over 4' (True) allow them to ride.else
(False) say "sorry, you’re not allowed to ride this ride." Theif/else
part is an example of control flow. -
Adding a layer, we could use a
for
loop to say:for
each person in this line, checkif
they are over 4' tall. If they are, they can ride.else
tell them that they can’t ride.
-
-
-
These last two items,
Lists
andDictionaries
, are both commonly used in Python for a variety of reasons. We include them here, because they are commonly passed as input tocontrol flows
which uselogical operators
.
-
Step 4 - Functions
Functions are an incredibly powerful concept in Python. They allow for code reuse and can be combined to create easily repeatable processes in your code.
They are also one of the harder beginner concepts to understand. Remember, give yourself time and ask questions!
Our metaphor is getting a little "out there", but thinking about the height line, a function is like if the park created a tool that automatically measured each user and then displayed a red thumbs down or green thumbs up symbol as each user failed or passed the criteria.
Even in our example, you can see how this would be more flexible. It’s not a specific person that the tool is interested in, it’s just defined as a park guest and then the "function" knows to act for that guest. In this case, checking their height. The tool might allow for additional input as well. Maybe the attendant can input different heights for different rides. All of this can be done through one "check height" function. The park knows that it can use the "check height" function for any ride. Just tell it which guest’s height it’s measuring and what the measurement is and it’s good to go!
Step 5 - Reading and Working with Data
For many projects, the first step is getting data and digging in to understand more about it. Now that we know the basics of what’s possible in Python, we can start learning more about working with specific datasets.
Example: the theme park conducted a study of the new height measuring system. Now they sent us a comma separated value (CSV) file with all of the data. We need to check the data to see if it all makes sense. Are there errors in it? Did the system classify anyone incorrectly? These are all questions that we may be asking.
-
-
This isn’t the most common way to read a file in Python, but it is worth knowing about. Opening a file in this way can often help when troubleshooting difficult issues with characters or formatting.
-
-
-
Pandas is one of the most popular Python packages. There are several subsections in the documentation. Make sure you work through them all as they are important when working with data in Python.
-
-
-
Often used by other packages and for mathematical operations, Numpy is another very popular package that is worth knowing about.
-
Step 6 - Plotting Data
The next step in many people’s journey with Python is visualizing the data that they’re working with. Visualizations can help to illustrate data trends, identify outliers, indicate correlations between variables and much more.
Example: We’ve read in the data and checked it out, but the CEO wants a heatmap that shows each ride and the number of denials that the system has. They are looking to understand if there is a specific ride that appeals more to children who are then turned away and have a bad experience.
While the package documentation below if very popular, it is not the only visualization package in Python. Explore around, search |
-
-
Another one of the most popular Python packages. Has a bit of a learning curve to start but has a ton of customizability.
-
Step 7 - Learning about Modeling
Modeling includes many steps, but the overall concept is to develop an algorithm that allows you to utilize input data to make a prediction.
The goal of this roadmap is to help you build Python skills that are core to many Data Mine projects. This section is just the starting point for the next phase of your skills growth.
Skills that you’ve built to this point will all be used as you dive into predictive modeling.
Learning about building predictive models is amazing and, like Python, takes time. Check out some of the resources below to start your next steps:
Final Example: the CEO is incredibly happy with our work, but now they want to predict how often their program is going to deny letting someone ride a particular ride. That way they can try to reduce denials and make sure everyone enjoys their visit as much as possible.
The two links above are not part of the Learning about predictive analytics is a journey by itself. We just wanted to include these next steps to help you understand one way that you can apply the skills that you developed. |