Lesson 4: Python Basics
The Big Picture
So far, we’ve been working only on the command line. However, you’ll often want to do more than the command line easily allows. Today, we’re going to be learning the basics of one of the most popular languages for doing that: Python.
Side note: If you're familiar with Python, you may know there are two commonly used versions, Python 2 and Python 3. Today, you will be using Python 3, since Python 2 reached its end of life on January 1, 2020, and no longer receives updates. The two are very similar, so the skills you learn today should carry over to most Python programs you come across.
Create a new directory called “pydir”. Enter the directory.
Python might not be installed on your workstation. To check, type the following on the command line:
python3
You should see something that looks like this:
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Type exit() or quit(), and move on.
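If you'd rather confirm the version from inside a script, the standard-library sys module exposes it. The sketch below is one way to do that; the helper name describe_python is made up for this example and is not part of Python.

```python
import sys


def describe_python():
    """Return the running interpreter's version as a string such as '3.6.8'."""
    # sys.version_info behaves like a tuple: (major, minor, micro, releaselevel, serial)
    return ".".join(str(part) for part in sys.version_info[:3])


version = describe_python()
if sys.version_info[0] >= 3:
    print("Good to go: Python " + version)
else:
    print("This is Python " + version + "; please use Python 3 instead.")
```

Save it as, say, version_check.py and run it with python3 version_check.py.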
Need to install Python? Follow these steps:
- Go to https://www.python.org/downloads/
- Download the latest version of Python for your OS
- Open the installer and follow the installation steps
How to Python
We're going to start by taking a quick look at how Python code is written. Create a file called Hello.py. The ".py" extension signals that the file contains Python code. Inside the file, put:
print("Hello World")
Great! Now save and close the file, and run your newly written program by typing the following on the command line:
python3 Hello.py
Because Python's print() function writes whatever you pass to it to the terminal, your program will print "Hello World". That was pretty trivial… let's try something more interesting.
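Before moving on, note that print() can do a little more than print a single string. This short sketch shows a few standard options (several arguments, sep=, and end=); the variable name message is just for illustration.

```python
message = "Hello" + " " + "World"   # + joins (concatenates) strings
print(message)                      # prints: Hello World
print("Hello", "World")             # several values are separated by a space: Hello World
print("Hello", "World", sep="-")    # sep= changes the separator: Hello-World
print("Hello World", end="")        # end= replaces the usual trailing newline
print("!")                          # so this continues the same line: Hello World!
```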
You'll need to download a file called syntax.py. Run the following on the command line:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1MOVGIAwde3Op4kApbn3Hw_C8_WETFt3Y' -O syntax.py
Now, open the syntax.py file and take a look inside.
Make sure you understand what’s happening. Follow the comments closely, and ask one of us if you have any questions.
Now, run the code the same way we ran “Hello.py”. You should see “Bioinformatics is Cool”. Can you edit line 14 to make the program print: “is Bioinformatics Cool”?
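If you get stuck, the general idea is that list and string positions are numbered from 0, so you can pick items in any order. The example below is hypothetical: the list words is not copied from syntax.py, which may be written differently.

```python
# Hypothetical example: this list is NOT taken from syntax.py
words = ["Bioinformatics", "is", "Cool"]
print(" ".join(words))              # Bioinformatics is Cool

# Indexing starts at 0, so words[1] is "is" and words[0] is "Bioinformatics"
reordered = [words[1], words[0], words[2]]
sentence = " ".join(reordered)
print(sentence)                     # is Bioinformatics Cool
```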
Indentation in Python matters. Try adding a second print statement to your "Hello.py" file so it looks like this:
print("Hello World")
    print("Indented line")
Now, try to run "Hello.py". Python will stop with an IndentationError ("unexpected indent"), because the second line is indented even though nothing requires it. You'll learn when indentation is needed in the next section. Speaking of which, it's about time for some bioinformatics.
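To see exactly which error the indented Hello.py triggers, you can ask Python to check the code without running it. This sketch uses the built-in compile() function on the two lines stored as a string; the variable names are illustrative.

```python
# The two-line Hello.py from the step above, kept in a string
source = 'print("Hello World")\n    print("Indented line")\n'

error_message = None
try:
    # compile() checks the code for syntax errors without running it
    compile(source, "Hello.py", "exec")
except IndentationError as err:
    error_message = err.msg
    print("IndentationError on line", err.lineno, ":", err.msg)

# Without the stray indent, the same code compiles cleanly
fixed = 'print("Hello World")\nprint("Indented line")\n'
compile(fixed, "Hello.py", "exec")
```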
Say I have a fasta file containing genetic information (fasta is a plain-text format in which each sequence follows a one-line header). Inside are some number of reads (remember reads from our last lesson?). I want to write a Python program that ONLY outputs the header lines (the ones that start with ">"). How can I do it?
Thought about it? Compare your answer with this one:
check every line
    if it starts with a ">"
        print the line
This is one simple representation of how you could achieve this task. The implementation in Python, as we shall see, uses a loop. Which one of the three pseudocode lines above suggests we will need a loop?
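Once you've settled on an answer, here is the pseudocode translated line by line into Python, run on a short hand-written list instead of a real file. The reads are made up for illustration, and str.startswith is a built-in string method that performs the "starts with" check directly.

```python
# A few made-up fasta lines (hypothetical data, for illustration only)
lines = [
    ">read_1\n",
    "ACGTACGT\n",
    ">read_2\n",
    "TTGACCA\n",
]

headers = []
for line in lines:              # check every line
    if line.startswith(">"):    # if it starts with a ">"
        print(line, end="")     # print the line (it already ends with "\n")
        headers.append(line)
```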
Let's get a sample fasta file. Use the following command to download it straight into your working directory:
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=17NqX2e5jA9Jko-gV9lD5Bbnj9m2beKCJ' -O test.fasta
Check the contents of your directory. You should see a file called “test.fasta”.
Let’s make a Python program that reads from this file. Create a new file called “Loop.py” and add this as the first line:
file = open("test.fasta", "r")
This will open the file (test.fasta) for “r”eading, and give you access to test.fasta in a variable called “file”. Now let’s use a loop to look at every line in the file:
file = open("test.fasta", "r")
for line in file:
That last line is the syntax for starting a for loop in Python. Next, looking back at our pseudocode, we see that we need to check whether a line starts with a ">". Luckily, each line read from a file is a string! Remember that strings are indexed, meaning individual characters can be extracted using brackets; for example, line[0] is the first character of line (you might remember we did something similar above with!).
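Here is a small sketch of indexing on a single made-up header line, showing why the check has to look at the first character rather than the whole line. The header text is hypothetical.

```python
line = ">read_1\n"                 # a made-up fasta header, as a file would give it

first = line[0]                    # index 0 is the first character: ">"
name = line[1:].rstrip("\n")       # slice from index 1 to the end, minus the newline: "read_1"
whole_line_match = line == ">"     # False: this compares the WHOLE line to ">"
first_char_match = line[0] == ">"  # True: this compares only the first character
same_check = line.startswith(">")  # True: a built-in string method for the same test

print(first, name, whole_line_match, first_char_match, same_check)
```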
file = open("test.fasta", "r")
for line in file:
    if line[0] == ">":
Pay close attention to the indentation here. You can think of everything indented after the for as being "inside" the for loop (it looks a bit like that, too!). In our example, that means the if statement is executed for every line in the file.
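To make the nesting concrete, this sketch counts header and sequence lines and marks which block each statement belongs to. It writes its own tiny made-up file (sample.fasta in a temporary folder, a hypothetical name) so it runs without test.fasta, and it uses a with block, a common alternative to open() that closes the file automatically.

```python
import os
import tempfile

# Write a tiny made-up fasta file so the example runs without test.fasta
sample_path = os.path.join(tempfile.mkdtemp(), "sample.fasta")
with open(sample_path, "w") as out:
    out.write(">read_1\nACGT\n>read_2\nTTGA\n")

header_count = 0
sequence_lines = 0
with open(sample_path, "r") as file:   # "with" closes the file when its block ends
    for line in file:                  # this indented block runs once per line
        if line[0] == ">":             # this block runs only for header lines
            header_count += 1
        else:                          # and this one for every other line
            sequence_lines += 1
    print("Loop finished")             # inside "with" but outside "for": runs once

print(header_count, "headers,", sequence_lines, "sequence lines")
print(file.closed)                     # True: the with block closed the file
```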
Finally, we just specify that we want the line printed if the line does start with “>”. We indent the next line so Python knows it’s part of the “if” statement, and…
file = open("test.fasta", "r")
for line in file:
    if line[0] == ">":
        print(line, end="")  # line already ends with a newline
file.close()
Run Loop.py and see what happens. Voila.
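Once Loop.py works, you might want to reuse the same logic elsewhere. One possible next step is to wrap it in a function that returns the headers instead of printing them. The function name fasta_headers and the sample file below are hypothetical, chosen for this sketch.

```python
import os
import tempfile


def fasta_headers(path):
    """Return the header lines (without trailing newlines) from a fasta file."""
    headers = []
    with open(path, "r") as fasta:
        for line in fasta:
            if line.startswith(">"):
                headers.append(line.rstrip("\n"))
    return headers


# Made-up sample data so the sketch runs without test.fasta
sample_path = os.path.join(tempfile.mkdtemp(), "sample.fasta")
with open(sample_path, "w") as out:
    out.write(">read_1\nACGT\n>read_2\nTTGA\n")

found = fasta_headers(sample_path)
for header in found:
    print(header)
```

To run it on the lesson's data instead, call fasta_headers("test.fasta"). Returning a list rather than printing lets other code count, sort, or save the headers.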