To store an object, we assign it a name using the equal sign
Names must not start with a number and they cannot contain special characters (except for underscores)
When you choose names, be consistent and descriptive
You have probably heard of variables. In mathematics, a variable is a placeholder for something that is not fully defined just yet. In most programming languages, we say: “We define the variable”. In Python we rarely talk about variables. We instead talk about names. We would say: “We assign a name”. Specifically, we assign a name to an object. To do so, we use the equal sign. While the vocabulary is slightly different, the result is very similar in all programming languages. Let’s look at an example.
my_name = 'Daniel'
Here we assign the name my_name to the string object 'Daniel'. From now on we can use the name to refer to the object. We can perform operations on the name and Python will replace the name with the object it refers to.
The multiplication operator applited to a string concatenates that array multiple times. Here we get the same string five times. Note, that we use the name, instead of the literal object. We can also reassign the same name to a different object.
First, my_name refers to 'Daniel', then it refers to the result of the operation my_name * 5, which is 'DanielDanielDanielDanielDaniel'. When we choose a name, there are very few rules that we must follow. However, there are many more rules that we definitely should follow. We take a look at the mandatory rules first.
When we choose a name, there are many things to consider. But there are also some things that we are not allowed to do. For example a name cannot start with a number.
Besides the visually structuring names, underscores also have special meaning when leading a name. For example, all objects in Python have special double underscore methods. We will learn all about them later.
Good Naming Practices
If you are just getting started with Python, you probably have a lot of things on your mind. Actually, if you follow the above rules, you can choose any name you want. However, it is very important to learn good naming practices in Python. Rule number one is consistency. Avoid wildly different ways to visually structure names within the same project. Don’t start with my_name and end with MyName. If you stay consistent within your projectsyou are already ahead of the curve. For beginners I recommend making names all lower case and structuring only using underscores. Also avoid numbers. Write name_one instead of name_1. Also, don’t be afraid of making names long. It is more important names are descriptive than short. length_of_segment is much better than los, even if there is more typing involved. Don’t be afraid to type. Be afraid of coming back to your code one year after you wrote it and trying to read completely unreadable, meaningless names.
Those are enough rules for now. If you want to adopt a perfect style right from the beginning you should read the official style guide for python called PEP8. There are generally no laws about Python style but this is the most authoritative document on coding style in Python and many people reading your code will expect you to follow the guidelines in this document. If you follow the few rules I describe above, you are already following the most important practices regarding naming.
To store objects we assign them a name. We can do that with the equal sign. The name is on the left, the object is on the right. name = object Needless to say, we’ll be spending a lot of time assigning names. Once a name is assigned, we can use the name instead of the object. Python will replace the name with the object in any operation. There are very few mandatory rules when choosing a name. For one, names cannot start with a number. Second, special characters, except for the underscore (_) are forbidden in names. There are some more rules that are good to follow. The most important one is to be consistent. Do not arbitrarily change your style within a project. If you do this you are already doing well.
We can use Python as a basic calculator for addition, subtraction, multiplication, division and exponent
We can use the equal sign to save results
The two major numeric types are floating point (float) and integer (int)
Python as a Calculator
Computers are good at math and so is Python. We can use Python like a calculator with the standard arithmetic operators for addition (+), subtraction (-), multiplication (*), division (/) and exponent (**).
When we combine multiple operators on one line they are executed in the order we know from school. First comes the exponent. Then multiplication and division. Finally addition and subtraction.
Parentheses can do many different things in Python, which we will talk about later. When used with arithmetic operators they change the order of operation. Whatever appears in parentheses is resolved first. Many calculators have a basic memory function and we can easily save results in Python.
To save results of mathematical operations we can use the equal (=) operator. On the left we put a name that we want to assign the result to and on the right we put the operation that we want to save.
save = 5 + 7
In this example we assign the result 12 to the name save. Once we save a result under a name we can reuse it in an arithmetic expression. If a name stores a number it will just be evaluated as a number when it appears with a mathematical operator.
save = 5 + 7
10 + save
The equal operator is essential in Python. Here we are using it to save the result of a mathematical operation. Later we will learn that it is used to assign any kind of object to a name. A name is what allows us to refer to an object. Objects are very important in Python. We will learn all about that later but even in the seemingly simple realm of numbers there are different kinds of objects
Floating Point Numbers and Integers
You may have noticed that sometimes numbers have a dot as a decimal delimiter and some numbers don’t. That is because numbers can be floating point numbers (float for short) or integers (int for short). Those are different numerical types. What kind of number we get depends on the arithmetic operation used and the types of numbers involved. To create a specific type of number we can simply add the dot (.) to the number. To check what kind of object we created we can use the type() function.
So 3.0 generates a float and 3 generates an integer. In Python3 (the Python you should be using!) the division operator (/) always generates a floating point number, regardless of the numbers involved.
div = 4 / 2
In this case, both numbers are integers (four and two) and four can be divided by two without remainder. Division returns a float anyway. With multiplication, addition and subtraction, the situation is slightly more complicated. The type of the result depends on the type of the numbers involved. If one of the numbers is a float, the result will be float.
For now this seems like a pointless excercise. Mathematically there is no difference between 8.0 and 8. Later we will learn why the types of objects matters. Even the difference between a floating eight and an integer eight can be important. For example, when we index into a list, the number we use as an index must be an integer.
We learned about the most important mathematical operators, addition, subtraction, multiplication, division and exponent. When those operators appear together, operator precedence decides which operation is performed first. The precedence order is the same we learned in school. We can use parentheses to change the order of operation. Operations in parentheses are executed earlier. Furthermore, we learned that we can save results by assigning a name to them. We can assign a name with the equal operator. Finally, we learned that there are different numeric types. There are integers and floating point numbers. Why the difference between them matters will become more clear later.
Install Python as part of a data science platform such as Anaconda
Use an integrated development environment like Spyder (Installs with Anaconda)
Don’t reinvent the wheel, find out what already exists in your field
Python and Science
Python has been adopted with open arms by many scientific fields. A lot of its success is directly linked to the success of NumPy (the Python package for numerical computing) and the scientific Python community (SciPy community). If you want to understand the history of this success story, I recommend reading the recent nature methods paper which gives some strong background about the SciPy community. However, you don’t need to read the paper to use Python or any of the scientific packages.
A downside of Python is that it is a general purpose programming language. Python is used for anything you can think of. Web development, GUI design, game development, data mining, machine learning, computervision and so on. This can be intimidating to beginners. Especially to scientists who have a very specific task they want to automate. Compare this situation to MATLAB. Its purpose is very specific and is even part of the name. It is the MATrix LABoratory. It deals very efficiently with matrices and is generally good at math stuff. Both very valuable and important to most scientists. MATLAB is specifically designed to serve the science and engineering communities. Python on the other hand is for everyone, which can be a weakness in the beginning but can quickly turn into its greatest strength. There are no privileged tasks in Python, they are all handled well (if there is a great community behind the task maintaining it). The purpose of this guide is to ease some of the hurdles scientists face when diving into Python and highlight the many advantages.
The first problem we face is called package management. When we download pure Python from the official website (python.org), we get the core Python functionality. Python in itself does not include most of the features we depend on as scientists. Pythons functionality is organized in so called packages that we need to install if they are not built into Python. Alternatively we can install a pre-made Python distribution that includes scientific packages. This is my preferred way to installing Python and I strongly recommend beginners to start with Anaconda.
Installing Python with Anaconda
Anaconda is a data science platform that installs Python with most of the packages we need as scientists. It is open-source and completely free. When we install Anaconda we get lots of stuff. Most importantly we get Python, NumPy, Matplotlib, Conda and Spyder. We could install those things ourselves but this way we can have them without ever worrying about package management. We already talked about core Python and NumPy. Matplotlib is a package that allows us to plot our data. Conda is a package manager that allows us to update installed packages and install new packages. Finally, Spyder is an integrated development environment (IDE). Spyder stands for Scientific PYthon Development EnviRonment and it is exactly what we need to get started.
An IDE is extremely useful. For example, it helps us by highlighting important parts of our code visually. By running Spyder you have just setup your own IDE. This is your kingdom now. You will be able to do amazing things here and you should celebrate this moment properly. However, as you celebrate, my duty will be to guide you through the most important parts of this graphical user interface.
Lets start with the interactive console in the lower right. Here, IPython is running and it is awaiting your commands. You can type code and run it by hitting enter. You can use it like a fancy calculator. Try 2+2, 2-2, 2*2, 2/2, 2**2. It’s all there. The interactive console is the perfect place to try out commands and see what they do. We can define variables here and import packages. Luckily we installed Anaconda, so we have NumPy already available. The conventional way to import NumPy is import numpy as np. To learn all about NumPy, find my NumPy blog series.
On the left is the editor. This is our code notebook. Here we edit files that store our code. Those files are commonly called scripts. When you hit enter here, code is not immediately executed. You are just moving to a new line. To execute the code here we can hit the green arrow (play button) above. When we do so, all lines of code in the script are executed from top to bottom. When the script finishes, the objects created in the script still exist in the interactive console. This is very useful to debug the script. We can use the interactive console to take a close look at what the script did.
The Power of Scripts
Most of your work will be about writing scripts that do something useful. Many times, there are other ways to solve the same task. Many things can be done by hand, clicking through the graphical user interface of another program. Sometimes this is still the best way, but there are many advantages to writing a script.
A script can be much faster than a human, especially on repeat tasks. When we have to rename 10 files in a certain way, we might decide that it is not worth to write a script. Let’s just do it quickly by hand. But the same task could come back later. This time with 100 files that require renaming. The most important thing is to pick your battles. We cannot automate everything. We will have to make tough decisions.
Another advantage of scripts is that they are a protocol of the analysis. If we can write a script that takes care of everything and is capable of analyzing the entire dataset, we know exactly how the analysis was performed. When we analyze manually we can achieve the same by very carefully writing down everything we do on each data point. However, this can easily fail if something about the analysis becomes important afterwards and it is not part of the protocol. Scripts are also more easily shared than a protocol because the script language (in our case Python) is less ambiguous.
Sometimes even the script fails as a definitive protocol. This is the case when we use packages that change functions. Lets assume we use a function called fastAnalysis from a fictional package called fancyCalc. The way fastAnalysis works changed between version 1.1 and 1.2 slightly. We only know what our script did if we remember which version we used at the time we ran the script. Here another advantage of scripts shows. We can pretty quickly run the same analysis again with different fancyCalc versions. If we get the same result as before this version is probably the one we ran previously. Manual analysis is usually much harder to repeat. Imagine you spent the last 2 month analyzing 200 samples. You had to normalize each sample for its corresponding baseline. How easy would it be to do the same analysis by hand with a slightly larger baseline interval? In a script this would usually involve changing a single number. By hand it could easily lead to another 2 months of work.
The Scripting Workflow
We talked at length about the advantages of scripts but how do we actually write one? For starters we need to know what we want to do. Then we need to look online for a function that we hope can achieve our goal. Then we try out the function. If it does what we want we move on to the next task. Usually we have one pretty large, intimidating task, such as “analyze this whole data set”. Then this task falls into smaller tasks. For example we will have to load our data, do preprocessing, extract regions of interest, quantify and so on. Those tasks fall apart into even smaller tasks. For preprocessing one of those tasks might be to subtract the background. Ideally, if we make our tasks small enough we will be less intimidated and we will find a function that performs this task.
In the age of the internet, programming is easy enough. Unfortunately, to become effective at it we will need to do some learning and get practice. Our goal is to become effective scripters. So we will need to learn three things: 1. We need to learn some basic Python. This will be easy, I promise. 2. We need to learn how to think computationally about our tasks. Only then will be be able to effectively divide entire projects into smaller, manageable tasks. 3. We will learn how to find, evaluate and use task specific packages quickly. Most tasks have already been solved by other people and we never want to reinvent the wheel (unless we have a vision for a really cool wheel that is much better than all the other wheels that already exist). My blog will lead you through all three steps.
The best thing you can do next is to start coding. Today. The amount of different programming languages, distributions of those languages, integrated development environments and resources to learn all of those is immense and can lead to real decision paralysis. You can spend forever trying to decide how to start but you already have everything you need. Install Anaconda, launch Spyder and just let lose. If you are not sure if Anaconda is right for you, install a distribution you think is more suitable. Install pure Python if you are feeling adventurous and just start hacking away. If you are not convinced Python is the best language for you, install another language. If you don’t want to learn with this blog, use another resource. There are so many great ones. Just do it. So now you need to get your hands dirty.
In future blog posts I will go through the three points I think are necessary to become effective scripters. I will start with basic Python and I will collect those blog posts here. I’d be happy if you want to join me.