Doing the Math with Python

  • We can use Python as a basic calculator for addition, subtraction, multiplication, division and exponent
  • We can use the equal sign to save results
  • The two major numeric types are floating point (float) and integer (int)

Python as a Calculator

Computers are good at math and so is Python. We can use Python like a calculator with the standard arithmetic operators for addition (+), subtraction (-), multiplication (*), division (/) and exponent (**).

An IPython console running in Spyder. Shown are the standard arithmetic operators for addition, subtraction, multiplication, division and exponent.

When we combine multiple operators on one line they are executed in the order we know from school. First comes the exponent. Then multiplication and division. Finally addition and subtraction.

2 ** 2 * 5  # Exponent precedes multiplication
# 20
2 ** 2 / 5  # Exponent precedes division
# 0.8
2 * 2 + 5  # Multiplication precedes addition
# 9
2 * 2 - 5  # Multiplication precedes subtraction
# -1
2 / 2 + 5  # Division precedes addition
# 6.0
2 / 2 - 5  # Division precedes subtraction
# 0.2857142857142857

If we want to change the order of operation, we can use parentheses. Let’s use them to see how the results above would look if the order of operation was reversed.

2 ** (2 * 5)
# 1024
2 ** (2 / 5)
# 1.3195079107728942
2 * (2 + 5)
# 14
2 * (2 - 5)
# -6
2 / (2 + 5)
# 0.2857142857142857
2 / (2 - 5)
# -0.6666666666666666

Parentheses can do many different things in Python, which we will talk about later. When used with arithmetic operators they change the order of operation. Whatever appears in parentheses is resolved first. Many calculators have a basic memory function and we can easily save results in Python.

Saving Results

To save results of mathematical operations we can use the equal (=) operator. On the left we put a name that we want to assign the result to and on the right we put the operation that we want to save.

save = 5 + 7
save
# 12

In this example we assign the result 12 to the name save. Once we save a result under a name we can reuse it in an arithmetic expression. If a name stores a number it will just be evaluated as a number when it appears with a mathematical operator.

save = 5 + 7
10 + save
# 22

The equal operator is essential in Python. Here we are using it to save the result of a mathematical operation. Later we will learn that it is used to assign any kind of object to a name. A name is what allows us to refer to an object. Objects are very important in Python. We will learn all about that later but even in the seemingly simple realm of numbers there are different kinds of objects

Floating Point Numbers and Integers

You may have noticed that sometimes numbers have a dot as a decimal delimiter and some numbers don’t. That is because numbers can be floating point numbers (float for short) or integers (int for short). Those are different numerical types. What kind of number we get depends on the arithmetic operation used and the types of numbers involved. To create a specific type of number we can simply add the dot (.) to the number. To check what kind of object we created we can use the type() function.

integer = 3
type(integer)
# int
floating = 3.0
type(floating)
# float

So 3.0 generates a float and 3 generates an integer. In Python3 (the Python you should be using!) the division operator (/) always generates a floating point number, regardless of the numbers involved.

div = 4 / 2
div
# 2.0
type(div)
# float

In this case, both numbers are integers (four and two) and four can be divided by two without remainder. Division returns a float anyway. With multiplication, addition and subtraction, the situation is slightly more complicated. The type of the result depends on the type of the numbers involved. If one of the numbers is a float, the result will be float.

integer = 4 * 2
integer
# 8
type(integer)
# int
floating = 4 * 2.0
floating
# 8.0
type(floating)
# float

For now this seems like a pointless excercise. Mathematically there is no difference between 8.0 and 8. Later we will learn why the types of objects matters. Even the difference between a floating eight and an integer eight can be important. For example, when we index into a list, the number we use as an index must be an integer.

Summary

We learned about the most important mathematical operators, addition, subtraction, multiplication, division and exponent. When those operators appear together, operator precedence decides which operation is performed first. The precedence order is the same we learned in school. We can use parentheses to change the order of operation. Operations in parentheses are executed earlier. Furthermore, we learned that we can save results by assigning a name to them. We can assign a name with the equal operator. Finally, we learned that there are different numeric types. There are integers and floating point numbers. Why the difference between them matters will become more clear later.

NumPy Array Data Type

  • Any array has a data type (dtype)
  • The dtype determines what kind of data is stored in the array
  • Not all operations work for all dtypes

Introduction to Data Types

Having a data type (dtype) is one of the key features that distinguishes NumPy arrays from lists. In lists, the types of elements can be mixed. One index of a list can contain an integer, another can contain a string. This is not the case for arrays. In an array, each element must be of the same type. This gives the array some of its efficiency, because operations can know in advance, what kind of data they will find in each element simply by looking up the data type. At the same time it makes arrays slightly less flexible, because some operations are undefined for some data types and we cannot assign any kind of data to an array. But how does NumPy decide what data type an array should have in the first place?

Guessing or Defining the dtype

So far we were able to create arrays effortlessly without knowing what dtype even means. That is because NumPy will just take a guess, what the dtype should be, based on the input it gets for the array.

arr = np.array([4, 3, 2])
arr.dtype
# dtype('int32')
arr = np.array([4, 3.0, 2])
arr.dtype
# dtype('float64')
arr = np.array([4, '3', 2])
arr.dtype
# dtype('<U11')

In the first case, each element of the list we pass to the array constructor is an integer. Therefore, NumPy decides that the dtype should be integer (32 bit integer to be precise). In the second case, one of the elements (3.0) is a floating-point number. Floats are a more complex data type in Python, which means that all other data types have to follow the more complex one. Therefore, all elements of the array are converted to floats and are stored with the dtype float64. Strings are an even more complex dtype. Because ‘3’ is a string in the final example, the dtype becomes ‘<U11’. U stands for unicode, a type of string encoding and the number indicates the length of the string. In all three cases NumPy guesses the dtype according to the content of the list. This works well most of the time but we can also explicitly define the dtype.

arr = np.array([4, 3, 2], dtype=np.float)
arr.dtype
# dtype('float64')
arr = np.array([4, 3, 2], dtype=np.str)
arr.dtype
# dtype('<U1')
arr = np.array([4, 3, 2], dtype=np.bool)
arr.dtype
# dtype('bool')

Converting arrays to other dtypes can be necessary because some operations will not work on arrays of mixed types. A dtype that is particularly problematic is the np.object dtype. It is the most flexible dtype but it can cause a lot of problems for both experts and beginners.

np.object and the Curse of Flexibility

Most dtypes are very specific. They let you know if the array contains a number (np.int, np.float) or a string (all unicode ‘U’ dtypes). Not so much np.object. It tells you that whatever is inside the array is a thing. Because everything is an object anyway. This can make an array as flexible as a list. Anything can be stored. That is also where the problems come in.

arr = np.array([[3,2,1],[2,5]])
arr.dtype
# dtype('O')  # 'O' means object
arr + 5
# TypeError: can only concatenate list (not "int") to list

Suddenly, the plus operation between an array and a scalar fails. What went wrong? Starting from the top, NumPy decides to assign the dtype of np.object to arr because the nested list entries have different lengths. Think of it this way: this array can neither be a (2, 3) nor a(2, 2) array of dtype integer. Therefore, NumPy makes it a (2,) array of dtype object. So the array contains two lists, the first one is of length 3 and the second one of length 2. NumPy generally turns anything that is more complex than a string into np.object. A list is one of those that gets turned into np.object. The error then occurs because the plus operation is not defined for a list with an integer. But that also means, that the operation will work, if the objects contained in the array so happen to work with the operation.

arr = np.array([3,2,1], dtype=np.object)
arr.dtype
dtype('O')
arr + 5
array([8, 7, 6], dtype=object)

This is one of the main problem of the np.object dtype. Operations work only sometimes and to know if an operation will work, each element has to be checked. With other dtypes, we know which operations will work just by just looking at it.

Summary

The dtype is one of the concepts that is closely related to the internal workings of NumPy. There is a lot that could be said about the details but effective beginners only need to remember a few points. First, the dtype determines what is stored in the array. All elements of an array have to conform to a specific type and dtype tells us which one. Second, NumPy guesses the dtype based on the literal data unless we specify which dtype we want. Guessing works most of the time but sometimes explicit types conversion is necessary. Third, operations that we know and love from numeric types (np.int, np.float) may not work on other types (np.str, np.obect). This is particularly annoying for beginners. If you have hard to debug errors, find out what dtype your arrays actually have.