- The array is the central NumPy object
- Pass any sequence, such as a list or tuple, to the np.array() constructor to create an array
- Use functions like np.zeros, np.arange and np.linspace to create arrays
- Use np.random to create arrays with randomly generated values
NumPy is a Python package for numerical computing. Python itself is not specifically designed to handle large amounts of data, but NumPy can make data analysis both more efficient and more readable than it is in pure Python. Without NumPy, we would simply store numbers in a list and perform operations on them by looping through the list. NumPy instead gives us an object called the array, which is essential to anything data-related in Python; most other data analysis packages build on the NumPy array in one way or another. Here we will learn several ways to create NumPy arrays, but first let’s talk about installing NumPy.
Setting up NumPy
I highly recommend installing Python with a data science platform such as Anaconda (https://www.anaconda.com/), which comes with NumPy and other essential scientific packages.
To find out whether NumPy is already installed with your distribution, try to import it:
import numpy as np
If that raises an error, try installing NumPy with pip, the package installer for Python. On your command line, run:
pip install numpy
Finally, if neither works, take a look at the official documentation for installation instructions: https://scipy.org/install.html
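Once import numpy as np works, a quick way to confirm which release you have is to print its version string. The number you see will depend on your installation.

```python
import numpy as np

# __version__ holds the installed NumPy release as a string
print(np.__version__)
```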
Three ways to create arrays
Now let’s create our first array. An array is a sequence of numbers, so we can convert any Python sequence to an array. One of the most commonly used Python sequences is the list. To convert a Python list to an array, we simply pass the list to the NumPy array constructor:
import numpy as np
my_list = [4, 2, 7, 9]
my_array = np.array(my_list)
This creates a NumPy array with the entries 4, 2, 7, 9. We can do the same with a tuple.
my_tuple = (4, 2, 7, 9)
my_array = np.array(my_tuple)
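A short sketch to check that the list and the tuple really give the same result, and to peek at the array’s shape (the names array_from_list and array_from_tuple are just for this illustration):

```python
import numpy as np

my_list = [4, 2, 7, 9]
my_tuple = (4, 2, 7, 9)

array_from_list = np.array(my_list)
array_from_tuple = np.array(my_tuple)

# Both sequences produce arrays with identical contents
print(np.array_equal(array_from_list, array_from_tuple))  # True

# A flat sequence of four numbers becomes a one-dimensional array with four entries
print(array_from_list.shape)  # (4,)
print(array_from_list.ndim)   # 1
```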
Of course, we can also convert nested sequences to arrays, and it works exactly the same way.
my_nested_list = [[4, 2, 7, 9], [3, 2, 5, 8]]
my_array = np.array(my_nested_list)
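Each inner list becomes one row of the result, so the nested list above turns into a two-dimensional array. A minimal check of its shape and number of dimensions:

```python
import numpy as np

my_nested_list = [[4, 2, 7, 9], [3, 2, 5, 8]]
my_array = np.array(my_nested_list)

# Two inner lists of four numbers each -> 2 rows and 4 columns
print(my_array.shape)  # (2, 4)
print(my_array.ndim)   # 2
print(my_array)
```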
This is the first way to create arrays: pass a sequence to the np.array constructor. The second way is to use NumPy functions that create arrays for us. One such function is np.zeros.
zeros = np.zeros((3, 4))
zeros
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
np.zeros gives us an array in which every entry is 0. It requires one argument: the shape of the array we want. Here we get an array with three rows and four columns because we passed it the tuple (3, 4). By default the entries are floating-point numbers, which is why they are displayed as 0. with a trailing dot. This function is useful if you know how many values you need (the structure) but do not yet know what those values should be. You can pre-initialize an all-zero array and then assign the actual values as you compute them. Another array creation function is np.arange.
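Before moving on to np.arange, here is a minimal sketch of the pre-initialize-then-fill pattern just described. The formula row * 10 + col is a hypothetical stand-in for whatever computation produces your real values.

```python
import numpy as np

# We know we need 3 rows and 4 columns, but not the values yet
results = np.zeros((3, 4))

for row in range(results.shape[0]):
    for col in range(results.shape[1]):
        # Hypothetical computation: fill in each value once it is known
        results[row, col] = row * 10 + col

print(results)
```

Allocating the full array once and then writing into it avoids growing a structure piece by piece while the values are being computed.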
arange = np.arange(5, 30, 2)
arange
array([ 5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])
np.arange gives us a sequence that starts at 5 and goes up in steps of 2, stopping before 30 (30 itself is not included). This function is very useful for generating sequences that can be used to index into another array. We will learn more about indexing in a future blog post. A function very similar to np.arange is np.linspace.
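Before turning to np.linspace, a small sketch of both points: the stop value is excluded, and an array produced by np.arange can serve as a list of positions to pull out of another array. The data array here is made up purely for illustration.

```python
import numpy as np

# The stop value 30 is excluded, so the last entry is 29
odd_numbers = np.arange(5, 30, 2)
print(odd_numbers[-1])   # 29
print(len(odd_numbers))  # 13

# Hypothetical data; np.arange(0, 10, 2) gives the positions 0, 2, 4, 6, 8
data = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
every_other = data[np.arange(0, 10, 2)]
print(every_other)       # [10 12 14 16 18]
```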
linspace = np.linspace(5, 29, 13)
linspace
array([ 5.,  7.,  9., 11., 13., 15., 17., 19., 21., 23., 25., 27., 29.])
Instead of the step size between values, np.linspace takes the number of entries in the output array (here 13). Also, the stop value is included, so 29 is the final entry. The third way to generate NumPy arrays is with the np.random module. First, let’s look at np.random.randint.
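Before we get to np.random, a short sketch confirming that the np.linspace and np.arange calls above produce the same values. With the endpoint included, np.linspace places its entries (stop - start) / (num - 1) apart, here (29 - 5) / 12 = 2. Its endpoint parameter can switch the inclusive behaviour off.

```python
import numpy as np

# 13 evenly spaced values from 5 to 29, endpoint included (floats)
spaced = np.linspace(5, 29, 13)

# Integer steps of 2 from 5 up to, but not including, 30
stepped = np.arange(5, 30, 2)

# Spacing between neighbouring linspace entries
step = (29 - 5) / (13 - 1)
print(step)  # 2.0

# Same values; np.linspace returns floats by default
print(np.allclose(spaced, stepped))  # True
print(spaced.dtype, stepped.dtype)

# endpoint=False leaves out the stop value, like np.arange does
print(np.linspace(0, 1, 4, endpoint=False))
```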
randint = np.random.randint(5, 30, (3, 4))
randint
array([[26, 17, 26, 24],
       [20, 16, 29, 25],
       [25, 21, 26, 26]])
This creates an array with 3 rows and 4 columns containing random integers from 5 (inclusive) up to 30 (exclusive). If you try this code at home, the values in your array will most probably be different, but the shape should be the same. Finally, let’s look at np.random.randn.
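Before we do, one way to make everyone’s “random” values match is to seed NumPy’s random number generator first. A minimal sketch (the seed 42 is an arbitrary choice) that also checks the bounds described above:

```python
import numpy as np

# Seeding the global generator makes the random values reproducible
np.random.seed(42)
first = np.random.randint(5, 30, (3, 4))

np.random.seed(42)
second = np.random.randint(5, 30, (3, 4))

print(np.array_equal(first, second))  # True: same seed, same values
print(first.shape)                    # (3, 4)

# 5 can appear, but 30 never does
print(first.min() >= 5, first.max() <= 29)  # True True
```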
randn = np.random.randn(4, 5)  # Random numbers from normal distribution
randn
array([[-2.34229894, -1.43985814, -0.51260701, -2.58213476,  1.61196437],
       [-0.69767456, -0.0950676 , -0.22415381, -0.90219875,  0.33513859],
       [ 0.56432586, -1.62877834, -0.60056852,  1.37310251, -1.20494281],
       [-0.20589457,  1.34870661, -0.89139339, -0.40300812, -0.15703367]])
np.random.randn gives us an array of numbers drawn randomly from the standard normal distribution, a Gaussian distribution with mean 0 and variance 1. Each argument we pass to the function sets the size of one more dimension; here we get 4 rows and 5 columns.
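To see the “mean 0, variance 1” claim in action, a quick sketch draws a large sample and looks at its mean and standard deviation. The seed and the sample size of 100,000 are arbitrary choices; scaling and shifting the draws gives other normal distributions.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is reproducible

# Two arguments -> two dimensions: 4 rows, 5 columns
small = np.random.randn(4, 5)
print(small.shape)  # (4, 5)

# With many samples, the sample mean and standard deviation approach 0 and 1
large = np.random.randn(100_000)
print(round(large.mean(), 2), round(large.std(), 2))

# Multiply by 2 and add 10 to draw from a normal distribution with mean 10, std 2
shifted = 10 + 2 * np.random.randn(100_000)
print(round(shifted.mean(), 1), round(shifted.std(), 1))
```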
We learned how to create arrays, the central NumPy object. Working with NumPy means working with arrays, and now that we know how to create them, we are well prepared to get to work. In the next blog post, we will look at some of the basic arithmetic operations we can perform on arrays and show that they are both more efficient and more readable than Python’s built-in functions.