- The same values can be stored in arrays with different shapes
- Array methods can perform different operations depending on the array shape
- The methods .reshape and .flatten change the shape of an array
Introducing Array Shape
Any array has a shape and the shape of an array is important for what kind of operations we can perform. Array shape is sometimes hard to imagine, even for experienced programmers so let’s just look at some code.
import numpy as np
my_array = np.array([3, 2, 5, 6, 3, 4])
my_array.shape
(6,)
my_array_reshaped = my_array.reshape((2,3))
my_array_reshaped.shape
(2, 3)
my_array_reshaped
array([[3, 2, 5],
[6, 3, 4]])
Here we create an array with 6 elements and my_array.shape tells us that these 6 elements are arranged in a single dimension that has a length of 6. We then reshape the array with its .reshape method into an array with two rows and three columns. This doesn’t look immediately useful but imagine we did an experiment under control and experimental condition with three replicates each. You’d clearly want a structure that represents this. Also, we went from a vector to a matrix with just one line of code. The most important part of array shape is that we can perform array methods only on specific dimensions. To do so we just need to pass the axis argument.
my_array = np.array([[3, 2, 5],
[6, 3, 4]])
dim0_sum = my_array.sum(axis=0)
dim0_sum
array([9, 5, 9])
dim1_sum = my_array.sum(axis=1)
dim1_sum
array([10, 13])
Remember that we start out with a (2, 3) array, 2 rows and 3 columns. When we call sum(axis=0) on that array the 0th dimension is eliminated. The array goes from a (2, 3) shape to a (3, ) shape. It does so by calculating the sum across the 0th dimension. Likewise, when we pass sum(axis=1) the 1st dimension gets eliminated in the same way and the array becomes a (2, ) array. The same concept works of course for arrays of any dimension. But lets get back to array shapes. An array cannot be converted to any shape its shape and limit the shapes it can take.
my_array = np.arange(30) # A (30,) array
my_array_reshaped = my_array.reshape((5,6))
my_array_reshaped.shape
(5, 6)
my_array_reshaped = my_array.reshape((5,7))
ValueError: cannot reshape array of size 30 into shape (5,7)
Converting from (30,) to (5, 7) didn’t work for one simple reason. 5 times 7 is 35, not 30. In other words, the new array has more elements than the original array and NumPy will not just invent new elements to make reshaping work. If the number of elements checks out, we can reshape not only to two-dimensional arrays but to any dimension.
my_array = np.arange(30) # A (30,) array
my_array_reshaped = my_array.reshape((5, 2, 3))
my_array_reshaped.shape
(5, 2, 3)
my_array_reshaped
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29]]])
Of course we can also reshape from higher to lower dimensions.
my_array = np.array([[3, 2, 5],
[6, 3, 4]])
my_array_reshaped = my_array.reshape((6,))
my_array_reshaped.shape
(6,)
my_array.shape
(2, 3)
If you want combine all dimensions into one single dimension, you can use the .flatten method.
my_array = np.arange(30) # A (30,) array
my_array_reshaped = my_array.reshape((5, 2, 3))
my_array_flattened = my_array_reshaped.flatten()
my_array_flattened.shape
(30,)
Why we need array shapes
We saw how to manipulate array shape and how array methods can use the shape of an array. Lets think a bit about the real world usage of array shape. Let’s say you are working on an image processing project. You are lucky and the images are already pre-processed in a way that each image has 64 pixels in both dimensions. So each image is an array of shape (64, 64) but your dataset consists of 1000 images. So you want your dataset to be stored as a (1000, 64, 64) array. But then your image processing project becomes a volume processing project. So each volume has 100 slices. So you need a (1000, 100, 64, 64) array. But wait. You are actually working on video files. There are 20000 frames for each volume. So you need a (1000, 20000, 100, 64, 64) array. It is rare that you will have to go beyond five dimensions, but you can. In several fields it is very easy to end up with five dimensional arrays (think fMRI).
Summary
Here we learned that the shape of an array is useful to store high dimensional data meaningfully and to have array methods operate only on specific dimensions. The .reshape method is important to change the shape of an existing array and the .flatten method can collapse an array into a single dimension. In the next blog post we will learn about broadcasting. Broadcasting is a mechanisms that is triggered whenever we perform an arithmetic operation on two arrays of different shapes (dimensionality). If two arrays have identical shape the operation is performed element-wise. If they have different shapes broadcasting performs a series of transformations on the lower dimensional array to make both arrays identical in shape and finally perform the operation element-wise.