Threshold Detection in NumPy

Many signals are easily detected by their size. We will learn how to detect the indices where signals cross a threshold with NumPy. These are our practice signals.

import numpy as np
import matplotlib.pyplot as plt
data = np.array([0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 4, 4, 4, 0, 0, 0])
plt.plot(data, marker='o')

We will perform two simple steps to detect the threshold crossings: 1. Make the data binary, in a way that they are true when larger than the threshold and false when lower or equal. 2. Take the difference of the binary signal. This gives us a boolean array that is true when the threshold was crossed. We can combine those steps into one line.

threshold = 2
threshold_crossings = np.diff(data > threshold, prepend=False)

Plotting shows us that threshold_crossings is true after the threshold was crossed.

plt.plot(data2, marker='o')
plt.plot(thr_crossings, marker='o')
plt.legend(("Data", "Threshold Crossings"))

To get the indices of the threshold crossings we can use np.argwhere(), which returns the true indices from a boolean array.

np.argwhere(threshold_crossings)[:,0]
# array([ 3,  7, 11, 14], dtype=int64)

Threshold crossings occur at 3, 7, 11 and 14. Sometimes we only need the upward or downward crossings. We can simply isolate those by slicing the indiced array.

np.argwhere(threshold_crossings)[::2,0]  # Upward crossings
# array([ 3, 11], dtype=int64)
np.argwhere(thr_crossings)[1::2,0]  # Downward crossings
# array([ 7, 14], dtype=int64)

Sometimes we want to find the point before the threshold is crossed, rather than after. There is one simple trick in np.diff instead of setting prepend=False, we set append=False.

threshold = 2
post_threshold_crossings = np.diff(data > threshold, prepend=False)
pre_threshold_crossings = np.diff(data > threshold, append=False)
plt.plot(data2, marker='o')
plt.plot(post_threshold_crossings, marker ='o')
plt.plot(pre_threshold_crossings, marker='o')
plt.legend(("Data", "Post Crossings", "Pre Crossings"))

Make sure to check out the documentation for np.diff and bonus points if you can figure out why exactly this works. Detecting threshold crossings is an easy but important part in most of my analysis pipelines and now you can do it too.

Comparisons and Logic Functions in NumPy

  • We can compare arrays with scalars and other arrays using Pythons standard comparison operators
  • There are two different boolean representations of an array. arr.all is True if all elements are True, arr.any is True if any element is True.
  • To perform logical functions we need the special NumPy functions np.logical_and, np.logical_or, np.logical_not, np.logical_xor

Introduction

Logic functions allow us to check if logical statements about our arrays are true or false. Luckily, logic functions are very consistent with other array functions. They are performed element-wise by default and they can be performed on specific axes.

Comparing Arrays and Scalars

The same way we can use the arithmetic operators we can use all the logical operators: >, >=, <,< <=, ==, !=,

import numpy as np
arr = np.array([-1, 0, 1])
arr > 0
# array([False, False,  True])
arr >= 0
array([False,  True,  True])
arr < 0
# array([ True, False, False])
arr <= 0
# array([ True,  True, False])
arr == 0
# array([False,  True, False])
arr != 0
# array([ True, False,  True])

Comparing Arrays with Arrays

The result of a logic operation is a boolean array containing binary values of True or False. This particular case shows us logic operations of arrays with scalars. All boolean operations are performed element-wise, so each element is compared against the scalar. We can also compare arrays to arrays if their shape allows it.

arr_one = np.array([-1, 0, 1])
arr_two = np.array([1, 0, -1])
arr_one > arr_two
# array([False, False,  True])
arr_one >= arr_two
# array([False,  True,  True])
arr_one < arr_two
# array([ True, False, False])
arr_one <= arr_two
# array([ True,  True, False])
arr_one == arr_two
# array([False,  True, False])
arr_one != arr_two
array([ True, False,  True])

Truth Value of Arrays and Elements

Sometimes we need to check if an array contains any elements that are considered True in a boolean context. While the boolean value of array elements is well defined, the truth value of an entire array is not defined.

arr = np.array([0, 1])
bool(arr[0])
# False
bool(arr[1])
# True
bool(arr)
# ValueError: The truth value of an array with more 
# than one element is ambiguous. Use a.any() or a.all()

NumPy error messages are great. This one is so great that it even tells us which method we need to use to get at the truth value of an array. We can use arr.any() to find out if any of the elements evaluate to True or arr.all() to find out if all elements are True

arr_one = np.array([0, 0])
arr_one.any()
# False
arr_one.all()
# False
arr_two = np.array([0, 1])
arr_two.any()
# True
arr_two.all()
# False
arr_three = np.array([1, 1])
arr_three.any()
# True
arr_three.any()
# True

This can be useful to find out whether an array is empty

arr = np.array([])
arr.any()
# False

Logical Operations

Finally, we need to look at four more logical operations: and, or, not & xor.
Unfortunately we can’t just use the Python keywords. The reason is in the error message above: “ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()”. It is ambiguous because NumPy does not know if we want to perform the operation element-wise or if we want to perform the operation on the truth value of the array. NumPy does not try to guess which one we mean, so it throws the error. To get these logical functions we need to call some more explicit NumPy functions.

arr_one = np.array([0,1,1])
arr_two = np.array([0,0,1])
np.logical_and(arr_one, arr_two)
# array([False, False,  True])
np.logical_or(arr_one, arr_two)
# array([False,  True,  True])
np.logical_not(arr_one)
# array([ True, False, False])
np.logical_xor(arr_one, arr_two)
# array([False,  True, False])

Summary

We are now well equipped to deal with arrays. We can compare arrays with scalar values and other arrays using the the standard comparison operators. We can also perform logical operations on arrays with the special NumPy functions (logical_and, logical_or, logical_not and logcal_xor). Finally we can get two different boolean values of an arrays using arr.all and arr.any.

Array Indexing with NumPy

  • Indexing is used to retrieve or change elements of a an array
  • Slice syntax (start:stop:step) gets a range of elements
  • Integer and boolean arrays can get an arbitrary set of elements

Introduction to Array Indexing

Indexing is an important feature that allows us to retrieve and reassign specific parts on an array. You probably already know the basics of indexing from Python lists and tuples. You can index into NumPy arrays the same way you index into those sequences but NumPy indexing comes with many extra features we will learn about here. First, lets look at single value indexing.

Single Value Indexing

We can use indexing to get single (scalar) values from an array. Indexing is always done with square brackets and we always start counting at 0.

import numpy as np
arr = np.arange(10,15)
arr
array([10, 11, 12, 13, 14])
arr[0]
10
arr[4]
14

Note that single value indexing does not return an array with a single entry but rather a numpy integer. To get a single value from a multi dimensional array we need to use multiple indices that are separated by commas.

arr = np.arange(20)
arr = arr.reshape((2,2,5))
arr
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])
arr[0,0,1]
1
arr[1,0,4]
14

I recommend this way of indexing but you can also use multiple square brackets like you would for Python sequences.

arr
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])
arr[0][0][0]
0
arr[1][0][4]
14

We can also use indexing to reassign elements of an array.

arr = np.arange(10,15)
arr
# array([10, 11, 12, 13, 14])
arr[1] = 20
arr
# array([10, 20, 12, 13, 14])

Slice Indexing

To retrieve a single value, our indices need to resolve all dimensions of the array and arrive at a single value. Whenever one dimension remains unspecified, we get an array (array view technically).

arr = np.array([[[ 0,  1,  2,  3,  4],
                 [ 5,  6,  7,  8,  9]],
                [[10, 11, 12, 13, 14],
                 [15, 16, 17, 18, 19]]])
arr[0, 1]
array([5, 6, 7, 8, 9])
arr[1]
array([[10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

To take an entire dimension we can use the colon.

arr = np.array([[[ 0,  1,  2,  3,  4],
                 [ 5,  6,  7,  8,  9]],
                [[10, 11, 12, 13, 14],
                 [15, 16, 17, 18, 19]]])
arr[0, :, 0]
array([0, 5])
arr[:, 0, 0]
array([ 0, 10])

The colon is very useful for indexing in general, because it allows us to take a slice of values instead of a single value. The syntax of the slice follows start:stop:step. If we leave out start, the slice starts at 0. If we leave out stop, it goes to the end of the dimension. If we leave out step, the step defaults to 1.

arr = np.array([[[ 0,  1,  2,  3,  4],
                 [ 5,  6,  7,  8,  9]],
                [[10, 11, 12, 13, 14],
                 [15, 16, 17, 18, 19]]])
arr[0, 0, 1:5:2]
# array([1, 3])
arr[0, 0, 1:4]
# array([1, 2, 3])
arr[0, 0, 1:]
# array([1, 2, 3, 4])
arr[0, 0, :3]
# array([0, 1, 2])
arr[0, 0, :]
# array([0, 1, 2, 3, 4])

Index Array

So far we learned that we can use integers and slices for indexing. Now we learn that we can also use arrays to index into an array. When we use an array to index, that array has to either contain integers or boolean values. Lets take a look at integer array indexing first.

arr = np.arange(10,50,3)
idc = np.arange(5)
idc.dtype
dtype('int32')
arr[idc]
array([10, 13, 16, 19, 22])
idc = np.arange(5,8)
arr[idc]
array([25, 28, 31])
idc = np.array([1,2,4])
arr[idc]
array([13, 16, 22])

Note that in the examples where we generate index arrays with arange, we could achieve the same result with a slice as shown above and save one line of code. Integer arrays are most useful when they are generated by a process that is more complicated than the arange method. One example is the np.argwhere method we will learn more about in a later post.

Boolean Array

Boolean arrays also deserve at least one post of their own but here I will give you a teaser. We only want to retrieve those values, that satisfy a larger than condition.

arr = np.array([[[ 0,  1,  2,  3,  4],
                 [10, 11, 12, 13, 14]],
                 [[5,  6,  7,  8,  9],
                 [15, 16, 17, 18, 19]]])
boolean_idc = arr > 10
boolean_idc
array([[[False, False, False, False, False],
        [False,  True,  True,  True,  True]],

       [[False, False, False, False, False],
        [ True,  True,  True,  True,  True]]])
arr[boolean_idc]
array([11, 12, 13, 14, 15, 16, 17, 18, 19])

Summary

We learned that indexing is useful to retrieve values and reassign parts of an array. There are several ways to index. First, we can use single integers to get to an element of a certain dimension. We can also use slices with the colon syntax start:stop:step to get at a sequence of elements. Furthermore, there are two advances indexing techniques, where we can use arrays containing integers or booleans to find an arbitrary collection of elements.