Filtering Data with SciPy

Time series data may contain signals at many different frequencies. Sharp increases or decreases have a high frequency. Slow increases or decreases have a low frequency. Filtering allows us to take different frequency components out of the data.

Signal filtering is a science on its own and I’ll focus on the practical aspects here and stick to two filter types: butterworth and Chebyshev type I. Each of those filters can be used for different purposes. We can use them as low pass, high pass, band pass or notch filters. Low pass filters leave low frequencies alone but attack high frequencies. High pass filters leave high frequencies alone but attach low frequencies. The title image shows an example of low and high pass filters used on the same data. Band pass filters leave a specific frequency band alone and attack all other frequencies. Notch filters attack a specific frequency band, leaving the rest alone. Let’s look at an example.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.signal import butter, cheby1, filtfilt

data = np.load("example_data.npy")

order = 3
Wn = 4000  # in Hz
btype = 'lowpass'
fs = 50000  # in Hz

b, a = butter(order, Wn, btype, fs = fs)
data_butter = filtfilt(b, a, data)

This is a butterworth lowpass filter with a cutoff frequency of 4000Hz (Wn). That means, signals below 4000Hz are is the pass band. They are largely left alone. Signals above 4000Hz are in the stop band, they are diminished. fs is the sampling frequency of the data. If the units are Hz, it tells us how many data points are recorded during one second. filtfilt is the function that does the actual filtering on the data, based on the filter (b, a) that was designed previously. Filtering is not a perfect process. Filters have what is called roll-off at the critical 4000Hz frequency.

Ideally, we would like a filter response that falls down straight. Anything in the pass band is untouched, anything in the stop band is shutdown the same way. As you can see, our actual filter does no live up to the ideal. It already slightly attenuates signal that is part of the pass band and it falls much slower in the stop band. If we need a steeper roll off, we can increase the order of our filter.

Some filter types have steeper roll off than others. For example, the Chebyshev type I filter achieves steeper roll off by tolerating some ripple in the pass band.

This can lead to distortions in the data depending on the size of the ripple. The Chebyshev type I filter takes an argument rp that defines the amount of pass band ripple that is tolerated in units of dB. The more ripple we tolerate, the steeper the roll off will be. Here you can see how large ripple causes oscillations in the data.

Generally, the butterworth filter is sufficient for most situations and is safer because it does not ripple. I hope this post helped you filtering your own data. If you want to learn more, check out the SciPy signal docs. Both the butter and cheby1 filter are there with many, many more.

Threshold Detection in NumPy

Many signals are easily detected by their size. We will learn how to detect the indices where signals cross a threshold with NumPy. These are our practice signals.

import numpy as np
import matplotlib.pyplot as plt
data = np.array([0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 4, 4, 4, 0, 0, 0])
plt.plot(data, marker='o')

We will perform two simple steps to detect the threshold crossings: 1. Make the data binary, in a way that they are true when larger than the threshold and false when lower or equal. 2. Take the difference of the binary signal. This gives us a boolean array that is true when the threshold was crossed. We can combine those steps into one line.

threshold = 2
threshold_crossings = np.diff(data > threshold, prepend=False)

Plotting shows us that threshold_crossings is true after the threshold was crossed.

plt.plot(data2, marker='o')
plt.plot(thr_crossings, marker='o')
plt.legend(("Data", "Threshold Crossings"))

To get the indices of the threshold crossings we can use np.argwhere(), which returns the true indices from a boolean array.

np.argwhere(threshold_crossings)[:,0]
# array([ 3,  7, 11, 14], dtype=int64)

Threshold crossings occur at 3, 7, 11 and 14. Sometimes we only need the upward or downward crossings. We can simply isolate those by slicing the indiced array.

np.argwhere(threshold_crossings)[::2,0]  # Upward crossings
# array([ 3, 11], dtype=int64)
np.argwhere(thr_crossings)[1::2,0]  # Downward crossings
# array([ 7, 14], dtype=int64)

Sometimes we want to find the point before the threshold is crossed, rather than after. There is one simple trick in np.diff instead of setting prepend=False, we set append=False.

threshold = 2
post_threshold_crossings = np.diff(data > threshold, prepend=False)
pre_threshold_crossings = np.diff(data > threshold, append=False)
plt.plot(data2, marker='o')
plt.plot(post_threshold_crossings, marker ='o')
plt.plot(pre_threshold_crossings, marker='o')
plt.legend(("Data", "Post Crossings", "Pre Crossings"))

Make sure to check out the documentation for np.diff and bonus points if you can figure out why exactly this works. Detecting threshold crossings is an easy but important part in most of my analysis pipelines and now you can do it too.