Contrast Adjustment with scikit-image

Contrast is one of the most important properties of an image and contrast adjustment is one of the easiest things we can do to make our images look better. There are many ways to adjust the contrast and with most of them we have to be careful because they can artificially change our images. The title image shows three basic methods of contrast adjustment and how they affect the image histogram. We will cover all three methods here but first let us consider what it means for an image to have low contrast.

Each pixel in an image has an intensity value. In an 8-bit image the values can be between 0 and 255. Contrast issues occur when the largest intensity value in our image is smaller than 255 or the smallest value is larger than 0. In that case we are not using the entire range of possible values, which in our case makes the image overall darker than it could be. So what about the contrast of our example image? We can use .min() and .max() to find the minimum and maximum intensity.

import numpy as np
import matplotlib.pyplot as plt
from skimage import io, exposure, data

image = io.imread("example_image.tif")
image.max()
# 52
image.min()
# 1

Our maximum of 52 is very far away from 255, which explains why our image is so dark. The minimum of 1 is almost perfect. To correct the contrast we can use the exposure module, which gives us the function rescale_intensity.

image_minmax_scaled = exposure.rescale_intensity(image)
image_minmax_scaled.max()
# 255
image_minmax_scaled.min()
# 0

Now both the minimum and the maximum are optimized. All pixels that were equal to the original minimum are now 0 and all pixels equal to the maximum are now 255. But what happens to the values in the middle? Let’s look at a smaller example.

arr = np.array([2, 40, 100, 205, 250], dtype=np.uint8)

arr_rescaled = exposure.rescale_intensity(arr)
# array([  0,  39, 100, 208, 255], dtype=uint8)

As expected, the minimum 2 became 0 and the maximum 250 became 255. In the middle, 40 became smaller, nothing happened to 100 and 205 became larger. We will look at each step to find out how we got there.

arr = np.array([2, 40, 100, 205, 250], dtype=np.uint8)
arr_min, arr_max = arr.min(), arr.max()  # avoid shadowing the built-in min/max
arr_subtracted = arr - arr_min  # Subtract the minimum
# array([  0,  38,  98, 203, 248], dtype=uint8)
arr_divided = arr_subtracted / (arr_max - arr_min)  # Divide by the new maximum
# array([0.        , 0.15322581, 0.39516129, 0.81854839, 1.        ])
arr_multiplied = arr_divided * 255  # Multiply by the dtype maximum
# array([  0.        ,  39.07258065, 100.76612903, 208.72983871,
#        255.        ])
# Convert back to the original uint8 dtype
arr_rescaled = np.asarray(arr_multiplied, dtype=arr.dtype)
# array([  0,  39, 100, 208, 255], dtype=uint8)

We can get there in four simple steps: subtract the minimum, divide by the maximum of the subtracted array, multiply by the maximum value of the data type and finally convert back to the original data type. This works well if we want to rescale by the minimum and maximum of the image, but sometimes we need to use different values. The maximum in particular can easily be dominated by noise. For all we know, it could come from a single pixel that is not representative of the entire image. As you can see from the histogram in the title image, very few pixels are near the maximum. This means we can use percentile rescaling with little information loss.

percentiles = np.percentile(image, (0.5, 99.5))
# array([ 1., 28.])
scaled = exposure.rescale_intensity(image,
                                    in_range=tuple(percentiles))

This time we scale from 1 to 28, which means that all values greater than or equal to 28 become the new maximum 255. As we chose the 99.5th percentile, this affects roughly 0.5% of the brightest pixels. You can see the consequence in the image on the right. The image becomes brighter, but it also means that we lose information. Pixels that were distinguishable before now look exactly the same, because they are all 255. It is up to you whether you can afford to lose those features. If you do quantitative image analysis you should perform rescaling with caution and always look out for information loss.
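
If you are unsure how much you are giving up, you can quantify it. Here is a quick sketch, reusing the image and scaled arrays from above, that counts how many pixels ended up saturated at the new maximum:

clipped_fraction = np.mean(scaled == 255)  # fraction of pixels clipped to the new maximum
print(f"{clipped_fraction:.1%} of pixels are saturated at 255")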

Plotting 2D Vectors with Matplotlib

Vectors are extremely important in linear algebra and beyond. One of the most common visual representations of a vector is the arrow. Here we will learn how to plot vectors with Matplotlib. The title image shows two vectors and their sum. As a first step we will plot the vectors originating at 0, shown below.

import matplotlib.pyplot as plt
import numpy as np

vectors = np.array([[2, 0], [3, 2]])
vector_addition = vectors[0] + vectors[1]  # element-wise sum, [5, 2]
vectors = np.append(vectors, vector_addition[None, :], axis=0)

tail = [0, 0]  # all arrows start at the origin
fig, ax = plt.subplots()
ax.quiver(*tail,
          vectors[:, 0],
          vectors[:, 1],
          scale=1,
          scale_units='xy',
          angles='xy',
          color=['g', 'r', 'k'])

ax.set_xlim((-1, vectors[:,0].max()+1))
ax.set_ylim((-1, vectors[:,1].max()+1))

We have two vectors stored in our vectors array. Those are [2, 0] and [3, 2], both in [x, y] order, as you can see from the image. We can perform vector addition between the two by simply adding vectors[0] + vectors[1]. Then we use np.append so we have all three vectors in the same array. Next we define the origin in tail, because we want the tails of the arrows to be located at [0, 0]. Then we create the figure and axes to plot in with plt.subplots(). The plotting itself can be done with one call to the ax.quiver method. But it is quite a call, with a lot of parameters, so let's go through it.

First, we need to define the origin, so we pass *tail. Why the asterisk? ax.quiver actually takes two parameters for the origin, X and Y. The asterisk causes [0, 0] to be unpacked into those two parameters. Next, we pass the x coordinates (vectors[:, 0]) and then the y coordinates (vectors[:, 1]) of our vectors. The next three parameters, scale, scale_units and angles, are necessary to make the arrow length match the actual numbers. By default, the arrows are scaled based on the average of all plotted vectors; these parameters get rid of that kind of scaling. Try removing some of them to get a better idea of what I mean. Finally, we pass three colors, one for each arrow.

So what do we need to plot the head to tail aligned vectors as in the title image? We just need to pass the vectors where the origin is the other vector.

ax.quiver(vectors[1::-1, 0],  # x origins: the tip of the other vector
          vectors[1::-1, 1],  # y origins
          vectors[:2, 0],     # x components of the two original vectors
          vectors[:2, 1],     # y components
          scale=1,
          scale_units='xy',
          angles='xy',
          color=['g', 'r'])

This is simple because it is the same quiver method, but it is complicated because of the indexing syntax. Now we no longer unpack *tail. Instead we pass the x and y origins separately. In vectors[1::-1, 0] the 0 selects the x coordinates. The -1 step reverses the order; if we did not reverse, each vector would be its own origin. Starting the slice at 1 and stepping backwards also leaves out index 2, which holds the summed vector. vectors[1::-1, 1] gives us the y coordinates in the same way. Finally, we just need to skip the summed vector when we pass the x and y magnitudes. The rest is the same.
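
If the slicing is hard to picture, it helps to evaluate the slices on their own. This is just a quick check, using the vectors array from the first code block:

vectors
# array([[2, 0],
#        [3, 2],
#        [5, 2]])
vectors[1::-1]  # rows 1 and 0 in reverse order; row 2 (the sum) is left out
# array([[3, 2],
#        [2, 0]])
vectors[:2]  # the two original vectors
# array([[2, 0],
#        [3, 2]])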

So that’s it. Unfortunately, ax.quiver only works for 2D vectors. It also isn’t specifically made to present vectors that have a common origin. Its main use case is to plot vector fields. This is why some of the plotting here feels clunky. There is also ax.arrow which is more straightforward but only creates one arrow per method call. I hope this post was helpful for you. Let me know if you have other ways to plot vectors.
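
For completeness, here is a rough sketch of how the first plot could be reproduced with ax.arrow, one call per vector. The width and head_width values are arbitrary choices for this illustration:

fig, ax = plt.subplots()
for (x, y), color in zip(vectors, ['g', 'r', 'k']):
    # length_includes_head=True keeps the tip of the arrow exactly at (x, y)
    ax.arrow(0, 0, x, y, color=color, width=0.02,
             head_width=0.15, length_includes_head=True)
ax.set_xlim((-1, vectors[:, 0].max() + 1))
ax.set_ylim((-1, vectors[:, 1].max() + 1))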

Filtering Data with SciPy

Time series data may contain signals at many different frequencies. Sharp increases or decreases have a high frequency. Slow increases or decreases have a low frequency. Filtering allows us to take different frequency components out of the data.

Signal filtering is a science of its own, so I'll focus on the practical aspects here and stick to two filter types: Butterworth and Chebyshev type I. Each of those filters can be used for different purposes. We can use them as low pass, high pass, band pass or notch filters. Low pass filters leave low frequencies alone but attack high frequencies. High pass filters leave high frequencies alone but attack low frequencies. The title image shows an example of low and high pass filters used on the same data. Band pass filters leave a specific frequency band alone and attack all other frequencies. Notch filters attack a specific frequency band, leaving the rest alone. Let's look at an example.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.signal import butter, cheby1, filtfilt

data = np.load("example_data.npy")

order = 3
Wn = 4000  # in Hz
btype = 'lowpass'
fs = 50000  # in Hz

b, a = butter(order, Wn, btype, fs=fs)  # design the filter coefficients
data_butter = filtfilt(b, a, data)  # apply the filter forwards and backwards

This is a Butterworth low pass filter with a cutoff frequency of 4000 Hz (Wn). That means signals below 4000 Hz are in the pass band; they are largely left alone. Signals above 4000 Hz are in the stop band; they are diminished. fs is the sampling frequency of the data. If the units are Hz, it tells us how many data points are recorded during one second. filtfilt is the function that does the actual filtering of the data, based on the filter (b, a) that was designed previously. Filtering is not a perfect process, though. Filters have what is called roll-off around the critical 4000 Hz frequency.

Ideally, we would like a filter response that drops off like a step: anything in the pass band is untouched and anything in the stop band is shut down equally hard. As you can see, our actual filter does not live up to that ideal. It already slightly attenuates signal that is part of the pass band and it falls off much more slowly in the stop band. If we need a steeper roll-off, we can increase the order of our filter.
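
If you want to see the roll-off for yourself, you can plot the frequency response with scipy.signal.freqz. This is a minimal sketch comparing two filter orders; the orders are arbitrary examples and the other variables come from the code above:

from scipy.signal import freqz

fig, ax = plt.subplots()
for filter_order in (3, 8):
    b_demo, a_demo = butter(filter_order, Wn, btype, fs=fs)
    freq, response = freqz(b_demo, a_demo, fs=fs)  # complex response; its magnitude is the gain
    ax.plot(freq, 20 * np.log10(np.abs(response)), label=f"order {filter_order}")
ax.axvline(Wn, color='k', linestyle='--')  # the 4000 Hz cutoff
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Gain (dB)")
ax.legend()

The higher order drops off much faster beyond the cutoff, which is exactly the steeper roll-off we were after.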

Some filter types have a steeper roll-off than others. For example, the Chebyshev type I filter achieves a steeper roll-off by tolerating some ripple in the pass band.

This can lead to distortions in the data depending on the size of the ripple. The Chebyshev type I filter takes an argument rp that defines the amount of pass band ripple that is tolerated in units of dB. The more ripple we tolerate, the steeper the roll off will be. Here you can see how large ripple causes oscillations in the data.
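
For reference, designing and applying such a filter could look like this, reusing the variables from the Butterworth example above; the 1 dB ripple is just an illustrative value:

rp = 1  # tolerated pass band ripple in dB
b_cheby, a_cheby = cheby1(order, rp, Wn, btype, fs=fs)
data_cheby = filtfilt(b_cheby, a_cheby, data)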

Generally, the Butterworth filter is sufficient for most situations and is safer because it does not ripple. I hope this post helped you filter your own data. If you want to learn more, check out the SciPy signal docs. Both butter and cheby1 are there, along with many, many more filters.

The Scientist's Guide to Python

  • Install Python as part of a data science platform such as Anaconda
  • Use an integrated development environment like Spyder (Installs with Anaconda)
  • Don’t reinvent the wheel, find out what already exists in your field

Python and Science

Python has been adopted with open arms by many scientific fields. A lot of its success is directly linked to the success of NumPy (the Python package for numerical computing) and the scientific Python community (the SciPy community). If you want to understand the history of this success story, I recommend reading the recent Nature Methods paper, which gives some strong background on the SciPy community. However, you don't need to read the paper to use Python or any of the scientific packages.

A downside of Python is that it is a general purpose programming language. Python is used for anything you can think of: web development, GUI design, game development, data mining, machine learning, computer vision and so on. This can be intimidating to beginners, especially to scientists who have a very specific task they want to automate. Compare this situation to MATLAB. Its purpose is very specific and is even part of the name: it is the MATrix LABoratory. It deals very efficiently with matrices and is generally good at math, both of which are valuable and important to most scientists. MATLAB is specifically designed to serve the science and engineering communities. Python, on the other hand, is for everyone, which can be a weakness in the beginning but quickly turns into its greatest strength. There are no privileged tasks in Python; they are all handled well, as long as there is a great community behind the task maintaining it. The purpose of this guide is to ease some of the hurdles scientists face when diving into Python and to highlight its many advantages.

The first problem we face is called package management. When we download pure Python from the official website (python.org), we get the core Python functionality. Python by itself does not include most of the features we depend on as scientists. Python's functionality is organized in so-called packages that we need to install if they are not built into Python. Alternatively, we can install a pre-made Python distribution that includes the scientific packages. This is my preferred way of installing Python and I strongly recommend that beginners start with Anaconda.

Installing Python with Anaconda

Anaconda is a data science platform that installs Python with most of the packages we need as scientists. It is open-source and completely free. When we install Anaconda we get lots of stuff. Most importantly we get Python, NumPy, Matplotlib, Conda and Spyder. We could install those things ourselves but this way we can have them without ever worrying about package management. We already talked about core Python and NumPy. Matplotlib is a package that allows us to plot our data. Conda is a package manager that allows us to update installed packages and install new packages. Finally, Spyder is an integrated development environment (IDE). Spyder stands for Scientific PYthon Development EnviRonment and it is exactly what we need to get started.

The Anaconda Navigator: a graphical user interface that helps us find the main components of Anaconda. One of those is the Spyder IDE, which you can start by simply clicking Launch.

An IDE is extremely useful. For example, it helps us by highlighting important parts of our code visually. By running Spyder you have just set up your own IDE. This is your kingdom now. You will be able to do amazing things here and you should celebrate this moment properly. However, as you celebrate, my duty will be to guide you through the most important parts of this graphical user interface.

Spyder (Scientific PYthon Development EnviRonment). On the left we have the editor. Here we write scripts. Scripts are collections of code that can be executed all together in series when we hit the run button (green arrow). In the lower right is an IPython console. This is where scripts are executed interactively when we run them. We can also type code here and execute it immediately.

Let's start with the interactive console in the lower right. Here, IPython is running and it is awaiting your commands. You can type code and run it by hitting enter. You can use it like a fancy calculator. Try 2+2, 2-2, 2*2, 2/2, 2**2; it's all there. The interactive console is the perfect place to try out commands and see what they do. We can define variables here and import packages. Luckily we installed Anaconda, so NumPy is already available. The conventional way to import NumPy is import numpy as np. To learn all about NumPy, find my NumPy blog series.
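
A first session in the console might look something like this (the In/Out numbering will differ on your machine):

In [1]: 2 + 2
Out[1]: 4

In [2]: 2 ** 2
Out[2]: 4

In [3]: import numpy as np

In [4]: np.sqrt(2)
Out[4]: 1.4142135623730951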

On the left is the editor. This is our code notebook. Here we edit files that store our code. Those files are commonly called scripts. When you hit enter here, code is not immediately executed; you are just moving to a new line. To execute the code we can hit the green arrow (play button) above. When we do so, all lines of code in the script are executed from top to bottom. When the script finishes, the objects created in the script still exist in the interactive console. This is very useful for debugging the script. We can use the interactive console to take a close look at what the script did.
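
As a tiny example of that workflow, a first script could be just a few lines (the numbers are arbitrary):

import numpy as np

samples = np.array([1.2, 3.4, 2.2])  # some made-up measurements
mean_value = samples.mean()
print(mean_value)

After hitting run, typing samples or mean_value in the IPython console shows their current values.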

The Power of Scripts

Most of your work will be about writing scripts that do something useful. Many times, there are other ways to solve the same task. Many things can be done by hand, clicking through the graphical user interface of another program. Sometimes this is still the best way, but there are many advantages to writing a script.

A script can be much faster than a human, especially at repetitive tasks. When we have to rename 10 files in a certain way, we might decide that it is not worth writing a script and just do it quickly by hand. But the same task could come back later, this time with 100 files that require renaming. The most important thing is to pick your battles. We cannot automate everything, so we will have to make tough decisions.

Another advantage of scripts is that they are a protocol of the analysis. If we can write a script that takes care of everything and is capable of analyzing the entire dataset, we know exactly how the analysis was performed. When we analyze manually we can achieve the same by very carefully writing down everything we do on each data point. However, this can easily fail if something about the analysis becomes important afterwards and it is not part of the protocol. Scripts are also more easily shared than a protocol because the script language (in our case Python) is less ambiguous.

Sometimes even the script fails as a definitive protocol. This is the case when we use packages whose functions change. Let's assume we use a function called fastAnalysis from a fictional package called fancyCalc. The way fastAnalysis works changed slightly between versions 1.1 and 1.2. We only know what our script did if we remember which version we used at the time we ran the script. Here another advantage of scripts shows up: we can quickly run the same analysis again with different fancyCalc versions. If we get the same result as before, that version is probably the one we ran previously. Manual analysis is usually much harder to repeat. Imagine you spent the last 2 months analyzing 200 samples. You had to normalize each sample to its corresponding baseline. How easy would it be to do the same analysis by hand with a slightly larger baseline interval? In a script this would usually mean changing a single number. By hand it could easily lead to another 2 months of work.

The Scripting Workflow

We talked at length about the advantages of scripts, but how do we actually write one? For starters, we need to know what we want to do. Then we look online for a function that we hope can achieve our goal. Then we try out the function. If it does what we want, we move on to the next task. Usually we have one pretty large, intimidating task, such as "analyze this whole data set". This task breaks down into smaller tasks. For example, we will have to load our data, do preprocessing, extract regions of interest, quantify and so on. Those tasks break down into even smaller tasks. For preprocessing, one of those smaller tasks might be to subtract the background. Ideally, if we make our tasks small enough, we will be less intimidated and we will find a function that performs each task.
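
To make that concrete, here is a minimal sketch of what such a breakdown could look like as a script. The function names and steps are made-up examples, not a prescribed pipeline:

import numpy as np

def load_data(path):
    # Hypothetical loader; a real project might use np.load, pandas or an image reader
    return np.load(path)

def subtract_background(data, background):
    # One small preprocessing step
    return data - background

def extract_rois(data, threshold):
    # Stand-in for region-of-interest extraction: keep the values above a threshold
    return data[data > threshold]

def quantify(rois):
    return rois.mean(), rois.std()

# The top level then reads like a summary of the whole analysis:
# data = load_data("experiment_01.npy")
# cleaned = subtract_background(data, background=10)
# rois = extract_rois(cleaned, threshold=50)
# roi_mean, roi_std = quantify(rois)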

In the age of the internet, programming is easy enough. Unfortunately, to become effective at it we will need to do some learning and get some practice. Our goal is to become effective scripters, so we will need to learn three things: 1. We need to learn some basic Python. This will be easy, I promise. 2. We need to learn how to think computationally about our tasks. Only then will we be able to effectively divide entire projects into smaller, manageable tasks. 3. We will learn how to find, evaluate and use task specific packages quickly. Most tasks have already been solved by other people and we never want to reinvent the wheel (unless we have a vision for a really cool wheel that is much better than all the other wheels that already exist). My blog will lead you through all three steps.

What’s next?

The best thing you can do next is to start coding. Today. The number of different programming languages, distributions of those languages, integrated development environments and resources to learn all of them is immense and can lead to real decision paralysis. You can spend forever trying to decide how to start, but you already have everything you need. Install Anaconda, launch Spyder and just let loose. If you are not sure Anaconda is right for you, install a distribution you think is more suitable. Install pure Python if you are feeling adventurous and just start hacking away. If you are not convinced Python is the best language for you, install another language. If you don't want to learn with this blog, use another resource; there are so many great ones. Whatever you choose, you now need to get your hands dirty. Just do it.

In future blog posts I will go through the three points I think are necessary to become effective scripters. I will start with basic Python and I will collect those blog posts here. I’d be happy if you want to join me.

Doing the Math with Python

First, we take a look at basic math with Python. We learn the basic arithmetic operators, parentheses and how to save the results by assigning a name.
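
As a small preview of that post, here is a sketch of what those basics look like in the console:

3 + 4 * 2             # 11, because * binds more strongly than +
(3 + 4) * 2           # 14, the parentheses change the order of evaluation
result = (3 + 4) * 2  # assigning a name saves the result for later
result / 2            # 7.0, division always returns a float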