Structural Causal Models to Clarify Causality in Neuroscience

Neuroscience is full of causal questions. Is this brain area necessary for a behavior? Does this drug decrease seizure frequency in epileptic patients? Does the firing rate of a neuron influence the firing rate of that other neuron? But while there are many different causal questions, there are also slightly different notions of causality. Barack et al. 2022 rightly call for more clarity when asking and answering causal questions. Here I want to make a point that will hopefully help add clarity to causal reasoning in neuroscience. My point specifically relates to the power of structural causal models (SCMs). (Judea Pearl has taken note of the paper and of the fact that SCMs are not mentioned. Find his tweet here. I recommend The Book of Why to anyone.)

Specifically I want to make two main points:

  • A structural causal model (SCM) is extremely useful and adds clarity to causal reasoning.
  • I agree that multiple concepts of causality will be useful in neuroscience, but all of them become much clearer when explained with an SCM.

I will start by defining what an SCM is and then I will try to make some neuroscience questions clearer.

The Structural Causal Model

A structural causal model (SCM) consists of three sets. The set U contains the error terms that are outside of (exogenous to) the model. The set V contains the variables inside of (endogenous to) our model, and we are interested in the causal relations between them. Finally, the set E contains the functions that describe each variable in V in terms of other variables in V or U. Thereby, E describes the causal relationships in the SCM. The mathematical description is neat, but SCMs also have a very clear graphical representation, where each variable in V is a circle and arrows between circles show causal relationships. For example, the SCM below proposes causal influences on neuronal activity during optogenetic perturbation.

Fig 1.

This is extremely useful. It tells us that the neuronal activity is determined by multiple variables, and an optogenetic construct we might introduce becomes one of those factors. A small disclaimer: you might be missing the error terms U. If each variable in V is associated with exactly one error term, they are by convention omitted in the graphical model. So gaining optogenetic control over neuronal activity would be hard. Cutting all other arrows would probably be impossible. But we could use the non-linearity of the system and try to find optogenetic stimulation strengths where other variables become negligible. I don’t want to stay with this model for too long because it’s just for illustrative purposes. Instead I want to talk about the different definitions of causality.

The Path Definition of Causality

The causal definition I usually work with goes like this: if there is a directed path from x to y, x has a causal influence on y. Also: if there is no directed path from x to y, x does not have a causal influence on y. Below are some SCMs where in the upper three (A, B, C) x has a causal influence on y, whereas in the lower three (D, E, F), x does not have a causal influence on y.

Fig 2.

In A, there is a directed path but z also acts as a confounder. In B, z is a mediator. In C there are two causal pathways, a direct one and one mediated by z. In D, there is a path but it is not directed (it collides on z). In E, there is a path but it is not directed (this is the SCM that gives you correlations between ice cream sales and violent crime). In F, the path is in the wrong direction. So we can use SCMs to clarify any kind of causal relation. One thing the graphical representation of SCMs does not tell us is the exact form of the causal relationship. This is not necessarily a bad thing, because it means our definition of causality does not depend on linearity. But sometimes we need to know whether y is continuous (neuronal rate) or nominal (mouse going left, right or staying). That is usually easy to clarify in writing. A bigger issue is that causal inference in SCMs works best if there are no cycles between variables. Cycles, however, are pretty normal in neuroscience (I work on the hippocampus, where a lot of information flows in a loop).
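Before getting to cycles, here is a minimal sketch of how the path criterion can be checked mechanically. It encodes a graph as a dictionary of directed edges and uses a breadth-first search; the helper function and the example graphs (panels B and D of Fig 2) are purely illustrative.

from collections import deque

def has_directed_path(graph, start, target):
    """Return True if a directed path from start to target exists."""
    queue = deque([start])
    visited = set()
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node in visited:
            continue
        visited.add(node)
        queue.extend(graph.get(node, []))
    return False

# Panel B: x -> z -> y (z is a mediator)
graph_b = {"x": ["z"], "z": ["y"]}
# Panel D: x -> z <- y (z is a collider)
graph_d = {"x": ["z"], "y": ["z"]}

print(has_directed_path(graph_b, "x", "y"))  # True: x has a causal influence on y
print(has_directed_path(graph_d, "x", "y"))  # False: no directed path from x to y

The causal reading of such a path assumes the graph is acyclic, which brings us back to the cycles.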

So what to do about cycles? This is where time comes in. We can define the SCM at a timescale where the cycle is negligible. For example, in Fig 1 I have the variable “Previous Activity” (for an interesting usage of previous activity as instrumental variable see Lepperød et al. 2022). If that doesn’t work for you because you are interested in a time scale where the previous activity is still being influenced by the current activity (a truly unbreakable cycle), then there is probably no way around actually simulating the dynamical system with respect to time.

In summary, I believe that any concept of causality becomes clearer when we draw an SCM. If you can think of a concept that does not work in SCMs, let me know. Another concept that does not come up in Barack et al. 2022 is do-calculus, which also becomes very clear when shown with an SCM.

do-calculus, interventions and counterfactuals would be interesting to write about at some point but for now I’m out of time.

Balanced spiking neural networks with NumPy

Balanced spiking neural networks are a cornerstone of computational neuroscience. They make for a nice introduction to spiking neuronal networks and they can be trained to store information (Nicola & Clopath, 2017). Here I will present a Python port I made of a MATLAB implementation by Nicola & Clopath and I will go through some of its features. We only need NumPy.

import numpy as np
from numpy.random import rand, randn
import matplotlib.pyplot as plt


def balanced_spiking_network(dt=0.00005, T=2.0, tref=0.002, tm=0.01,
                             vreset=-65.0, vpeak=-40.0, n=2000, 
                             td=0.02, tr=0.002, p=0.1, 
                             offset=-40.00, g=0.04, seed=100, 
                             nrec=10):
    """Simulate a balanced spiking neuronal network

    Parameters
    ----------
    dt : float
        Sampling interval of the simulation.
    T : float
        Duration of the simulation.
    tref : float
        Refractory time of the neurons.
    tm : float
        Time constant of the neurons.
    vreset : float
        The voltage neurons are set to after a spike.
    vpeak : float
        The voltage above which a spike is triggered
    n : int
        The number of neurons.
    td : float
        Synaptic decay time constant.
    tr : float
        Synaptic rise time constant.
    p : float
        Connection probability between neurons.
    offset : float
        A constant input into all neurons.
    g : float
        Scaling factor of synaptic strength
    seed : int
        The seed makes NumPy random number generator deterministic.
    nrec : int
        The number of neurons to record.

    Returns
    -------
    ndarray
        A 2D array of recorded voltages. Rows are time points,
        columns are the recorded neurons. Shape: (int(T/dt), nrec).
    """

    np.random.seed(seed)  # Seeding randomness for reproducibility

    """Setup weight matrix"""
    w = g * (randn(n, n)) * (rand(n, n) < p) / (np.sqrt(n) * p)
    # Set the row mean to zero
    row_means = np.mean(w, axis=1, where=np.abs(w) > 0)[:, None]
    row_means = np.repeat(row_means, w.shape[0], axis=1)
    w[np.abs(w) > 0] = w[np.abs(w) > 0] - row_means[np.abs(w) > 0]

    """Preinitialize recording"""
    nt = round(T/dt)  # Number of time steps
    rec = np.zeros((nt, nrec))

    """Initial conditions"""
    ipsc = np.zeros(n)  # Post synaptic current storage variable
    hm = np.zeros(n)  # Storage variable for filtered firing rates
    tlast = np.zeros((n))  # Used to set  the refractory times
    v = vreset + rand(n)*(30-vreset)  # Initialize neuron voltage

    """Start integration loop"""
    for i in range(nt):
        inp = ipsc + offset  # Total input current

        # Voltage equation with refractory period
        # Only change if voltage outside of refractory time period
        dv = (dt * i > tlast + tref) * (-v + inp) / tm
        v = v + dt*dv

        index = np.argwhere(v >= vpeak)[:, 0]  # Spiked neurons

        # Get the weight matrix column sum of spikers
        if len(index) > 0:
            # Compute the increase in current due to spiking
            jd = w[:, index].sum(axis=1)

        else:
            jd = 0*ipsc

        # Used to set the refractory period of LIF neurons
        tlast = (tlast + (dt * i - tlast) *
                 np.array(v >= vpeak, dtype=int))

        ipsc = ipsc * np.exp(-dt / tr) + hm * dt

        # Integrate the current
        hm = (hm * np.exp(-dt / td) + jd *
              (int(len(index) > 0)) / (tr * td))

        v = v + (30 - v) * (v >= vpeak)

        rec[i, :] = v[0:nrec]  # Record the first nrec voltages
        v = v + (vreset - v) * (v >= vpeak)

    return rec


if __name__ == '__main__':
    rec = balanced_spiking_network()
    """PLOTTING"""
    fig, ax = plt.subplots(1)
    ax.plot(rec[:, 0] - 100.0)
    ax.plot(rec[:, 1])
    ax.plot(rec[:, 2] + 100.0)
Three neurons in the balanced spiking neural network.

The weight matrix

At the core of any balanced network is the weight matrix. We define it in the "Setup weight matrix" block at the top of the function. Initializing it from a normal distribution and setting the row mean to zero makes sure that excitation and inhibition are in balance. That is what keeps the network spiking irregularly although the input to the network remains constant. The constant input to the network is the offset parameter.
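If you want to convince yourself of the balance, here is a small sketch that rebuilds the weight matrix with the same three lines as inside the function and checks that the non-zero entries of each row average to approximately zero (the parameter values are the function defaults).

import numpy as np
from numpy.random import rand, randn

np.random.seed(100)
n, p, g = 2000, 0.1, 0.04

# Same construction as in balanced_spiking_network
w = g * randn(n, n) * (rand(n, n) < p) / (np.sqrt(n) * p)
row_means = np.mean(w, axis=1, where=np.abs(w) > 0)[:, None]
row_means = np.repeat(row_means, w.shape[0], axis=1)
w[np.abs(w) > 0] = w[np.abs(w) > 0] - row_means[np.abs(w) > 0]

# After the correction, the non-zero weights in each row average to ~0
balanced_means = np.mean(w, axis=1, where=np.abs(w) > 0)
print(np.allclose(balanced_means, 0.0, atol=1e-12))  # True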

Refractory period

The refractory period is a time window during which no action potential can be generated. We achieve this by setting the voltage to a low value right after the spike and then not updating the voltage of the spiking neuron for a given time. This time window is given by tref. We compute the voltage update in the line dv = (dt * i > tlast + tref) * (-v + inp) / tm. In the same line we check how long ago the last spike occurred with the expression (dt * i > tlast + tref). Therefore, we need to track the most recent spike time with tlast. Of course we have some other things to do when a neuron reaches the spiking threshold vpeak. First we set the voltage to a value well above the threshold with v = v + (30 - v) * (v >= vpeak). This is purely visual, to give a spiky appearance in the recording. So right after we record the voltage, we set it to its reset value vreset.

Play around with some of the parameters. You can find the code here: https://gist.github.com/danielmk/9adc7409f40a076ffec0cdf85dea4519

Spiking neural networks for a low-energy future

Spiking neural networks (SNNs) have some disadvantages compared to artificial neural networks (ANNs) but they have the potential to run for a fraction of the energy. Whether SNNs will be able to replace ANNs and how much energy they will be using depends on many engineering and neuroscience advances. Here I will go through some of the technical background of the SNN energy advantages and some of the current numbers.

Energy efficient SNN features

The energy efficiency of SNNs comes primarily from two features. First, the spike is a discrete event and energy is only used when a spike occurs. This is probably the most fundamental feature that distinguishes SNNs from ANNs. It means that the energy efficiency of an SNN depends not only on the number of neurons but also on the number of spikes the model requires to perform its task. The second feature is local memory. At the heart of all models are parameters. On traditional hardware such as CPUs and GPUs, the part of the chip that performs the calculations is not the same part that stores the parameters. Loading the parameters onto the chip is much more energy intensive than the computation itself. Therefore, when the parameters can be stored locally on the chip that computes, efficiency advantages result. This is not something unique to spiking neural networks. Some tensor processing units (TPUs) also feature local memory and they are specifically designed for ANNs. When most people speak about the energy advantages of SNNs, they assume local memory.
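To make the first point concrete, here is a rough back-of-the-envelope sketch. The per-operation energies, network size and spike fraction are made-up placeholder numbers, not measurements; the point is only that ANN energy scales with the number of multiply-accumulate operations while SNN energy scales with the number of spikes.

# Hypothetical per-operation energies (placeholders, not measured values)
E_MAC = 4.6e-12      # joules per ANN multiply-accumulate
E_SYN_OP = 0.9e-12   # joules per SNN synaptic operation

# Hypothetical fully connected layer of 1000 neurons, one inference pass
n_neurons = 1000
n_synapses = n_neurons * n_neurons

# ANN: every connection contributes one multiply-accumulate per pass
ann_energy = n_synapses * E_MAC

# SNN: only the neurons that spike drive their outgoing synapses
spike_fraction = 0.05  # assume 5 % of neurons spike during the pass
snn_energy = spike_fraction * n_synapses * E_SYN_OP

print(f"ANN: {ann_energy:.2e} J, SNN: {snn_energy:.2e} J")
print(f"Ratio: {ann_energy / snn_energy:.1f}x")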

SNNs also require specialized hardware to run efficiently. That hardware is called neuromorphic. It makes efficient use of the binary nature of spikes and of local memory. Neuromorphic hardware is so far only available for research purposes, and making it more widely available will be one of the challenges for SNN adoption. Next come some numbers on energy efficiency.

How efficient are we talking?

How much more efficient SNNs are depends on many factors of the comparison: what is the task, what are the model architectures, and what is the hardware? Making projections into the future is even harder, since machine learning advances are made quickly on both SNNs and ANNs. Projecting the absolute amount of energy that could be saved is harder still, because it requires predictions of AI demand, which can change non-linearly with technical advances. I would be interested in finding formal work on some of these uncertainties, or working on some myself, but for now here are some numbers.

The Loihi processor from Intel Labs is a recent piece of neuromorphic hardware. Depending on the size of their example problem they find that Loihi is 2.58x, 8.08x or 48.74x more energy efficient than a 1.67-GHz Atom CPU (Davies et al. 2018).

Yin et al. (2020) present a method to train SNNs (backpropagation of surrogate gradients). They calculate the theoretical energy consumption for a spiking recurrent network they train with the method and for some ANN architectures. Depending on the task, their SNN was 126.2x, 935x, 1602x, 1776x or 3353.3x more efficient than a Long Short-Term Memory network (LSTM; the number also depends on some details of the LSTM implementation). Their network was 41.3x more efficient compared to a recurrent ANN. Here is a talk from the last author, Sander Bohte, where he summarizes the findings as >100x more efficient than the best recurrent ANN and 1000x more efficient than the LSTM. All their calculations assume local memory.

Panda et al. (2020) tried several methods to generate SNNs for image classification and calculated theoretical energy consumption. They estimate SNN efficiency improvements of 6.52x, 7.7x, 10.6x, 74.9x, 81.3x or 104.8x, depending on model architecture and parameters.

Merolla et al. (2014) present the TrueNorth neuromorphic architecture. They compare synaptic operations per second (SOPS) of their architecture to floating-point operations per second (FLOPS) of traditional chips. They report that TrueNorth can deliver 46 billion SOPS per watt, while the most energy-efficient supercomputer at the time of their writing generated 4.5 billion FLOPS per watt.

These numbers highlight the potential for some massive energy savings but benchmarks are always complicated. Making good comparisons can be hard, especially since the unit of computational efficiency is fundamentally different. Either way, SNNs on neuromorphic hardware are extremely energy efficient but to truly save energy they must become better at the tasks ANNs already solve.

References

Davies et al. 2018. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning. IEEE Micro. 10.1109/MM.2018.112130359

Yin, Corradi & Bohte 2020. Effective and Efficient Computation with Multiple-timescale Spiking Recurrent Neural Networks. https://arxiv.org/abs/2005.11633

Panda, Aketi & Roy 2020. Towards Scalable, Efficient and Accurate Deep Spiking Neural Networks with Backward Residual Connections, Stochastic Softmax and Hybridization. https://arxiv.org/abs/1910.13931

Merolla et al. 2014. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science. https://science.sciencemag.org/content/345/6197/668

Spiking Neuronal Networks in Python

Spiking neural networks (SNNs) turn some input into an output much like artificial neural networks (ANNs), which are already widely used today. Both achieve the same goal in different ways. The units of an ANN are represented by single floating-point numbers that stand for the activity level of the unit for a given input. Neuroscientists loosely understand this number as the average spike rate of a unit. In ANNs this number is usually the result of multiplying the input with a weight matrix. SNNs work differently in that they simulate units as spiking point processes. How often and when a unit spikes depends on the input and the connections between neurons. A spike causes a discontinuity in other connected units. SNNs are not yet widely used outside of laboratories, but they are important for neuroscientists who model brain circuitry. Here we will create spiking point models and connect them. We will be using Brian2, a Python package to simulate SNNs. You can install it with either

conda install -c conda-forge brian2
or
pip install brian2

We will start by defining our spiking unit model. There are many different models of spiking. Here we will define a conductance based version of the leaky integrate-and-fire model.

from brian2 import *
import numpy as np
import matplotlib.pyplot as plt

start_scope()

# Neuronal Parameters
c = 100*pF
vl = -70*mV
gl = 5*nS

# Synaptic Parameters
ge_tau = 20*ms
ve = 0*mV
gi_tau = 100*ms
vi = -80*mV
w_ge = 1.0*nS
w_gi = 0.0*nS

lif = '''
dv/dt = -(gl * (v - vl) + ge * (v - ve) + gi * (v - vi) - I)/c : volt
dge/dt = -ge / ge_tau : siemens
dgi/dt = -gi / gi_tau : siemens
I : amp
'''

The intended way to import Brian2 is from brian2 import *. This feels a bit dangerous, but it keeps the code readable, especially when dealing with physical units. Speaking of units, each of our parameters has a physical unit. We have picofarad (pF), millivolt (mV) and nanosiemens (nS). These are all imported from Brian2 and we assign units with the multiplication operator *. To find out what each of those parameters does, we can look at our actual model. The model is defined by the string we assign to lif. The first line of the string is the model of our spiking unit:

dv/dt = -(gl * (v - vl) + ge * (v - ve) + gi *(v - vi) - I)/c : volt

This is a differential equation that describes the change of voltage with respect to time. In electrical terms, voltage is changed by currents. These physical terms are not strictly necessary for the computational function of SNNs but they are helpful to understand the biological background behind them. Three currents flow in our model.

The first one is gl * (v - vl). This current is given by the conductance (gl), the current voltage (v) and the equilibrium voltage (vl). It flows whenever the voltage differs from vl. gl is just a scaling constant that determines how strong the drive towards vl is. vl is called the resting potential because v does not change when it is equal to vl. This term makes our model a leaky integrate-and-fire model as opposed to an integrate-and-fire model. Of course the voltage only remains at rest if it is not otherwise changed, and there are three other currents that can do that. Two of them correspond to excitatory and inhibitory synaptic inputs. ge * (v - ve) drives the voltage towards ve. Under most circumstances this will increase the voltage, as ve equals 0 mV. On the other hand, gi * (v - vi) drives the voltage towards vi, which at -80 mV is even slightly lower than vl. This current keeps the voltage low. Then there is I, which is the input current that the model receives. We will define this later for each neuron. Finally, currents don’t change the voltage immediately. They are all slowed down by the capacitance c. Therefore, we divide the sum of all currents by c.

There are two more differential equations that describe our model:

dge/dt = -ge / ge_tau : siemens
dgi/dt = -gi / gi_tau : siemens

These describe the change of the excitatory and inhibitory synaptic conductances. We did not yet implement the way spiking changes ge or gi. However, these equations tell us that both ge and gi will decay towards zero with the time constants ge_tau and gi_tau. So far so good, but what about spiking? That comes up next, when we turn the string that represents our model into something that Brian2 can actually simulate.

G = NeuronGroup(3, lif, threshold='v > -40*mV',
                reset='v = vl', method='euler')
G.I = [0.7, 0.5, 0]*nA
G.v = [-70, -70, -70]*mV

The NeuronGroup creates three units for us that are defined by the equations in lif. The threshold parameter gives the condition for registering a spike. In this case, we register a spike when the voltage is larger than -40 mV. When a spike is registered, an event defined by reset is triggered: we set the voltage back to the resting voltage vl. The method parameter gives the integration method used to solve the differential equations.

Once our units are defined, we can interface with some parameters of the neurons. For example, G.I = [0.7, 0.5, 0]*nA sets the input current of the zeroth neuron to 0.7 nA, the first neuron to 0.5 nA and the last neuron to 0 nA. Not all parameters are accessible like this. I is accessible because we defined I : amp in our lif string, which makes I available for changes. Next, we define the initial state of our neurons. G.v = [-70, -70, -70]*mV sets all of them to the resting voltage. A good place to start.

You might be disappointed by this implementation of spiking. Where is the sodium? Where is the amplification of depolarization? The leaky integrate-and-fire model doesn’t feature a spiking mechanism, except for the discontinuity at the threshold. If you are interested in incorporating spike-like mechanisms you should look at the exponential leaky integrate-and-fire model or a Hodgkin-Huxley like model; a rough sketch of the former follows below.
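As a sketch of what that could look like in the same Brian2 syntax, the voltage equation of an exponential integrate-and-fire neuron adds an exponential term around a soft threshold. The parameters delta_t and vt and their values are illustrative and not part of the model above.

# Builds on the imports and parameters defined above
delta_t = 2*mV   # sharpness of the spike initiation (illustrative)
vt = -50*mV      # soft threshold where the exponential term takes over (illustrative)

exp_lif = '''
dv/dt = -(gl * (v - vl) - gl * delta_t * exp((v - vt) / delta_t) + ge * (v - ve) + gi * (v - vi) - I)/c : volt
dge/dt = -ge / ge_tau : siemens
dgi/dt = -gi / gi_tau : siemens
I : amp
'''

In a full model the threshold in NeuronGroup would have to sit above vt, because the exponential term makes the voltage diverge on its own once it passes vt.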

We are only missing one more ingredient for an actual network model: the synapses. And we are in a great position, because our model already defines the excitatory and the inhibitory conductances. Now we just need to make use of them.

Se = Synapses(G, G, on_pre='ge_post += w_ge')
Se.connect(i=0, j=2)

Si = Synapses(G, G, on_pre='gi_post += w_gi')
Si.connect(i=1, j=2)

First we create an excitatory connection from our neurons onto themselves. The connection is excitatory because it increases the conductance ge of the postsynaptic neuron by w_ge. We then call Se.connect(i=0, j=2) to define which neurons actually connect. In this case, the zeroth neuron connects to the second neuron. This means, spikes in the zeroth neuron cause ge to increase in the second neuron. We then create the inhibitory connection and make the first neuron inhibit the second neuron. Now we are ready to run the actual simulation and plot the result. Remember that for now w_gi = 0.0*nS, meaning that we will only see the excitatory connection.

M = StateMonitor(G, 'v', record=True)

run(100*ms)

fig, ax = plt.subplots(1)
ax.plot(M.t/ms, M.v[0]/mV)
ax.plot(M.t/ms, M.v[1]/mV)
ax.plot(M.t/ms, M.v[2]/mV)
ax.set_xlabel('time (ms)')
ax.set_ylabel('voltage (mV)')
ax.legend(('N0', 'N1', 'N2'))
Voltage traces from three neurons. Spiking of N0 increases the voltage in N2. Spiking of N1 does nothing in this simulation because its weight onto N2 was set to 0 nS.

The first two neurons are regularly spiking. N0 is slightly faster because it receives a larger input current than N1. N2 on the other hand rests at -70mV because it does not receive an input current. When N0 spikes, it causes the voltage of N2 to increase as we expected, because N0 increases the excitatory conductance. N1 is not doing anything here, because its weight is set to 0. Continued activity of N0 eventually causes N2 to reach the threshold of -40mV, making it register its own spike and reset the voltage. What happens if we introduce the inhibitory weight?

w_gi = 0.5*nS
run(100*ms)

fig, ax = plt.subplots(1)
ax.plot(M.t/ms, M.v[0]/mV)
ax.plot(M.t/ms, M.v[1]/mV)
ax.plot(M.t/ms, M.v[2]/mV)
ax.set_xlabel('time (ms)')
ax.set_ylabel('voltage (mV)')
ax.legend(('N0', 'N1', 'N2'))
The same as above but now N1 has a weight of 0.5nS. This inhibits N2 and prevents a spike.

With a weight of 0.5nS, N1 inhibits N2 to an extent that prevents the spike. We have now built a network of leaky integrate-and-fire neurons that features both excitatory and inhibitory synapses. This is just the start to getting a functional network that does something interesting with the input. We will need to increase the number of neurons, decide on a connectivity rule between them, initialize or even learn weights, decide on a coding scheme for the output and much more. Many of these I will cover in later blog posts so stay tuned.

Differential Equations in Python with SciPy

Differential equations are special because they don’t tell us the value of a variable straight up. Instead, they tell us by how much the variable will change with respect to the change of another variable. Usually that other variable is time. To numerically solve a system of differential equations we need to track the system’s change over time, starting at an initial state. This process is called numerical integration and there is a SciPy function for it called odeint. We will learn how to use this package by simulating the ‘hello world’ of differential equations: the Lorenz system.

Here is the first part of the code where we define the function that describes the dynamics of the system.

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from mpl_toolkits.mplot3d import Axes3D

def lorenz(state, t, sigma, beta, rho):
    x, y, z = state
    
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    
    return [dx, dy, dz]

We start with some imports. Of course we need NumPy, and odeint is imported from scipy.integrate. Matplotlib will be used to plot the result of our simulation. After that we define the system of differential equations that defines our Lorenz system. It consists of three differential equations that we fit into one function called lorenz. This function needs a specific call signature (lorenz(state, t, sigma, beta, rho)) because we will later pass it to odeint, which expects specific parameters in specific places. Most importantly, the first parameter must be the state of the system. The state of the Lorenz system is defined by three variables: x, y, z. Our state object has to be a sequence with an order that reflects this.

Inside the lorenz function, the first thing we do is unpack the state into the three state variables. This is followed by the three differential equations that describe the dynamic changes of the state variables. The fact that the variable t does not show up in any of these equations is a common point of confusion. The amount of change certainly depends on the amount of time. So why can we ignore t here? The answer is that our numerical integrator will keep track of t for us. For this particular system we could actually build a function that does not take the parameter t, but I include it because it can be useful if you want to add discontinuities that depend on t.
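For example, if we wanted a hypothetical perturbation that switches on after t = 20, we could use t inside the function. This variant is purely illustrative and not part of the Lorenz system.

def lorenz_with_input(state, t, sigma, beta, rho):
    x, y, z = state

    # Hypothetical external drive on x that switches on at t = 20
    drive = 5.0 if t > 20.0 else 0.0

    dx = sigma * (y - x) + drive
    dy = x * (rho - z) - y
    dz = x * y - beta * z

    return [dx, dy, dz]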

While t does not appear in the equations, sigma, beta & rho do. They are the parameters of the system and the system’s properties depend on them. We will set those parameters next.

sigma = 10.0
beta = 8.0 / 3.0
rho = 28.0

p = (sigma, beta, rho)

These are the parameters Lorenz himself used and they are known to produce the type of dynamic the Lorenz system is most known for: the Lorenz attractor. It is important that we store these parameters in a tuple in this exact order because of our function’s call signature. It must be a tuple rather than another type of collection because odeint expects it. Now that our parameters are defined, we will move on to define the initial values of the system. This is a critical part of solving differential equations. These equations tell us by how much the system state changes but they cannot tell us where to start.

y0 = [1.0, 1.0, 1.0]

Our system will start with all variables at 1.0. Now we can solve the system and plot the result.

t = np.arange(0.0, 40.0, 0.01)

result = odeint(lorenz, y0, t, p)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(result[:, 0], result[:, 1], result[:, 2])
The Lorenz attractor.

We solve the system with a simple call to odeint, passing it the function that defines our system, the initial state y0, the time points t for which we want to solve the system, and the parameters p. result is a two-dimensional array where the rows are the time points and the columns are the state variables at those time points. And this is how we can solve differential equations with SciPy.

A Curve Fitting Guide for the Busy Experimentalist

Curve fitting is an extremely useful analysis tool to describe the relationship between variables or discover a trend within noisy data. Here I’ll focus on a pragmatic introduction to curve fitting: how to do it in Python, why it can fail and how to interpret the results. Finally, I will also give a brief glimpse of the larger themes behind curve fitting, such as mathematical optimization, to the extent that I think is useful for the casual curve fitter.

Curve Fitting Made Easy with SciPy

We start by creating a noisy exponential decay function. The exponential decay function has two parameters: the time constant tau and the initial value at the beginning of the curve init. We’ll evenly sample from this function and add some white noise. We then use curve_fit to fit parameters to the data.

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize

# The exponential decay function
def exp_decay(x, tau, init):
    return init*np.e**(-x/tau)

# Parameters for the exp_decay function
real_tau = 30
real_init = 250

# Sample exp_decay function and add noise
np.random.seed(100)
dt=0.1
x = np.arange(0,100,dt)
noise=np.random.normal(scale=50, size=x.shape[0])
y = exp_decay(x, real_tau, real_init)
y_noisy = y + noise

# Use scipy.optimize.curve_fit to fit parameters to noisy data
popt, pcov = scipy.optimize.curve_fit(exp_decay, x, y_noisy)
fit_tau, fit_init = popt

# Sample exp_decay with optimized parameters
y_fit = exp_decay(x, fit_tau, fit_init)

fig, ax = plt.subplots(1)
ax.scatter(x, y_noisy,
           alpha=0.8,
           color= "#1b9e77",
           label="Exponential Decay + Noise")
ax.plot(x, y,
        color="#d95f02",
        label="Exponential Decay")
ax.plot(x, y_fit,
        color="#7570b3",
        label="Fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
ax.set_title("Curve Fit Exponential Decay")

Our fit parameters are almost identical to the actual parameters. We get 30.60 for fit_tau and 245.03 for fit_init, both very close to the real values of 30 and 250. All we had to do was call scipy.optimize.curve_fit and pass it the function we want to fit, the x data and the y data. The function we are passing should have a certain structure. The first argument must be the input data. All other arguments are the parameters to be fit. From the call signature of def exp_decay(x, tau, init) we can see that x is the input data while tau and init are the parameters to be optimized such that the difference between the function output and y_noisy is minimal. Technically this can work for any number of parameters and any kind of function. It also works when the sampling is much sparser. Below is a fit on 20 randomly chosen data points.
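A sketch of how such a sparse fit might look in code, assuming the arrays from the snippet above (the choice of 20 points and the seed are arbitrary):

# Fit on a sparse random subsample of the noisy data
rng = np.random.default_rng(0)  # arbitrary seed
idx = np.sort(rng.choice(x.shape[0], size=20, replace=False))

popt_sparse, _ = scipy.optimize.curve_fit(exp_decay, x[idx], y_noisy[idx])
sparse_tau, sparse_init = popt_sparse
print(sparse_tau, sparse_init)  # should roughly recover tau=30 and init=250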

Of course the accuracy will decrease with sparser sampling. So why would this ever fail? The most common failure mode, in my opinion, is bad initial parameters.

Choosing Good Initial Parameters

The initial parameters of a function are the starting parameters before optimization. They are very important because most optimization methods don’t just look for the best fit randomly. That would take too long. Instead, they start with the initial parameters, change them slightly and check whether the fit improves. When changing the parameters shows very little improvement, the fit is considered done. That makes it very easy for the method to stop with bad parameters if it gets stuck in a local minimum or a saddle point. Let’s look at an example of a bad fit. We will change our tau to a negative number, which will result in exponential growth.
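A sketch of that bad fit, assuming the exp_decay function and x array from above (exponential growth with real_tau = -30 and real_init = 20, fit without specifying p0):

# Exponential growth: negative tau
real_tau = -30
real_init = 20

np.random.seed(100)
noise = np.random.normal(scale=50, size=x.shape[0])
y = exp_decay(x, real_tau, real_init)
y_noisy = y + noise

# No p0 given, so curve_fit starts from the default initial guess of 1
popt, pcov = scipy.optimize.curve_fit(exp_decay, x, y_noisy)
fit_tau, fit_init = popt
print(fit_tau, fit_init)  # far away from the real values of -30 and 20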

In this case fitting didn’t work. For a real_tau and real_init of -30 and 20 we get a fit_tau and fit_init of 885223976.9 and 106.4, both way off. So what happened? Because we never specified the initial parameters (p0), curve_fit chose a default of 1 for both tau and init. Starting from 1, curve_fit never finds good parameters. So what happens if we choose better initial parameters? Looking at our exp_decay definition and the exponential growth in our noisy data, we know for sure that our tau has to be negative. Let’s see what happens when we choose a negative initial value of -5.

p0 = [-5, 1]
popt, pcov = scipy.optimize.curve_fit(exp_decay, x, y_noisy, p0=p0)
fit_tau, fit_init = popt
y_fit = exp_decay(x, fit_tau, fit_init)
fig, ax = plt.subplots(1)
ax.scatter(x, y_noisy,
           alpha=0.8,
           color= "#1b9e77",
           label="Exponential Decay + Noise")
ax.plot(x, y,
        color="#d95f02",
        label="Exponential Decay")
ax.plot(x, y_fit,
        color="#7570b3",
        label="Fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
ax.set_title("Curve Fit Exponential Growth Good Initials")

With an initial parameter of -5 for tau we get good parameters of -30.4 for tau and 20.6 for init (real values were -30 and 20). The key point is that initial conditions are extremely important because they can change the result we get. This is an extreme case, where the fit works almost perfectly for some initial parameters or completely fails for others. In more subtle cases different initial conditions might result in slightly better or worse fits that could still be relevant to our research question. But what does it mean for a fit to be better or worse? In our example we can always compare it to the actual function. In more realistic settings we can only compare our fit to the noisy data.

Interpreting Fitting Results

In most research settings we don’t know our exact parameters. If we did, we would not need to do fitting at all. So to compare the goodness of different parameters we need to compare our fit to the data. How do we calculate the error between our data and the prediction of the fit? There are many different measures, but among the simplest is the sum of squared residuals (SSR).

def ssr(y, fy):
    """Sum of squared residuals"""
    return ((y - fy) ** 2).sum()

We take the difference between our data (y) and the output of our function given a parameter set (fy). We square that difference and sum it up. In fact, this is what curve_fit minimizes. Its whole purpose is to find the parameters that give the smallest value of this function, the least squares. The parameters that give the smallest SSR are considered the best fit. We saw that this process can fail, depending on the function and the initial parameters, but let’s assume for a moment it worked. If we found the smallest SSR, does that mean we found the perfect fit? Unfortunately not. What we found was a good estimate for the best fitting parameters given our function. There are probably other functions out there that can fit our data better. We can use the SSR to compare different candidate functions in a process called cross-validation. Instead of comparing different parameters of the same function, we compare different functions. However, if we increase the number of parameters we run into a problem called overfitting. I will not get into the details of overfitting here because it is beyond our scope.
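For example, assuming the arrays and the popt from the fits above, comparing the fitted parameters to a perturbed parameter set could look like this:

# Compare the fitted parameters to a perturbed parameter set via the SSR
ssr_fit = ssr(y_noisy, exp_decay(x, *popt))
ssr_perturbed = ssr(y_noisy, exp_decay(x, popt[0] * 2, popt[1]))
print(ssr_fit < ssr_perturbed)  # the fitted parameters give the smaller SSR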

The main point is that we must steer clear of misinterpreting the best fit. We are always fitting the parameters and not the function. If our fitting works, we get a good estimate for the best fitting parameters. But sometimes our fitting doesn’t work, because our fitting method did not converge to the minimum SSR. In the final section we will find out why that might happen in our example.

The Error Landscape of Exponential Decay

To understand why fitting can fail depending on the initial conditions we should consider the landscape of our sum of squared residuals (SSR). We will calculate it by assuming that we already know the init parameter, so we keep it constant. Then we calculate the SSR for many values of tau smaller than zero and many values for tau larger than zero. Plotting the SSR against the guessed tau will hopefully show us how the SSR looks around the ideal fit.

real_tau = -30.0
real_init = 20.0

noise=np.random.normal(scale=50, size=x.shape[0])
y = exp_decay(x, real_tau, real_init)
y_noisy = y + noise
dtau = 0.1
guess_tau_n = np.arange(-60, -4.9, dtau)
guess_tau_p = np.arange(1, 60, dtau)

# The SSR function
def ssr(y, fy):
    """Sum of squared residuals"""
    return ((y - fy) ** 2).sum()

loss_arr_n = [ssr(y_noisy, exp_decay(x, tau, real_init)) 
              for tau in guess_tau_n]
loss_arr_p = [ssr(y_noisy, exp_decay(x, tau, real_init))
              for tau in guess_tau_p]

"""Plotting"""
fig, ax = plt.subplots(1,2)
ax[0].scatter(guess_tau_n, loss_arr_n)
real_tau_loss = ssr(y_noisy, exp_decay(x, real_tau, real_init))
ax[0].scatter(real_tau, real_tau_loss, s=100)
ax[0].scatter(guess_tau_n[-1], loss_arr_n[-1], s=100)
ax[0].set_yscale("log")
ax[0].set_xlabel("Guessed Tau")
ax[0].set_ylabel("SSR Standard Log Scale")
ax[0].legend(("All Points", "Real Minimum", "-5 Initial Guess"))

ax[1].scatter(guess_tau_p, loss_arr_p)
ax[1].scatter(guess_tau_p[0], loss_arr_p[0], s=100)
ax[1].set_xlabel("Guessed Tau")
ax[1].set_ylabel("SSR")
ax[1].legend(("All Points", "1 Initial Guess"))

On the left we see the SSR landscape for tau smaller than 0. Here we see that towards zero, the error becomes extremely large (note the logarithmic y scale). This is because towards zero the exponential growth becomes ever faster. As we move to more negative values we find a minimum near -30 (orange), our real tau. This is the parameter curve_fit would find if it only optimized tau and started initially at -5 (green). The optimization method does not move to more negative values from -30 because there the SSR becomes worse, it increases.

On the right side we get a picture of why optimization failed when we started at 1. There is no local minimum. The SSR just keeps decreasing with larger values of tau. That is why tau was so large when fitting failed (885223976.9). If we set our initial parameter anywhere in this part of the SSR landscape, this is where tau will go. There are other optimization methods that can overcome bad initial parameters, but few are completely immune to this issue.

Easy to Learn, Hard to Master

Curve fitting is a very useful technique and it is really easy in Python with SciPy, but there are some pitfalls. First of all, be aware of the initial values. They can lead to complete fitting failure or affect results in more subtle, systematic ways. We should also remind ourselves that even with decent fitting results, there might be a more suitable function out there that can fit our data even better. In this particular example we always knew what the underlying function was. This is rarely the case in real research settings. Most of the time it is much more productive to think more deeply about possible underlying functions than to look for more complicated fitting methods.

Finally, we barely scratched the surface here. Mathematical optimization is an entire field in itself and it is relevant to many areas such as statistics, machine learning, deep learning and many more. I tried to give the most pragmatic introduction to the topic here. If you want to go deeper into the topic I recommend this SciPy lecture and of course the official SciPy documentation for optimization and root finding.

The Hodgkin-Huxley Neuron in Julia

The Hodgkin-Huxley model is one of the earliest mathematical descriptions of neural spiking. It was originally developed on data from the squid giant axon. Today, Hodgkin-Huxley like dynamics are also used to simulate the spiking of a variety of neuron types. I’ve recently written a script to simulate Hodgkin-Huxley dynamics in Julia. Here I am sharing that code and I will go through the most important elements. As I just started to learn Julia I will also mention some of the things I learned about Julia in the process.

using Plots
gr()

# Hyperparameters
tmin = 0.0
tmax = 1000.0
dt = 0.01
T = tmin:dt:tmax

# Parameters
gK = 35.0
gNa = 40.0
gL = 0.3
Cm = 1.0
EK = -77.0
ENa = 55.0
El = -65.0

# Potassium ion-channel rate functions
alpha_n(Vm) = (0.02 * (Vm - 25.0)) / (1.0 - exp((-1.0 * (Vm - 25.0)) / 9.0))
beta_n(Vm) = (-0.002 * (Vm - 25.0)) / (1.0 - exp((Vm - 25.0) / 9.0))

# Sodium ion-channel rate functions
alpha_m(Vm) = (0.182*(Vm + 35.0)) / (1.0 - exp((-1.0 * (Vm + 35.0)) / 9.0))
beta_m(Vm) = (-0.124 * (Vm + 35.0)) / (1.0 - exp((Vm + 35.0) / 9.0))

alpha_h(Vm) = 0.25 * exp((-1.0 * (Vm + 90.0)) / 12.0)
beta_h(Vm) = (0.25 * exp((Vm + 62.0) / 6.0)) / exp((Vm + 90.0) / 12.0)

# n, m & h steady-states
n_inf(Vm=0.0) = alpha_n(Vm) / (alpha_n(Vm) + beta_n(Vm))
m_inf(Vm=0.0) = alpha_m(Vm) / (alpha_m(Vm) + beta_m(Vm))
h_inf(Vm=0.0) = alpha_h(Vm) / (alpha_h(Vm) + beta_h(Vm))

# Conductances
GK(gK, n) = gK * (n ^ 4.0)
GNa(gNa, m, h) = gNa * (m ^ 3.0) * h
GL(gL) = gL

# Differential equations
function dv(Vm, GK, GNa, GL, Cm, EK, ENa, El, I, dt)
    ((I  - (GK * (Vm - EK)) - (GNa * (Vm - ENa)) - (GL * (Vm - El))) / Cm) * dt
end
dn(n, Vm, dt) = ((alpha_n(Vm) * (1.0 - n)) - (beta_n(Vm) * n)) * dt
dm(m, Vm, dt) = ((alpha_m(Vm) * (1.0 - m)) - (beta_m(Vm) * m)) * dt
dh(h, Vm, dt) = ((alpha_h(Vm) * (1.0 - h)) - (beta_h(Vm) * h)) * dt
I = T * 0.002

# Initial conditions and setup
v = -65
m = m_inf(v)
n = n_inf(v)
h = h_inf(v)

v_result = Array{Float64}(undef, length(T))
m_result = Array{Float64}(undef, length(T))
n_result = Array{Float64}(undef, length(T))
h_result = Array{Float64}(undef, length(T))

v_result[1] = v
m_result[1] = m
n_result[1] = n
h_result[1] = h


for t = 1:length(T)-1
    GKt = GK(gK, n)
    GNat = GNa(gNa, m, h)
    GLt = GL(gL)

    dvt = dv(v, GKt, GNat, GLt, Cm, EK, ENa, El, I[t], dt)
    dmt = dm(m, v, dt)
    dnt = dn(n, v, dt)
    dht = dh(h, v, dt)

    global v = v + dvt
    global m = m + dmt
    global n = n + dnt
    global h = h + dht

    v_result[t+1] = v
    m_result[t+1] = m
    n_result[t+1] = n
    h_result[t+1] = h
end

p1 = plot(T, v_result, xlabel="time (ms)", ylabel="voltage (mV)", legend=false, dpi=300)

The Parameters

There are seven parameters that define the Hodgkin-Huxley model: the maximum potassium, sodium and leak conductances gK, gNa and gL; the equilibrium potentials for potassium, sodium and leak, EK, ENa and El; and finally Cm, the membrane capacitance. In a nutshell, the model comes down to calculating the fraction of sodium and potassium channels that are open at each time point. Together with the maximum conductance, the membrane potential and the equilibrium potential, the fraction of open channels tells us how much current is flowing at that time. The current, filtered by the membrane capacitance, in turn tells us by how much the voltage changes. The fraction of open channels is given by n, m and h.

Differential Equations

We need to track the change of the voltage (v), potassium channel activation (n), sodium channel activation (m) and sodium channel inactivation (h). Let’s consider the potassium channels first. Active potassium channels can stay active or transition to the inactive state, and inactive channels can stay inactive or transition to the active state. Since these potassium channels are voltage gated, the chance that they transition depends on the voltage. The function alpha_n gives the transition rate from inactive to active and beta_n gives the transition rate from active to inactive. For the sodium channels the situation is almost identical. However, they can also be in a third state that corresponds to depolarization-induced inactivation, which is tracked by h.

Now that we are able to keep track of the states of our channels we can calculate the conductances. GK calculates the potassium conductance, GNa the sodium conductance and GL the leak conductance. Those conductances are then used in the dv function to calculate the currents based on the voltage. And that’s basically it. The integration method is simple forward Euler inside the for loop.

Julia Notes From a Beginner

This is my very first Julia script so I have some thoughts. This is a completely non-optimized script and it is extremely fast, despite the for loop. From what I learned so far, for loops in Julia are known to be fast and they can allegedly outperform vectorized solutions. This is very different from Python, where we strictly avoid for loops, especially when concerned about performance.

For now I am confused by the scope rules and the use of the global keyword. Scope seems to operate similarly to Python, where functions and for loops have seamless access to variables in the outer scope. Assigning to variables, on the other hand, seems to be a problem inside the for loop unless the global keyword is used.

Generally I am very happy with the Julia syntax. I think I could even code Python and Julia back to back. One major problem is of course that indices start at 1, but I get the difference in convention. I’m looking forward to my next script.

Threshold Detection in NumPy

Many signals are easily detected by their size. Here we will learn how to find the indices where a signal crosses a threshold with NumPy. This is our practice signal.

import numpy as np
import matplotlib.pyplot as plt
data = np.array([0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 4, 4, 4, 0, 0, 0])
plt.plot(data, marker='o')

We will perform two simple steps to detect the threshold crossings:

  1. Make the data binary, so that values are True when larger than the threshold and False when lower or equal.
  2. Take the difference of the binary signal. This gives us a boolean array that is True wherever the threshold was crossed.

We can combine those steps into one line.

threshold = 2
threshold_crossings = np.diff(data > threshold, prepend=False)

Plotting shows us that threshold_crossings is true after the threshold was crossed.

plt.plot(data, marker='o')
plt.plot(threshold_crossings, marker='o')
plt.legend(("Data", "Threshold Crossings"))

To get the indices of the threshold crossings we can use np.argwhere(), which returns the indices of the True entries of a boolean array.

np.argwhere(threshold_crossings)[:,0]
# array([ 3,  7, 11, 14], dtype=int64)

Threshold crossings occur at 3, 7, 11 and 14. Sometimes we only need the upward or downward crossings. We can simply isolate those by slicing the array of indices.

np.argwhere(threshold_crossings)[::2,0]  # Upward crossings
# array([ 3, 11], dtype=int64)
np.argwhere(threshold_crossings)[1::2,0]  # Downward crossings
# array([ 7, 14], dtype=int64)

Sometimes we want to find the point just before the threshold is crossed, rather than the point after. There is a simple trick: in np.diff, instead of setting prepend=False, we set append=False.

threshold = 2
post_threshold_crossings = np.diff(data > threshold, prepend=False)
pre_threshold_crossings = np.diff(data > threshold, append=False)
plt.plot(data, marker='o')
plt.plot(post_threshold_crossings, marker ='o')
plt.plot(pre_threshold_crossings, marker='o')
plt.legend(("Data", "Post Crossings", "Pre Crossings"))

Make sure to check out the documentation for np.diff, and bonus points if you can figure out why exactly this works. Detecting threshold crossings is an easy but important part of most of my analysis pipelines, and now you can do it too.

Fantastic Programming Languages and Where to Find Them

There are many programming languages out there and committing to one of them can be intimidating. The good news is that most programming languages that are relevant today are solid. You can’t go wrong with any of them and they are all worth your time. At the same time, the programming language you pick will strongly influence your work as a PhD student or postdoc and the opportunities you have afterwards. Here I’ll guide you through the things to consider when choosing your first or second programming language.

Which language are others using?
The conformist's way

Most programming languages are amazing and the technical differences between them are small and only relevant under special circumstances. More important than the technical details are the community, research field and laboratory you want to join. If you already know which lab you want to join, find out which language is used there. If the lab uses multiple languages try to find out what kind of task you will have, find the people with similar tasks and find out which language they are using. If you don’t know your lab yet but you know the field, try to find out which language dominates that field. You can do so by reading papers, job advertisements or by directly writing to other PhD students and postdocs.

This conformist approach is not satisfying for everyone. I get it. I actually brought a language to my lab that nobody else there was using. More on that later. Most people will benefit greatly from this conformist route. Let me first tell you the many advantages before I explain why you might benefit from choosing another language. First, you maximize the number of people who can help while you are learning and while you are engaging with the technical details of your tasks. Second, you will find many solutions ready to use. You will be able to grab scripts and functions from your colleagues and you will be able to move on from programming details to solving your actual task much faster. Third, your colleagues are successfully contributing to the field, so it is likely that other people in the same field are using the same programming language. That means your programming experience will help you find a postdoc job if you want to stay in that field.

So why would you want to miss out on those advantages? In short: you don’t. You probably don’t know better than your future colleagues (yet). You don’t want to reinvent the wheel. But there are some other things to consider, and sometimes it can pay off to deviate from lab culture. If you do so, it will affect your work. In a nutshell, you will be less productive in the short term but more productive in the long term, if you choose your programming language well.

Which languages are used outside academia?

Many of us are not looking to stay in academic research. Even if you are committed to academia, this point is worth considering. Things and people can change. It is considered good practice to have a plan B. While many programming languages are used both inside and outside of academia, some labs use programming languages that are nearly worthless in the non-academic job market. Sometimes very specific research requires a niche programming language. Other times a lab was simply unable or unwilling to transition to a more common language.

I recommend making your plan B as concrete as possible. Maybe it doesn’t involve programming at all. Then you should fall back to the conformist's way. Otherwise, check your plan B job market for programming languages that are required or advantageous. I will go through some programming languages later and give my opinion on their usage inside and outside academia. However, I cannot give a definitive answer and these job markets evolve rapidly. If you learned programming during your PhD you will be in a great position to pick up another language. You will probably have to learn more than one language anyway. Companies have so-called ‘stacks’. A stack is a collection of software (including some programming languages) and people will be hired for the ‘full stack’ or subsets of that stack. A non-academic stack will likely involve at least passing familiarity with more than one programming language. Either way, keep an eye out for labs that use niche programming languages. It might be worthwhile to defy lab culture and choose a more common language.

Performance is less important than you think

Beginners consistently overestimate the importance of performance or speed. I’ve been there. When I started out I thought fast computations would be the deciding factor for or against any programming language. They aren’t, because human time is more valuable than computer time, by orders of magnitude. In research, the bottleneck is rarely computational time; it’s almost always human time. Performance only becomes relevant with very computationally intensive projects.

Imagine you are writing a script that will take one minute to run in the end. A 10x decrease in performance (now it takes 10 minutes) is very tolerable if you get some perks for it. It is now easier to debug, easier to build on, and more people can use it and give you credit for it. If you are writing a simulation project that takes 10 days, a 10x decrease (now it takes 100 days) does not look so attractive anymore. In the real world the performance of both scripts and programs is slightly more complicated, but the point stands. If you are not sure whether you are in the 1-minute or 10-days category, you should try to figure it out before deciding. Just ask your colleagues and advisers.

With that, I want to move on to some fantastic programming languages. We will look at their strengths and their weaknesses. Always keep in mind the conformist's way. Only choose your own language when there are clear advantages. I will briefly discuss which languages are worthwhile even if your lab is not on board and which ones are to be avoided even if your lab is working with them. As a disclaimer: I have hands-on experience with Python, R and MATLAB. For the other languages I either have second-hand experience (people around me work with them) or I did some research about them.

Python and R

Python and R share a chapter because they are both excellent and should be your first choice if there are no other languages established in the lab. If the lab uses either or both, even better! Both are completely free and open source. Python is my personal favorite but I am slightly biased after years of working with it almost daily.

Both Python and R are also heavily used outside of academia. R is consistently ranked as the top required language for data scientists. Python ranks second. A downside of R is that it is specifically designed for statistics and data analysis. This can be an advantage, because as scientists this is the biggest part of our job when we program. However, Python is a complete all-rounder. It can do data analysis but it can also do web development, game development and everything else you can think of. Web development is also possible with R, but it centrally revolves around data analysis and visualization. I guess I’m trying to say, Python would be better for your private coin collecting website (minor upside).

Some communities slightly favor Python, others favor R. Astronomy for example really likes Python. Single cell sequencing on the other hand prefers R. Check with your field and colleagues. Finally, both languages are well documented and have massive communities behind them. This makes it much more likely that someone already solved an issue you are trying to google. In summary, any second you spend learning Python or R is well worth it.

MATLAB

MATLAB is developed by MathWorks and it is specifically designed for science and engineering. Unlike Python and R, it is neither free nor open source. If your lab pays for a license, the heavy price tag might not bother you. The language itself is more similar to Python than R (many of Python’s numerical computation capabilities were developed with MATLAB users in mind). A nice upside is that the language comes with a very strong graphical user interface and debugging capabilities. This can be very helpful. Unfortunately, MATLAB is much less popular outside of academia than Python and R. Especially smaller companies and early start-ups are unwilling to pay for MATLAB when there are free equivalents. Overall, even academics seem to be slowly transitioning away from MATLAB. However, I would not advise strictly against MATLAB if it is very popular in your field of research or the lab you want to join. Especially since Python and MATLAB are similar enough that transitioning is easy, once you learned MATLAB. I would only advise against it if you have concrete plans to leave academia for data science.

Julia

Julia is being touted as the future of Python. For now it has a smaller community, but it was specifically designed to keep the advantages of Python while improving performance. I currently don’t recommend Julia unless you have some computationally intensive projects or you anticipate such projects in your professional future. The more people use a language, the higher the chance that even specialized tasks are already implemented by someone else. Julia is not yet widely adopted. If your lab uses Julia, I recommend rolling with it.

Igor Pro

Igor Pro by WaveMetrics is a commercial software and programming language. Like MATLAB, it comes with a rather rich graphical user interface. It is the first programming language I actively discourage. Even if you feel like spending money, you are probably better off with MATLAB. Igor Pro is even less popular outside of academia than MATLAB.

When I started my master thesis, the established language for my main task (intracellular electrophysiology) was Igor Pro. I decided against using it, because I had never heard about it before, the graphical user interface did not look very appealing and I had some Python experience from small hobby projects. So I decided to do the analysis myself with Python. The consequence of that was that I was extremely slow in the beginning. Had I just done it with Igor Pro, I could have taken the scripts that were already used in the lab and could have used them with minimal learning effort. Instead I had to reinvent the wheel and learn Python at the same time. This made me extremely inefficient in the short term.

In the long term it was the best choice I made during my masters, because it made me more efficient. More than that, I was able to take on new tasks that would have been nearly impossible with Igor Pro. I started to get into biophysical neuronal network simulations. Python has several packages for that; I’m not aware of any such packages for Igor Pro. That being said, you or your colleagues might not be willing to lose short-term efficiency, especially if you don’t care for programming and just want to get the job done. If you enter a lab where Igor Pro is being used, roll with it to get things done more quickly.

JavaScript

JavaScript is particularly popular for web development but it can also do some data analysis. ImageJ, a popular scientific image processing software, is written in Java and you can write JavaScript code for it. It is free and open source. It is worth checking out if you are going to do a lot of image processing. Otherwise I don’t recommend it for scientific purposes.

Most programming languages are fantastic

Committing to a programming language is difficult. Luckily, all relevant programming languages today are amazing. They all get the job done and are well worth your time (especially the ones on top of this list). And this brings me to my take-home message. Don’t worry about the technical differences between Python, R and MATLAB. Especially don’t worry about performance. Your scientific field and laboratory are much more important factors for your choice. Isolating yourself comes with a price. I also hope I made clear why and when that price is worth it. It might give you long term advantages. Finally, the best thing you can do today is to start programming and to stop worrying.