Fantastic Programming Languages and Where to Find Them

There are many programming languages out there and committing to one of them can be intimidating. The good news is that most programming languages that are relevant today are solid. You can’t go wrong with any of them and they are all worth your time. At the same time, the programming language you pick will strongly influence your work as a PhD student or postdoc and the opportunities you have afterwards. Here I’ll guide you through the things to consider when choosing your first or second programming language.

Which language are others using?
The conformists way

Most programming languages are amazing and the technical differences between them are small and only relevant under special circumstances. More important than the technical details are the community, research field and laboratory you want to join. If you already know which lab you want to join, find out which language is used there. If the lab uses multiple languages try to find out what kind of task you will have, find the people with similar tasks and find out which language they are using. If you don’t know your lab yet but you know the field, try to find out which language dominates that field. You can do so by reading papers, job advertisements or by directly writing to other PhD students and postdocs.

This conformist approach is not satisfying for everyone. I get it. I actually brought a language to my lab that nobody else there was using. More on that later. Most people will benefit greatly from this conformist route. Let me first tell you the many advantages, before I explain why you might benefit from choosing another language. First, you maximize the people that can help while you are learning and while you are engaging with the technical details of your tasks. Second, you will find many solutions ready to use. You will be able to grab scripts and functions from your colleagues and you will be able to move on from programming details to solving your actual task much faster. Third, your colleagues are successfully contributing to the field so it is likely that other people in the same field are using the same programming language. That means your programming experience will help you find a postdoc job if you want to stay in that field.

So why would you want to miss out on those advantages? In short: you don’t. You probably don’t know better than your future colleagues (yet). You don’t want to reinvent the wheel. But there are some other things to consider and sometimes it can pay off to deviate from lab culture. If you do so, this will affect your work. In a nutshell, you will be less productive in the short-term but more productive in the long-term, if you choose your programming language well.

Which languages are used outside academia

Many of us are not looking to stay in academic research. Even if you are committed to academia, this point is worth considering. Things and people can change. It is considered good practice to have a plan B. While many programming languages are used both inside and outside of academia, some labs use programming languages that are nearly worthless in the non-academic job market. Sometimes very specific research requires a niche programming language. Other times a lab was simply unable or unwilling to transition to a more common language.

I recommend making your plan B as concrete as possible. Maybe it doesn’t involve programming at all. Then you should fall back to the conformists way. Otherwise, check your plan B job market for programming languages that are required or advantageous. I will go through some programming languages later and give my opinion on their usage inside and outside academia. However, I cannot give a definitive answer and these job markets evolve rapidly. If you learned programming during your PhD you will be in a great position to pick up another language. You will probably have to learn more than one language anyway. Companies have so called ‘stacks’. A stack is a collection of software (including some programming languages) and people will be hired for “full-stack” or subsets of that stack. A non-academic stack will likely involve at least passing familiarity with more than one programming language. Either way, keep an eye out for labs that use niche programming languages. It might be worthwhile to defy lab culture and choose a more common language.

Performance is less important than you think

Beginners consistently overestimate the importance of performance or speed. I’ve been there. When I started out I though fast computations would be the deciding factor for or against any programming language. It isn’t, because human time is more valuable than computer time. By orders of magnitude. In research, the bottleneck is rarely computational time, it’s almost always human time. Performance only becomes relevant with very computationally intensive projects.

Imagine you are writing a script that will take one minute to run in the end. A 10x decrease in performance (now it takes 10 minutes) is very tolerable, if you get some perks for it. It is now easier to debug, easier to build on and more other people can use it and give you credit for it. If you are writing a simulation project that takes 10 days, a 10x decrease (now it takes 100 days) does not look so attractive anymore. In the real world performance of both scripts and programs is slightly more complicated but the point stands. If you are not sure whether you are in the 1 minute or 10 days category, you should try to figure it our before deciding. Just ask your colleagues and advisers.

With that, I want to move on to some fantastic programming languages. We will look at their strengths and their weaknesses. Always keep in mind the conformists way. Only choose your own language when there are clear advantages. I will briefly discuss which languages are worthwhile to use even if your lab is not on board and which ones are to be avoided even if your lab is working with them. As a disclaimer: I have hands on experience with Python, R and Matlab. For the other languages I either have second hand experience (people in my surrounding work with them) or I did some research about them.

Python and R

Python and R share a chapter because they are both excellent and should be your first choice if there are no other languages established in the lab. If the lab uses either or both, even better! Both are completely free and open source. Python is my personal favorite but I am slightly biased after years of working with it almost daily.

Both Python and R are also heavily used outside of academia. R is consistently ranked as the top required language for data scientists. Python ranks second. A downside of R is that it is specifically designed for statistics and data analysis. This can be an advantage, because as scientists this is the biggest part of our job when we program. However, Python is a complete all-rounder. It can do data analysis but it can also do web development, game development and everything else you can think of. Web development is also possible with R, but it centrally revolves around data analysis and visualization. I guess I’m trying to say, Python would be better for your private coin collecting website (minor upside).

Some communities slightly favor Python, others favor R. Astronomy for example really likes Python. Single cell sequencing on the other hand prefers R. Check with your field and colleagues. Finally, both languages are well documented and have massive communities behind them. This makes it much more likely that someone already solved an issue you are trying to google. In summary, any second you spend learning Python or R is well worth it.

MATLAB

MATLAB is developed by MathWorks and it is specifically designed for science and engineering. Unlike Python and R, it is neither free nor open source. If your lab pays for a license, the heavy price tag might not bother you. The language itself is more similar to Python than R (many of Python’s numerical computation capabilities were developed with MATLAB users in mind). A nice upside is that the language comes with a very strong graphical user interface and debugging capabilities. This can be very helpful. Unfortunately, MATLAB is much less popular outside of academia than Python and R. Especially smaller companies and early start-ups are unwilling to pay for MATLAB when there are free equivalents. Overall, even academics seem to be slowly transitioning away from MATLAB. However, I would not advise strictly against MATLAB if it is very popular in your field of research or the lab you want to join. Especially since Python and MATLAB are similar enough that transitioning is easy, once you learned MATLAB. I would only advise against it if you have concrete plans to leave academia for data science.

Julia

Julia is being traded as the future Python. For now it has a smaller community but it was specifically designed to keep the advantages of Python while improving performance. I currently don’t recommend Julia, unless you have some computationally intensive projects or you anticipate such projects in your professional future. The more people use a language, the higher the chance that even specialized tasks are already implemented by someone else. Julia is not yet widely adopted. If your lab uses Julia, I recommend rolling with it.

Igor Pro

Igor Pro by WaveMetrics is a commercial software and programming language. Like MATLAB, it comes with a rather rich graphical user interface. It is the first programming language I actively discourage. Even if you feel like spending money, you are probably better off with MATLAB. Igor Pro is even less popular outside of academia than MATLAB.

When I started my master thesis, the established language for my main task (intracellular electrophysiology) was Igor Pro. I decided against using it, because I had never heard about it before, the graphical user interface did not look very appealing and I had some Python experience from small hobby projects. So I decided to do the analysis myself with Python. The consequence of that was that I was extremely slow in the beginning. Had I just done it with Igor Pro, I could have taken the scripts that were already used in the lab and could have used them with minimal learning effort. Instead I had to reinvent the wheel and learn Python at the same time. This made me extremely inefficient in the short term.

In the long term it was the best choice I made during my masters, because in the long term it made me more efficient. More than that, I was able to take on new tasks that would have been nearly impossible with Igor Pro. I started to get into biophysical neuronal network simulations. Python has several packages for that. I’m not aware of any such Packages for Igor Pro. That being said, you or your colleagues might not be willing to lose short term efficiency, especially if you don’t care for programming and just want to get the job done. If you enter a lab where Igor Pro is being used, roll with it to get things done more quickly.

JavaScript

JavaScript is particularly popular for web development but it can also do some data analysis. ImageJ, a popular scientific image processing software, is written in Java and you can write JavaScript code for it. It is free and open source. It is worth checking out if you are going to do a lot of image processing. Otherwise I don’t recommend it for scientific purposes.

Most programming languages are fantastic

Committing to a programming language is difficult. Luckily, all relevant programming languages today are amazing. They all get the job done and are well worth your time (especially the ones on top of this list). And this brings me to my take-home message. Don’t worry about the technical differences between Python, R and MATLAB. Especially don’t worry about performance. Your scientific field and laboratory are much more important factors for your choice. Isolating yourself comes with a price. I also hope I made clear why and when that price is worth it. It might give you long term advantages. Finally, the best thing you can do today is to start programming and to stop worrying.

The Scientists Guide to Python

  • Install Python as part of a data science platform such as Anaconda
  • Use an integrated development environment like Spyder (Installs with Anaconda)
  • Don’t reinvent the wheel, find out what already exists in your field

Python and Science

Python has been adopted with open arms by many scientific fields. A lot of its success is directly linked to the success of NumPy (the Python package for numerical computing) and the scientific Python community (SciPy community). If you want to understand the history of this success story, I recommend reading the recent nature methods paper which gives some strong background about the SciPy community. However, you don’t need to read the paper to use Python or any of the scientific packages.

A downside of Python is that it is a general purpose programming language. Python is used for anything you can think of. Web development, GUI design, game development, data mining, machine learning, computervision and so on. This can be intimidating to beginners. Especially to scientists who have a very specific task they want to automate. Compare this situation to MATLAB. Its purpose is very specific and is even part of the name. It is the MATrix LABoratory. It deals very efficiently with matrices and is generally good at math stuff. Both very valuable and important to most scientists. MATLAB is specifically designed to serve the science and engineering communities. Python on the other hand is for everyone, which can be a weakness in the beginning but can quickly turn into its greatest strength. There are no privileged tasks in Python, they are all handled well (if there is a great community behind the task maintaining it). The purpose of this guide is to ease some of the hurdles scientists face when diving into Python and highlight the many advantages.

The first problem we face is called package management. When we download pure Python from the official website (python.org), we get the core Python functionality. Python in itself does not include most of the features we depend on as scientists. Pythons functionality is organized in so called packages that we need to install if they are not built into Python. Alternatively we can install a pre-made Python distribution that includes scientific packages. This is my preferred way to installing Python and I strongly recommend beginners to start with Anaconda.

Installing Python with Anaconda

Anaconda is a data science platform that installs Python with most of the packages we need as scientists. It is open-source and completely free. When we install Anaconda we get lots of stuff. Most importantly we get Python, NumPy, Matplotlib, Conda and Spyder. We could install those things ourselves but this way we can have them without ever worrying about package management. We already talked about core Python and NumPy. Matplotlib is a package that allows us to plot our data. Conda is a package manager that allows us to update installed packages and install new packages. Finally, Spyder is an integrated development environment (IDE). Spyder stands for Scientific PYthon Development EnviRonment and it is exactly what we need to get started.

The Anaconda navigator. A graphical user interface that helps us find the main components of Anaconda. One of those is the Spyder IDE which you can start by simply clicking launch.

An IDE is extremely useful. For example, it helps us by highlighting important parts of our code visually. By running Spyder you have just setup your own IDE. This is your kingdom now. You will be able to do amazing things here and you should celebrate this moment properly. However, as you celebrate, my duty will be to guide you through the most important parts of this graphical user interface.

Spyder (Scientific PYthon Development EnviRonment). On the left we have the editor. Here we write scripts. Scripts are collections of code that can be execute all together in series when we hit the run button (green arrow). In the lower right is an IPython console. This is where scripts are executed interactively when we run them. We can also type code here and execute it immediately

Lets start with the interactive console in the lower right. Here, IPython is running and it is awaiting your commands. You can type code and run it by hitting enter. You can use it like a fancy calculator. Try 2+2, 2-2, 2*2, 2/2, 2**2. It’s all there. The interactive console is the perfect place to try out commands and see what they do. We can define variables here and import packages. Luckily we installed Anaconda, so we have NumPy already available. The conventional way to import NumPy is import numpy as np. To learn all about NumPy, find my NumPy blog series.

On the left is the editor. This is our code notebook. Here we edit files that store our code. Those files are commonly called scripts. When you hit enter here, code is not immediately executed. You are just moving to a new line. To execute the code here we can hit the green arrow (play button) above. When we do so, all lines of code in the script are executed from top to bottom. When the script finishes, the objects created in the script still exist in the interactive console. This is very useful to debug the script. We can use the interactive console to take a close look at what the script did.

The Power of Scripts

Most of your work will be about writing scripts that do something useful. Many times, there are other ways to solve the same task. Many things can be done by hand, clicking through the graphical user interface of another program. Sometimes this is still the best way, but there are many advantages to writing a script.

A script can be much faster than a human, especially on repeat tasks. When we have to rename 10 files in a certain way, we might decide that it is not worth to write a script. Let’s just do it quickly by hand. But the same task could come back later. This time with 100 files that require renaming. The most important thing is to pick your battles. We cannot automate everything. We will have to make tough decisions.

Another advantage of scripts is that they are a protocol of the analysis. If we can write a script that takes care of everything and is capable of analyzing the entire dataset, we know exactly how the analysis was performed. When we analyze manually we can achieve the same by very carefully writing down everything we do on each data point. However, this can easily fail if something about the analysis becomes important afterwards and it is not part of the protocol. Scripts are also more easily shared than a protocol because the script language (in our case Python) is less ambiguous.

Sometimes even the script fails as a definitive protocol. This is the case when we use packages that change functions. Lets assume we use a function called fastAnalysis from a fictional package called fancyCalc. The way fastAnalysis works changed between version 1.1 and 1.2 slightly. We only know what our script did if we remember which version we used at the time we ran the script. Here another advantage of scripts shows. We can pretty quickly run the same analysis again with different fancyCalc versions. If we get the same result as before this version is probably the one we ran previously. Manual analysis is usually much harder to repeat. Imagine you spent the last 2 month analyzing 200 samples. You had to normalize each sample for its corresponding baseline. How easy would it be to do the same analysis by hand with a slightly larger baseline interval? In a script this would usually involve changing a single number. By hand it could easily lead to another 2 months of work.

The Scripting Workflow

We talked at length about the advantages of scripts but how do we actually write one? For starters we need to know what we want to do. Then we need to look online for a function that we hope can achieve our goal. Then we try out the function. If it does what we want we move on to the next task. Usually we have one pretty large, intimidating task, such as “analyze this whole data set”. Then this task falls into smaller tasks. For example we will have to load our data, do preprocessing, extract regions of interest, quantify and so on. Those tasks fall apart into even smaller tasks. For preprocessing one of those tasks might be to subtract the background. Ideally, if we make our tasks small enough we will be less intimidated and we will find a function that performs this task.

In the age of the internet, programming is easy enough. Unfortunately, to become effective at it we will need to do some learning and get practice. Our goal is to become effective scripters. So we will need to learn three things: 1. We need to learn some basic Python. This will be easy, I promise. 2. We need to learn how to think computationally about our tasks. Only then will be be able to effectively divide entire projects into smaller, manageable tasks. 3. We will learn how to find, evaluate and use task specific packages quickly. Most tasks have already been solved by other people and we never want to reinvent the wheel (unless we have a vision for a really cool wheel that is much better than all the other wheels that already exist). My blog will lead you through all three steps.

What’s next?

The best thing you can do next is to start coding. Today. The amount of different programming languages, distributions of those languages, integrated development environments and resources to learn all of those is immense and can lead to real decision paralysis. You can spend forever trying to decide how to start but you already have everything you need. Install Anaconda, launch Spyder and just let lose. If you are not sure if Anaconda is right for you, install a distribution you think is more suitable. Install pure Python if you are feeling adventurous and just start hacking away. If you are not convinced Python is the best language for you, install another language. If you don’t want to learn with this blog, use another resource. There are so many great ones. Just do it. So now you need to get your hands dirty.

In future blog posts I will go through the three points I think are necessary to become effective scripters. I will start with basic Python and I will collect those blog posts here. I’d be happy if you want to join me.

Doing the Math with Python

First, we take a look at basic math with Python. We learn the basic arithmetic operators, parentheses and how to save the results by assigning a name.