Measuring and Visualizing GPU Power Usage in Real Time with asyncio and Matplotlib

In this post we will learn how to periodically measure the power usage of our GPU and plot it in real time with a single Python program. For this we need concurrency between the measuring and the plotting parts of our code.

Here, concurrency means that the measuring coroutine goes to sleep after taking a measurement. While it is asleep, the plotting coroutine can update the plot and then goes to sleep as well. Once the defined interval has passed, the measuring coroutine wakes up and measures again as soon as the event loop gets to it. The two keep taking turns like this. Everything runs in a single thread, so they never execute at the same moment. They simply alternate whenever one of them sleeps.

We achieve this concurrency with asyncio, and the plotting is done with Matplotlib. To measure the GPU power we use pynvml (Python bindings for the NVIDIA Management Library). Before we get into the code, here is a video showing the interface in action.
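The turn-taking described above can be seen in isolation, without a GPU or a plot. In the minimal sketch below, the two `worker` coroutines are hypothetical stand-ins for measuring and plotting. Each one appends an entry to a shared list and then sleeps, which hands control back to the event loop.

```python
import asyncio


async def worker(name: str, interval: float, n: int, log: list) -> None:
    """Do one unit of work (append to log), then sleep so the other worker can run."""
    for i in range(n):
        log.append((name, i))
        await asyncio.sleep(interval)  # hand control back to the event loop


async def main(log: list) -> None:
    # A fast "measure" worker and a slower "plot" worker share one thread
    await asyncio.gather(
        worker("measure", 0.01, 4, log),
        worker("plot", 0.02, 2, log),
    )


log = []
asyncio.run(main(log))
print(log)  # "measure" runs first, then "plot", then they keep alternating
```

Neither worker blocks the other, because each sleeps through `await` rather than `time.sleep`. A `time.sleep` call would freeze the whole thread, and with it the other worker.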

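```python
import asyncio
import time

import matplotlib.pyplot as plt
import numpy as np
import pynvml

# Initialize GPU measurement and parameters
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
measurement_interval = 0.1  # in seconds
plotting_interval = 0.2  # in seconds
time_span = 20  # time span on the plot x-axis in seconds
n_samples = int(time_span / measurement_interval)
# Two separate arrays: `m = t = np.array(...)` would bind both names to one array
m = np.full(n_samples, np.nan)  # measured power in W
t = np.full(n_samples, np.nan)  # elapsed time in s
mW_to_W = 1e3  # NVML reports power in milliwatts

# Initialize the plot
plt.ion()  # interactive mode: figures show immediately and drawing does not block
plt.rcParams.update({'font.size': 18})
figure, ax = plt.subplots(figsize=(8, 6))
line1, = ax.plot(t, m, linewidth=3)
ax.set_xlabel("Time (s)")
ax.set_ylabel("GPU Power (W)")


async def measure():
    while True:
        power = pynvml.nvmlDeviceGetPowerUsage(handle) / mW_to_W
        dt = time.time() - ts  # seconds since program start
        # Shift both arrays one slot to the left and store the newest sample last
        m[:-1] = m[1:]
        m[-1] = power
        t[:-1] = t[1:]
        t[-1] = dt
        await asyncio.sleep(measurement_interval)  # yield to the event loop


async def plot():
    while True:
        line1.set_data(t, m)
        tmin, tmax = np.nanmin(t), np.nanmax(t)
        mmin, mmax = np.nanmin(m), np.nanmax(m)
        margin = (np.abs(mmax - mmin) / 10) + 0.1
        ax.set_xlim((tmin, tmax + 1))
        ax.set_ylim((mmin - margin, mmax + margin))
        figure.canvas.flush_events()  # let the GUI process events so the window updates
        await asyncio.sleep(plotting_interval)


async def main():
    # Run both coroutines concurrently. measure() is scheduled first,
    # so at least one sample exists when plot() runs for the first time.
    await asyncio.gather(measure(), plot())


if __name__ == "__main__":
    ts = time.time()  # reference time for the x-axis
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
```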

We will start with the functions async def measure() and async def plot(), since they are central to the program. First, note that neither of them is an ordinary function, because of the async keyword. This syntax was added in Python 3.5. In earlier versions we could instead have decorated the functions with @asyncio.coroutine and used yield from in place of await. That decorator was removed in Python 3.11.

The async keyword turns our function into a coroutine function, which allows us to use the await keyword inside it. With await asyncio.sleep(measurement_interval), the coroutine suspends itself for measurement_interval seconds. While it is suspended, the asyncio event loop can run other coroutines that are ready. More on the event loop later. Because we want to keep measuring until someone terminates the program, we wrap everything in measure in an infinite while True: loop.
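You can confirm that async def produces a coroutine rather than an ordinary function by calling one and checking what comes back. The tick coroutine below is a hypothetical example and not part of the program above. asyncio.run (Python 3.7+) is used here only to drive it.

```python
import asyncio
import inspect


async def tick(name: str, delay: float, log: list) -> None:
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # suspends here; the event loop may run other coroutines
    log.append(f"{name} end")


log = []
coro = tick("a", 0.01, log)  # calling a coroutine function does not run its body
print(inspect.iscoroutine(coro), log)  # True []
asyncio.run(coro)  # the body runs only when an event loop drives it
print(log)  # ['a start', 'a end']
```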

So what happens while measuring? Outside of the coroutine we define two arrays, m and t. One holds the measured power and the other holds the elapsed time. Recording time matters for two reasons. First, energy is power integrated over time. Second, it lets us check that the coroutine is not staying asleep much longer than we want it to.

NVML reports power in milliwatts, so we divide by mW_to_W to get watts. When we take a measurement, we shift the current elements of the measurement array one position to the left with m[:-1] = m[1:]. We then store the newly measured value in the last slot, m[-1]. The time array t is updated the same way. That is all there is to our measurements.
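Both the shifting trick and the energy remark can be sketched with plain NumPy. The helpers push and energy_joules are hypothetical names used for illustration. The energy estimate uses the trapezoidal rule, which is one reasonable way to integrate sampled power over time. The program above does not compute energy itself.

```python
import numpy as np


def push(buffer: np.ndarray, value: float) -> None:
    """Shift buffer one slot to the left in place and store value in the last slot."""
    buffer[:-1] = buffer[1:]
    buffer[-1] = value


def energy_joules(t: np.ndarray, p: np.ndarray) -> float:
    """Trapezoidal estimate of energy (J) from time (s) and power (W), skipping NaN padding."""
    valid = ~(np.isnan(t) | np.isnan(p))
    tv, pv = t[valid], p[valid]
    return float(np.sum(0.5 * (pv[1:] + pv[:-1]) * np.diff(tv)))


t = np.full(5, np.nan)  # separate arrays for time and power
p = np.full(5, np.nan)
for i, watts in enumerate([50.0, 60.0, 70.0]):
    push(t, i * 0.1)
    push(p, watts)

print(t)  # two NaN slots, then 0.0, 0.1, 0.2
print(energy_joules(t, p))  # roughly 12.0 J: 0.5*(50+60)*0.1 + 0.5*(60+70)*0.1
```

Shifting by slice assignment costs O(n) per sample. That is negligible for a few hundred points. A ring buffer with a moving index would avoid the copy, at the price of reordering the data before plotting.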

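Our plot coroutine works just like the measure coroutine, except that it plots whatever is currently in the time and measurement arrays before it goes to sleep. The plotting itself is basic Matplotlib. Note, however, that figure.canvas.flush_events() is critical for updating the plot in real time: it lets the GUI backend process pending events, including redraws, while our own loop keeps control. Calling plt.ion() when we initialize the plot is just as important. It turns on interactive mode, in which new figures are shown immediately and redraw when their contents change, without blocking the program.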
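One detail is worth guarding against. If the plotting step ever ran before the first measurement, every entry of t and m would still be NaN. In that case np.nanmin would return NaN with a warning, and Matplotlib rejects NaN axis limits. The hypothetical redraw helper below wraps the plotting step with such a guard. It uses the non-interactive Agg backend only so that the sketch runs without a display. With a GUI backend, flush_events is what makes the window actually update.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def redraw(figure, ax, line, t, m):
    """Update the line and axis limits, then let the GUI backend process events."""
    if np.all(np.isnan(t)) or np.all(np.isnan(m)):
        # Nothing measured yet: nanmin would give NaN, and NaN axis limits raise an error
        return
    line.set_data(t, m)
    tmin, tmax = np.nanmin(t), np.nanmax(t)
    mmin, mmax = np.nanmin(m), np.nanmax(m)
    margin = abs(mmax - mmin) / 10 + 0.1
    ax.set_xlim(tmin, tmax + 1)
    ax.set_ylim(mmin - margin, mmax + margin)
    figure.canvas.draw_idle()     # request a redraw of the changed figure
    figure.canvas.flush_events()  # no-op on Agg; processes GUI events on interactive backends


figure, ax = plt.subplots()
(line,) = ax.plot([], [])
redraw(figure, ax, line, np.full(5, np.nan), np.full(5, np.nan))  # safely skipped
t = np.array([np.nan, np.nan, 0.0, 0.1, 0.2])
m = np.array([np.nan, np.nan, 50.0, 60.0, 70.0])
redraw(figure, ax, line, t, m)
print(ax.get_xlim(), ax.get_ylim())
```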

Coroutines are not called like normal functions; calling one only creates a coroutine object. They do their work as tasks within an asyncio event loop. The event loop knows which coroutines are asleep and decides which one runs next. Keeping track of this by hand may seem manageable with two coroutines, but with three it already becomes tedious. When one coroutine goes to sleep, two others may be awake and waiting, and something has to decide which one goes next. Luckily, asyncio takes care of these details, and we can focus on the work we want to get done.

We do, however, need to create an event loop with loop = asyncio.new_event_loop() and then start it with loop.run_until_complete(main()). The coroutines only start working once the loop runs. Both of our coroutines are awaited in main(), so both become tasks in the event loop. Because of the event loop, I recommend running the code from the command line. Interactive environments such as Jupyter can cause problems, because they may already be running an event loop of their own.
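The sketch below uses the same new_event_loop / run_until_complete pattern, but lets two infinite loops run for a fixed duration by wrapping asyncio.gather in asyncio.wait_for. The measure and plot coroutines here are hypothetical stand-ins that only count how often they ran. The timeout is an assumption added so that the example terminates.

```python
import asyncio


async def measure(counter: dict) -> None:
    while True:
        counter["measure"] += 1
        await asyncio.sleep(0.01)


async def plot(counter: dict) -> None:
    while True:
        counter["plot"] += 1
        await asyncio.sleep(0.02)


async def main(counter: dict, duration: float) -> None:
    try:
        # gather turns both coroutines into tasks in the running event loop
        await asyncio.wait_for(asyncio.gather(measure(counter), plot(counter)), timeout=duration)
    except asyncio.TimeoutError:
        pass  # after `duration` seconds both infinite loops are cancelled


counter = {"measure": 0, "plot": 0}
loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main(counter, 0.2))
finally:
    loop.close()
print(counter)  # measure typically ran about twice as often as plot
```

On Python 3.7 and later, asyncio.run(main(...)) does the same job in a single call: it creates a new event loop, runs the coroutine to completion and closes the loop.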

With that, we have covered the most important parts of the code. There are several things we could do differently, and some of them might make the code better. For one, we could use a technique called blitting (explained here) to improve plotting performance. We could also do the plotting with FuncAnimation (explained here) instead of writing our own coroutine. I tried that for a while but was not able to make the animation and the measure() coroutine work together in the same event loop. There probably is a way to do it that I did not find. Let me know if you have other ideas for improvement.
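Blitting works by rendering the static parts of the figure once, caching them, and then redrawing only the changing line on top of the cached image. The sketch below follows the pattern from Matplotlib's blitting tutorial. The fixed axis limits are an assumption, because they are baked into the cached background. The Agg backend is used only so that the sketch runs headless.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; use an interactive backend for a live window
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_xlim(0, 20)   # assumption: fixed limits, since they are part of the cached background
ax.set_ylim(0, 300)
ax.set_xlabel("Time (s)")
ax.set_ylabel("GPU Power (W)")
# animated=True keeps the line out of normal draws; we draw it ourselves
(line,) = ax.plot([], [], linewidth=3, animated=True)

fig.canvas.draw()                                # render the static parts once
background = fig.canvas.copy_from_bbox(ax.bbox)  # cache them


def blit_update(t, m):
    fig.canvas.restore_region(background)  # put back the cached static parts
    line.set_data(t, m)
    ax.draw_artist(line)                   # draw only the changing artist
    fig.canvas.blit(ax.bbox)               # push the updated region to the screen
    fig.canvas.flush_events()


t = np.linspace(0, 2, 21)
m = 100 + 10 * np.sin(t)
blit_update(t, m)
```

Our x-axis scrolls with elapsed time, so it does not fit fixed limits directly. One option is to plot t - np.nanmax(t) against a fixed range from -time_span to 0. Another is to re-capture the background whenever the limits change, which gives up part of the speed-up.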

You can find pynvml here. asyncio is part of the Python standard library, and you can find its docs here. I was inspired to do this project by a package called codecarbon, which you can find here. It estimates the carbon footprint of computation, and I plan to blog about it soon.