Saturday, April 27, 2013

Fun with Numba, NumPy and F2PY

I've started using Numba to speed up MD simulations for our course in Molecular Statistics, which I teach together with +Jan Jensen and +Jimmy Charnley Kromann.

One exercise is using the Velo-Verlet algorithm to simulate a Lennard-Jones gas/liquid with periodic boundary conditions. Since we do everything in Python in this course, running the actual simulation is quite slow. What we do is we supply the students with a compiled FORTRAN module, compiled with F2PY which they can use when they've written their own Lennard-Jones gradient code.

The FORmula TRANslator module is extremely fast, compared to Python code. I rewrote the code from last year's course to a function I could use with Numba and compare directly to the F2PY module

For 100 particles:

  • Numba: 0.146 ms/iteration (466x speed up)
  • F2PY: 0.236 ms/iteration (289x speed up)
  • Python: 68.06 ms/iteration
Now, I'm pretty sure a pure FORTRAN implementation would be faster than the F2PY, but I am very impressed with the Numba, since it's standard Python code and MUCH more simple to read than FORTRAN -- in theory all you have to do is write @autojit before a function definition. This is of course in theory ... I didn't do anything to improve on the readability of the Lennard-Jones code here. Before I posted, I had actually inserted a comment saying """This code block is unreadable""".

Now, there were a few issues I discovered:

1) @autojit vs. @jit(argtypes=[double[:,:], double[:]])

Since Numba has to decide what arguments a function takes before it's being compiled, it needs to know what possible type of arguments the function takes. If you're lucky, Numba decides on the right type and you can get away with @autojit, but sometimes autojitting makes the code SLOWER than standard, interpreted Python. In one case I got a factor of 10x slower with autojit, but explicitly stating the function argument types with @jit(argtypes = .... ) I got a speed up of 20x on the same code compared to interpreted Python.

2) Returning tuples in compiled code block

Numba does not allow a tuple to be returned inside a compiled code block. So don't do this.

3) Use Numpy properly

  • Numba is NumPy aware, so code ran faster when numbers were stored in numpy.array types rather than just regular Python lists.
  • At first I was numpy.round() to round to nearest integer in the periodic boundary condition code. Switching three numpy.round() calls to numpy.rint() gave a speed up of around 100x on the code execution.
  • Use U[i, j] instead of U[i][j] on NumPy arrays. Not much of a difference in the vanilla Python code, but MASSIVE speed gains in compiled code.

Python Lennard-Jones code:

F2PY Lennard-Jones code:

Python Velo-Verlet solver:

Friday, April 26, 2013

Introducing 32 students to Linux and Python via VirtualBox

I am currently teaching the Molecular Statistics course together with +Jimmy Charnley Kromann and +Jan Jensen at the University of Copenhagen. My part of the course is teaching the students how to program simple molecular simulation algorithms in Python. +Jan Jensen does the theoretical lectures, and I give lectures in basic Python programming and how to implement the equations from Jan.

One of the challenges we faced before the start of the course was to just get Python, Numpy and Matplotlib up and running on every single student's laptop. Since the students run everything from Windows XP, 7 and 8 to Linux and Mac, this can pose quite a challenge for the students, some of which have barely ever even installed a program before.

This year, we opted to create a VirtualBox with Ubuntu on it, and make sure everything the need throughout the course was pre-installed. The students could then simply download the VirtualBox program and install our VirtualBox image.

+Jimmy Charnley Kromann made this excellent video of how to get the VirtualBox going before the first class:

My biggest fear was that the students would be frightened and confused by their first meeting with a non-Microsoft system. But that (fortunately!) didn't turn out to be an issue at all.

These are my observation from the first lecture and first exercises class:
  • To my surprise all of the students had managed to bring their laptop with a working VirtualBox to the first class. ALL students! I was pretty thrilled since that meant they all could follow my programming from the projector on their own laptop, which is what I want them to do during my lectures.
  • What was going on on the projector was the same as on their laptop during the lecture, which I think help lessen the confusion.
  • We didn't have to do ANY tech support during the interactive lectures or the following exercises class to get Python working. Only hitch was two students who wanted American keyboard layout (our Box came with a Danish keyboard layout by default). This was quickly (though some might say awkwardly) solved by "setxkbmap us" in the terminal.
  • Some of the groups of students wanted an extra module to make a video of the things they had plotted in Matplotlib. +Jimmy Charnley Kromann then posted simple instructions and the relevant "sudo apt-get install" command on the course website -- so that went relatively easy as well.
  • Things which didn't work very well were the shared clipboard and drag-and-drop of files between the host OS and the VirtualBox. That only worked on 50% of the student's laptops. Must be a bug in VirtualBox somewhere.
  • I only overheard the sentence "Grrr, I hate linux!" once.

So far so good!

Monday, April 22, 2013

Numba -- It's like Python on steroids ... on steroids.

I'm teaching a course in molecular simulations this year in which the students have to program simulations of various simple systems. We're trying to keep things as simple as possible and everything is done in Python.

The task of the second week of the course is to program a simple simulation of a 2D Lennard-Jones gas. The computationally intensive step in this exercise is calculating the energy and gradient. In the past we've (and by we, I mean +Casper Steinmann) written a FORTRAN module and used F2PY to exploit the awesomeness of the good old FORmula TRANslator and have the students link their Python scripts with a precompiled .so module for this to run at an acceptable speed

This solution is of course completely fine, but not very general. I've been looking into Cython, pypy and various other methods to speed things up a bit. But they all seemed quite elaborate an unPythonic, and you might as well do the whole thing in C then.

Yesterday Numba was brought to my attention on this blog. In short, Numba is a way to compile (at execution time) Python functions into C which normally makes your program run substantially faster. The blog has a very nice example where the code gets a 1000x speedup just by adding one function decorator. And the syntax couldn't be simpler.

Let's try Numba on our Lennard-Jones program. Here is the naive function they have to implement:

The code is then executed via Python as one would normally execute Python code, specifying in the script that the calc_energy_and_gradient() function is to be compiled via Numba. This can be done simply by writing the the @autojit decorator before the function definition. 

You can also specify exactly which type of input the just-in-time compilation takes takes (an example is commented out). This makes the compilation of the function be type specific, but a bit faster.  Make sure to import numba and the specific type of argument (here the arguments are of the type double) and the relevant decorators (jit or autojit).

Conclusion: 10000 gradient and energy evaluations take 16 seconds with the standard Python implementation. 2 seconds with @autojit and 1.5 seconds with the explicit @jit. Not too bad for something that doesn't take ANY code changes.