Saturday, November 13, 2010

GotoBLAS2: A faster BLAS library

This post is about a little trick I have for speeding up my quantum chemistry calculations. For some reason this software is unknown to most people, so I decided to do a short advertisement!

The bottleneck in all forms of computational quantum chemistry is the CPU time spent to converge a calculation to a given accuracy. In most cases the most expensive part is the evaluation of two-electron integrals, but many calculations also spend a lot of time on linear algebra operations. The computational routines for linear algebra have been optimized greatly over the years, and today there are quite a few standard linear algebra libraries. One standard API for linear algebra routines is called BLAS, short for "Basic Linear Algebra Subprograms". Many equations in quantum chemistry are formulated in matrix notation, which is easily interpreted and turned into fast, parallelized code.
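To make the idea of a BLAS routine concrete, here is a minimal sketch of the operation behind the general matrix-matrix multiply (GEMM), which computes alpha*A*B + beta*C. This is a plain, unoptimized reference version written for illustration only; the function name `gemm` and its list-of-rows interface are my own assumptions and not the signature of any real BLAS library. An optimized library computes the same result, just much faster.

```python
def gemm(alpha, a, b, beta, c):
    """Reference (unoptimized) version of the GEMM operation.

    Returns alpha * (a x b) + beta * c, with every matrix stored as a
    list of rows. The name and interface are illustrative assumptions.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")

    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            dot = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * dot + beta * c[i][j])
        result.append(row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 0.0], [0.0, 1.0]]
print(gemm(1.0, a, b, 1.0, c))  # [[20.0, 22.0], [43.0, 51.0]]
```

The triple loop costs on the order of m*n*k multiplications, which is why a tuned implementation of this one routine can matter so much for large matrices.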

When people want a fast BLAS library, they often use well-known libraries such as ATLAS or Intel MKL. However, there is a slightly faster BLAS library out there, named GotoBLAS2. This library was started by a Japanese guy named Kazushige Goto (the Japanese pronunciation is something like "go-toe"), who, like Albert Einstein, worked in a patent office while he began developing his own BLAS library. After a while he was 'discovered' by a university in Texas, and soon Goto moved from Japan to the US. Now he is said to be very rich and working for some large corporation somewhere in the US.

Enough with the history! Tobias Wittwer has made a detailed comparison of common, fast BLAS libraries in this pdf: blas_lapack.pdf

You can download the latest GotoBLAS2 here. Not only is GotoBLAS2 faster than Intel's MKL library, it is also completely free of charge and open source under the BSD license.

I always use GotoBLAS2 when compiling and installing software, so it will show up in future posts on that topic. Be sure to use it and make the most of your CPU resources!


  1. Interesting! Can GotoBLAS2 be used with GAMESS? And is it parallelized?

    "computational time, which is mainly spent doing standard linear algebra"

    That's not quite accurate. The most time-consuming part is usually the 2-electron integrals. But it is true that if the 2-electron integrals have been effectively parallelized and/or approximated, the linear algebra can become the bottleneck for large systems.

  2. Thank you for correcting me on the two-electron integrals!

    In principle GAMESS should be able to use BLAS routines from GotoBLAS2, since all BLAS libraries contain the same standard set of functions, only coded and optimized in different ways.

    I just tried to recompile the current GAMESS on my desktop at home, and it seems you can only specify the BLAS libraries from MKL, ATLAS and ACML (or a crude, generic set of functions), because you have to go through the ./config script before compiling.
    I will ask Casper for a definite answer and let you know! I'm fairly certain that it is possible to circumvent the standard installation procedure and introduce a non-standard BLAS library somewhere else. Casper may have done this when we tested the AMD Magny Cours machines.

    GotoBLAS2 is fully parallelized, so even if your main program is not parallelized, you can call the linear algebra routines and run those in parallel without your main program ever knowing it. The number of CPUs can either be fixed when you compile GotoBLAS2 or set later on via an environment variable.
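
As a rough illustration of the environment-variable route mentioned in the last comment, here is a small sketch that launches a program with the thread count set only for that one run. The variable name `GOTO_NUM_THREADS` is an assumption on my part, so check the documentation shipped with your GotoBLAS2 build. The helper names and the stand-in child command are hypothetical; in practice the command would be your own BLAS-linked executable.

```python
import os
import subprocess
import sys

# Assumption: the thread-count variable read by the BLAS library.
# Verify the exact name against the notes bundled with your build.
THREAD_VAR = "GOTO_NUM_THREADS"


def env_with_blas_threads(num_threads, base_env=None):
    """Return a copy of the environment with the thread count set."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env[THREAD_VAR] = str(num_threads)
    return env


def run_with_blas_threads(command, num_threads):
    """Run a command with the thread-count variable set for that run only."""
    return subprocess.run(
        command,
        env=env_with_blas_threads(num_threads),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for a real BLAS-linked program: it only echoes the variable.
child = [sys.executable, "-c", f"import os; print(os.environ['{THREAD_VAR}'])"]
print(run_with_blas_threads(child, 4).stdout.strip())
```

Setting the variable per run, rather than fixing the count at compile time, makes it easy to try different thread counts on the same machine without rebuilding the library.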