Computational Biochemistry: Compiling GAMESS with CUDA (GPU support)

Friday, February 11, 2011

Compiling GAMESS with CUDA (GPU support)

As I mentioned in a previous post, much of the mathematics involved in quantum chemistry can be formulated to be massively parallel, and implementations exist so you can run most types of calculations on hundreds or thousands of cores. I've heard of people from Pacific Northwest National Laboratory running parallel coupled cluster CCSD calculations on 225,000 processors at 1.3 PFlops!

Very few people have access to hundreds or thousands of individual CPU cores at home or work, but most of us have access to hundreds of cores on the GPU that drives the computer graphics. So let's use our GPUs for massively parallel quantum computing, dawg!

This post tells you how to compile the newest GAMESS (Oct 10, 2010, R1) so it runs RHF 2-electron integrals using an NVIDIA GPU.

I will follow up with a post on the theory behind the GPU parallel 2-el integral routines and a post on GAMESS/GPU benchmarks. I've used a lot of the pointers in the gamess/libqc/aaa.readme.1st file. Let me know if you found this helpful.

In order to compile GAMESS with CUDA support (using this guide), there are a few requirements:

I am going to assume 64-bit Linux, because I have only tried this on that platform.
You need a copy of GAMESS from at least Oct 10, 2010. I'm furthermore going to assume that you have already installed said version of GAMESS properly and that your installation works. I'm assuming it's installed in ~/gamess.
You need an NVIDIA graphics card which is CUDA enabled (all newer NVIDIA cards are) with a working set of drivers.
The GNU gcc and g++ compilers. I've only been able to make this work with 4.1 and 4.3, however. These are included in the repositories of all major Linux releases. Other versions may work, but these are "guaranteed" to work. Guaranteed, as in they potentially may compile the code. Get these via "sudo apt-get install gcc-4.3 g++-4.3", "yum install", etc., or the equivalent for your distro.
CUDA Toolkit version 2.3. You can download it here. GAMESS does not work with the current version (which is 3.2 at the time of writing). Get the version that most closely matches your Linux. I used the Ubuntu 9.04 version on an Ubuntu 10.04 LTS system and the RedHat Enterprise Linux 5.3 version on a CentOS 5.4 system - both worked just fine!
A compiled and installed version of Boost 1.43. I tried some other versions, but they didn't work. 1.43 is "guaranteed" to work.
You need an auxiliary code preprocessor named "cheetah" to do stuff to the C++ source code. I used the newest version, 2.4.4.
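
A quick way to see which of the tools above are already on your PATH might look like the sketch below. It only checks that a command exists, not which version it is, and check_tool is a helper name made up for this example.

```shell
#!/bin/bash
# Report whether a command is on the PATH.
# check_tool is a hypothetical helper, not part of GAMESS or CUDA.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
    fi
}

# Tools this guide relies on; use gcc-4.1/g++-4.1 here if that is what you have.
for tool in gcc-4.3 g++-4.3 python wget; do
    check_tool "$tool"
done
```

Anything reported as missing needs to be installed before you continue with Step 1.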

I can't really help you with installing the NVIDIA drivers, but there are many guides floating around on the interwebs. It is likely that the driver that came with your Linux works, but it may not. In that case, get a newer driver. The method to do this is highly OS dependent. Unfortunately, you can't use the open source nouveau or nv drivers - you need to run the real (and closed) McCoy from NVIDIA.

UPDATE: I suggest running the newest beta driver from NVIDIA, since it includes some CUDA specific improvements:

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/270.18/NVIDIA-Linux-x86_64-270.18.run

Step 1) Installing and modifying Boost 1.43 from the source. First make a temporary directory and get Boost:

mkdir ~/temp
cd ~/temp
wget http://downloads.sourceforge.net/project/boost/boost/1.43.0/boost_1_43_0.tar.bz2
tar xjf boost_1_43_0.tar.bz2
cd boost_1_43_0/

Now, compile and install Boost. I always use ~/programs/ for my installations. Boost uses its own configure and make scripts, and takes about 5-10 minutes to compile on my machine.

./bootstrap.sh --prefix=/home/andersx/programs/boost_1_43_0
./bjam
./bjam install
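
To confirm that the install landed where you expect, you can read the version string out of the installed headers. The sketch below assumes that Boost's version.hpp defines BOOST_LIB_VERSION as a quoted string (it should read 1_43 for this release); boost_lib_version is a helper name made up for this example.

```shell
# Print the value of BOOST_LIB_VERSION from a version.hpp file.
# boost_lib_version is a hypothetical helper, not a Boost tool.
boost_lib_version() {
    sed -n 's/^#define BOOST_LIB_VERSION "\([^"]*\)".*/\1/p' "$1"
}

# Prefix as used in this guide; change it to your own install location.
BOOST_PREFIX=/home/andersx/programs/boost_1_43_0
if [ -f "$BOOST_PREFIX/include/boost/version.hpp" ]; then
    echo "Boost headers report: $(boost_lib_version "$BOOST_PREFIX/include/boost/version.hpp")"
else
    echo "version.hpp not found under $BOOST_PREFIX/include"
fi
```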

nvcc (the CUDA compiler) requires three minor modifications of the Boost header files, in order to compile code linked to Boost.

Step 1.1) Change /home/andersx/programs/boost_1_43_0/include/boost/mpl/aux_/integral_wrapper.hpp line 59 from:

#if BOOST_WORKAROUND(__EDG_VERSION__, <= 243)

to:

#if BOOST_WORKAROUND(__EDG_VERSION__, <= 243) || defined(__CUDACC__)
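
If you would rather not open an editor, the same one-line change can be sketched with sed. This assumes the line in your copy of the header reads exactly as shown above (no trailing whitespace); if it doesn't, the file is left unchanged. patch_integral_wrapper is a made-up helper name, and the original file is kept with a .bak suffix.

```shell
# Append "|| defined(__CUDACC__)" to the BOOST_WORKAROUND line from Step 1.1.
# patch_integral_wrapper is a hypothetical helper; a backup is written to <file>.bak.
patch_integral_wrapper() {
    sed -i.bak \
        's/^#if BOOST_WORKAROUND(__EDG_VERSION__, <= 243)$/& || defined(__CUDACC__)/' \
        "$1"
}

# Usage (path as in this guide):
# patch_integral_wrapper /home/andersx/programs/boost_1_43_0/include/boost/mpl/aux_/integral_wrapper.hpp
```

Because the pattern only matches the unmodified line, running the helper a second time does nothing.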

Step 1.2) Change /home/andersx/programs/boost_1_43_0/include/boost/mpl/size_t_fwd.hpp around line 22, so that one original line is in an 'else' clause:

#if defined(__CUDACC__)
   typedef std::size_t std_size_t;
   template< std_size_t N > struct size_t;
#else
   template< std::size_t N > struct size_t;
#endif

Step 1.3) Change /home/andersx/programs/boost_1_43_0/include/boost/mpl/size_t.hpp around line 23, so that four original lines are in an else clause:

#if defined(__CUDACC__)
  #define AUX_WRAPPER_VALUE_TYPE std_size_t
  #define AUX_WRAPPER_NAME size_t
  #define AUX_WRAPPER_PARAMS(N) std_size_t N
#else
  //typedef std::size_t std_size_t;
  #define AUX_WRAPPER_VALUE_TYPE std::size_t
  #define AUX_WRAPPER_NAME size_t
  #define AUX_WRAPPER_PARAMS(N) std::size_t N
#endif
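
Before moving on, it can be worth checking that all three headers really were edited. The sketch below simply looks for the string __CUDACC__ in each file, on the assumption that the unmodified Boost 1.43 headers don't mention it; is_patched is a made-up helper name.

```shell
# Succeed if the given header mentions __CUDACC__.
# is_patched is a hypothetical helper; it checks for the string only, not for correctness.
is_patched() {
    grep -q '__CUDACC__' "$1"
}

# Include directory as used in this guide; change it to your own install location.
BOOST_INC=/home/andersx/programs/boost_1_43_0/include
for header in boost/mpl/aux_/integral_wrapper.hpp boost/mpl/size_t_fwd.hpp boost/mpl/size_t.hpp; do
    if is_patched "$BOOST_INC/$header" 2>/dev/null; then
        echo "patched:     $header"
    else
        echo "NOT patched: $header"
    fi
done
```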

Step 2) Download and install Cheetah. This is pretty much straightforward:

cd ~/temp
wget http://pypi.python.org/packages/source/C/Cheetah/Cheetah-2.4.4.tar.gz
tar xvf Cheetah-2.4.4.tar.gz
cd Cheetah-2.4.4/
python setup.py install

Step 3) Get the CUDA Toolkit v. 2.3:

You can follow this link and get the version which most closely matches your Linux. I used the Ubuntu 9.04 version on an Ubuntu 10.04.2 system.

cd ~/temp
wget http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/cudatoolkit_2.3_linux_64_ubuntu9.04.run
sh ./cudatoolkit_2.3_linux_64_ubuntu9.04.run

This installs the nvcc compiler. I used the default path which is /usr/local - you may of course put nvcc wherever you want. This can be useful, if you want the newest nvcc for another program and have different versions installed simultaneously.

You then need to add the following lines to your ~/.bashrc file in order to always be able to find the CUDA compiler and libraries:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Then run

source ~/.bashrc

or log out and back in to set the environment variables properly.
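
To check that the shell now picks up the right compiler, you can ask nvcc for its version. The sketch below assumes that the text printed by nvcc --version contains the word "release" followed by the version number; nvcc_release is a made-up helper name that extracts that number from standard input.

```shell
# Extract the release number (e.g. 2.3) from `nvcc --version` output on stdin.
# nvcc_release is a hypothetical helper, not part of the CUDA Toolkit.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc release: $(nvcc --version | nvcc_release)"
else
    echo "nvcc not found; check PATH"
fi
```

For this guide the reported release should be 2.3. If you see something else, another CUDA installation is probably earlier in your PATH.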

Step 4) You are now ready to compile the GPU Fock routines (called libqc). These are located in the libqc directory in the GAMESS directory. You may need to add a -mtune=native option. I found it necessary with gcc-4.3, but the option didn't work with gcc-4.1, so YMMV. "native" tells your compiler to optimize the code for the local machine at compilation time.

cd ~/gamess/libqc

If you use gcc-4.3 and g++-4.3:

CC=gcc-4.3 CXX=g++-4.3 CXXFLAGS='-O2 -DNDEBUG -msse3 -ffast-math -ftree-vectorize -mtune=native' ./configure --with-gamess --with-integer8 --with-boost=/home/andersx/programs/boost_1_43_0 --prefix=/home/andersx/gamess/libqc

If you use gcc-4.1 and g++-4.1:

CC=gcc-4.1 CXX=g++-4.1 CXXFLAGS='-O2 -DNDEBUG -msse3 -ffast-math -ftree-vectorize' ./configure --with-gamess --with-integer8 --with-boost=/home/andersx/programs/boost_1_43_0 --prefix=/home/andersx/gamess/libqc
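
The only difference between the two configure lines is the -mtune=native flag. If you switch between compilers, a small helper can keep that choice in one place. This is just a sketch: libqc_cxxflags is a made-up name and not part of the libqc build system.

```shell
# Print the CXXFLAGS used in this guide for a given gcc version.
# libqc_cxxflags is a hypothetical helper, not part of libqc.
libqc_cxxflags() {
    base='-O2 -DNDEBUG -msse3 -ffast-math -ftree-vectorize'
    case "$1" in
        4.3) printf '%s\n' "$base -mtune=native" ;;  # needed with gcc-4.3 in my experience
        *)   printf '%s\n' "$base" ;;                # -mtune=native did not work with gcc-4.1
    esac
}

GCC_VER=4.3   # set to 4.1 if that is what you have
echo "CC=gcc-$GCC_VER CXX=g++-$GCC_VER CXXFLAGS='$(libqc_cxxflags "$GCC_VER")'"
```

The echo line only prints the variable assignments; put them in front of the ./configure command exactly as in the two examples above.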

If the configure script ran without problems, you're now ready to compile libqc:

make

make install

This took my PC around 20 minutes. I thought it had crashed, but there was one file which just took 15 minutes to compile. The last thing to do in this step is to make libqc visible to your compiler and GAMESS. Add the following line to ~/.bashrc and source it as in step 3.

export LD_LIBRARY_PATH=/home/andersx/gamess/libqc:$LD_LIBRARY_PATH

Step 5) Linking GAMESS to libqc:

In the ~/gamess/comp file, you need to set this flag to true:

GPUCODE=true 

Now, recompile the rhfuhf GAMESS routine with libqc:

./comp rhfuhf

Finally, in the GAMESS linking script (~/gamess/lked) you need to link to libqc properly. Set the following flag to true:

GPUCODE=true

Then change the library pathnames at the second place the string 'GPUCODE' appears. The three paths must point to your libqc, CUDA, and Boost libraries.

Finally, you need to link a new executable. Type the following:

./lked gamess gpu 

If linking was successful, there is now an executable named gamess.gpu.x.
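
One way to sanity-check the new executable is to look at which shared libraries it pulls in. The sketch below filters the output of ldd for CUDA library names; it assumes the CUDA runtime was linked dynamically, so an empty result does not by itself prove that GPU support is missing. cuda_libs is a made-up helper name.

```shell
# Keep only the lines of `ldd` output (read from stdin) that name CUDA libraries.
# cuda_libs is a hypothetical helper; "libcuda" also matches libcudart.
cuda_libs() {
    grep -i -E 'libcuda|libcublas' || true
}

# Usage, from the GAMESS directory:
# ldd ./gamess.gpu.x | cuda_libs
```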

Congratulations: You're now able to do 2-el integrals and Fock-matrix formation in parallel on the GPU.

27 comments:

Jan Jensen, February 12, 2011 at 8:36 AM
Very useful, thanks! I look forward to the follow up posts. Especially the one where PM3 runs 100 times faster :)

Armadillo, March 5, 2011 at 4:32 PM
that's fantastic! But how could I set the # of cpu?

Unknown, March 6, 2011 at 2:48 PM
Hi Armadillo!

If your original (non-GPU) compilation works with multiple cpus, then the gamess.gpu.x executable should work the same way.

Most likely you will do something like this to run gamess on 4 cpus:

./rungms my-molecule.inp 01 4

In the above "01" means that you're calling the gamess.01.x executable. You want to change this to "gpu" so you call the gamess.gpu.x executable this way:

./rungms my-molecule.inp gpu 4

(In the above "4" of course requests four cpus from the rungms script.)

Armadillo, March 22, 2011 at 8:20 PM
Thanks! Now it runs smoothly.

Anonymous, March 23, 2011 at 8:19 PM
Excuse me, I cannot compile the libqc correctly, could you send me a copy of compiled libqc.a libqc_gamess.a librysq.a files? My email is yiming_chen#brown.edu. Thanks very much!

Unknown, March 23, 2011 at 8:52 PM
Hi!

libqc can be tricky to compile. What error do you get? And what compilers, etc?

Sherpaman, April 1, 2011 at 10:42 AM
Hi Anders,
I've also some problem, I've tried many combinations... but i'm still stuck with an error.

my actual configuration is:

OS = Ubuntu 10.10
gcc = 4.3
libboost = 1.43
cheetah = 2.4.2.1
nvcc = 2.3
NVIDIA Card = GeForce GTX 275
CPU= IntelCore2Quad Q9550@2.83GHz

i have configured using:

export CC=gcc-4.3
export CXX=g++-4.3
export CXXFLAGS='-O2 -DNDEBUG -msse3 -ffast-math -ftree-vectorize -mtune=native'
export CUDA=/apps/cuda/current/bin/nvcc

./configure --with-gamess --with-integer8 --with-boost=/usr/local/include/boost_1_43_0 --prefix=/apps/GAMESS.2010.2/libqc

then

$> make

and i get:

[...]
(a lot of)
./fock.cu(190): Warning: Cannot tell what pointer points to, assuming global memory space
[...]
(compilation hangs with ptxas on fock.ptx for ~15 min, and then...)

source='reduce.cu' object='reduce.lo' libtool=yes \
DEPDIR=.deps depmode=nvcc /bin/bash ../../../config/depcomp \
/bin/bash ../../../libtool --tag=CUDA --mode=compile /apps/cuda/current/bin/nvcc -DHAVE_CONFIG_H -I. -I../../../config -I../../../src -I/apps/cuda/current/bin/../include -I/usr/local/include/boost_1_43_0/include -I../../../src/externals -arch=sm_13 -c -o reduce.lo reduce.cu
libtool: compile: /apps/cuda/current/bin/nvcc -DHAVE_CONFIG_H -I. -I../../../config -I../../../src -I/apps/cuda/current/bin/../include -I/usr/local/include/boost_1_43_0/include -I../../../src/externals -arch=sm_13 -c reduce.cu -o reduce.o
reduce.cu(41): error: calling a __global__ function from a __global__ function is not allowed
[...]
12 errors detected in the compilation of "/tmp/tmpxft_00007f72_00000000-4_reduce.cpp1.ii".

As already said in nvidia forums the --device-emulation option in nvcc seems to resolve, but this shouldn't be correct in our case because in this way we don't use the CUDA features.

Thanks

Andrea

Anonymous, April 16, 2011 at 3:01 AM
A great post.
Thank you.
Good luck in research.

Anonymous, April 16, 2011 at 5:23 AM
hi!

Thanks for this great post. This would help me in my undergraduate thesis.

I have some questions about compiling this on a win7 pc. Will the code be the same for a win7 pc? I'm not really good at linux, sorry.

Hope you see this!

And thanks again for this!

Anonymous, April 17, 2011 at 4:51 PM
Hi, I have some questions about the gpu gamess. It seems that gamess has been compiled correctly (libqc.a and cuda are linked to the executable according to ldd), but when I try to run the gpu gamess, only the CPU is used, since I can't find "GPU Time" in the log file. Any idea about that? Thanks!

Anonymous, April 27, 2011 at 8:23 AM
I want to do 2-el integrals and Fock-matrix formation in parallel on the GPU, where computations on the GPU are faster than on the CPU. I do not know how to write the input for it. Can anyone help?

Anonymous, May 20, 2011 at 9:08 PM
Can this setup be run on a cluster?

Jiri Sturala, July 25, 2011 at 1:30 AM
Hi, thanks for this perfect manual. however i have some questions: first of all, i see CPU time, but i don't see gpu time. is there any option to show gpu time in gamess output? and the second one - could you add some post how to use some gpu supported math library like ATLAS?

Unknown, July 26, 2011 at 11:12 AM
Hi!

I don't know how this runs on multiple nodes - I don't have access to a cluster with GPUs on every node. But I'm frequently running GAMESS across nodes, and that works extremely well.
Also, I would not dare try this on a Windows system... dual booting Ubuntu or something will probably save you considerable time and headaches when installing GPU enabled GAMESS.

There is currently no "GPU time" option, so it's not possible to see how much GPU utilization your calculations are getting. Be careful to check how much (or if) the GPU actually makes things faster. :)

I have not tried GPU supported ATLAS (because I didn't know it existed!), but it sounds very interesting. I will definitely look into that, but probably not post a guide anytime soon.

Thanks for the comments!
Anders

Anonymous, November 10, 2011 at 1:18 PM
Hi, Anders Christensen!
A great post.
Thank you.

http://cse.spsu.edu/clo/research/GPU/gpu_tutorial.pdf
(atlas, cblas)
Good luck in research.

Sidney Ramos Santana, December 28, 2011 at 8:12 PM
Dear Anders Christensen and Gamess Users,

I am trying to install Gamess on GPU in a computer with:

S.O.: Linux - Ubuntu 11.04
Video Cards: 2 [GTX580 3GB/card]
Kernel version: 2.6.38-12 (x86_64)
Processor: i7 X 990 @ 3.47GHz
Intel Compilers Version: 12.0.5

I have also installed:

Cuda toolkit + SDK Version: 4.1 rc2 (release candidate)
Math Library: Atlas (from Ubuntu Repository)
Boost Version: 1.48.0
Cheetah Version: 2.4.4
HDF5 Version: 1.8.8
MPICH2 Version: 1.4.1p1

PS: All the addons above were installed using intel compiler.

The compilation procedure is running very well without any kind of errors.

But, the linking procedure is yielding the following error just in the rhfuhf subroutine:

=======================================
Choices for some optional plug-in codes are
Using qmmm.o, Tinker/SIMOMM code is not linked.
Using vbdum.o, neither VB program is linked.
Using neostb.o, Nuclear Electron Orbital code is not linked.

message passing libraries are
../ddi/libddi.a -L/opt/mpich2_hydra/lib -lmpich -lmpl -lrt -lpthread

other libraries to be searched are
-L/usr/lib64/atlas -lblas -latlas

Linker messages (if any) follow...
rhfuhf.o: In function `rhfcl_':
rhfuhf.f:(.text+0x45ed): undefined reference to `molecule_new_gamess_'
rhfuhf.f:(.text+0x4604): undefined reference to `basis_new_gamess_'
rhfuhf.f:(.text+0x461b): undefined reference to `basis_sort_'
rhfuhf.f:(.text+0x4697): undefined reference to `hf_fock_'
rhfuhf.o: In function `cchem_hf_fock_':
rhfuhf.f:(.text+0x11c04): undefined reference to `molecule_new_gamess_'
rhfuhf.f:(.text+0x11c2d): undefined reference to `basis_new_gamess_'
rhfuhf.f:(.text+0x11c56): undefined reference to `basis_sort_'
rhfuhf.f:(.text+0x11cab): undefined reference to `hf_fock_'

Unfortunately, there was an error while linking GAMESS.
=======================================
Please take a look at my install.info file below:

#!/bin/csh
# compilation configuration for GAMESS.GPU
# generated on langmuir
# generated at Sex Set 9 21:46:38 BRT 2011
setenv GMS_PATH /opt/gamess.gpu
# machine type
setenv GMS_TARGET linux64
# FORTRAN compiler setup
setenv GMS_FORTRAN ifort
setenv GMS_IFORT_VERNO 12
# mathematical library setup
setenv GMS_MATHLIB atlas
setenv GMS_MATHLIB_PATH /usr/lib64/atlas
# parallel message passing model setup
setenv GMS_DDI_COMM mpi
setenv GMS_MPI_PATH /opt/mpich2_hydra
setenv GMS_MPI_LIB mpich2

=====================================

Please could anyone help me ?

Happy new year.

Best regards,

Sidney R. Santana

Unknown, December 28, 2011 at 8:19 PM
Why are you using MPI? I may be mistaken, but I don't think that'll work. Can you try again using the GNU compilers?

Ruben, March 21, 2012 at 8:03 PM
When I try to link, I have the following error:

message passing libraries are
../ddi/libddi.a -lpthread

other libraries to be searched are
-L/usr/lib64/atlas -lf77blas -latlas

Linker messages (if any) follow...
/usr/bin/ld: /usr/local/lib/libcchem.a(evaluate.o): undefined reference to symbol 'cblas_ddot'
/usr/bin/ld: note: 'cblas_ddot' is defined in DSO /usr/lib64/libcblas.so.3gf so try adding it to the linker command line
/usr/lib64/libcblas.so.3gf: could not read symbols: Invalid operation
collect2: ld returned 1 exit status

Unfortunately, there was an error while linking GAMESS.
0.7u 0.1s 0:00.92 97.8% 0+0k 0+8io 0pf+0w

Unknown, March 22, 2012 at 3:54 PM
Hey Ruben. Can you give a little more information on your system? I'm not sure what the file /usr/local/lib/libcchem.a is doing.

Anonymous, March 27, 2012 at 1:29 AM
Hi! Awesome guide.

I began following it, and everything was going perfectly, until I saw there is no longer a libqc folder in the latest GAMESS distro. In addition, there are patch files for Boost (several versions) included, but I was not savvy enough to figure out how to work it out.

Any suggestions?

Unknown, March 27, 2012 at 2:26 PM
Hi! I haven't been working with GAMESS for a while, so I haven't had a chance to look at the latest GAMESS. Your best bet is to try the google group for GAMESS:

https://groups.google.com/forum/?fromgroups#!forum/gamess

If your question hasn't already been answered there, it's about time someone posted a step-by-step guide! :)

Cheers!

Anonymous, April 7, 2012 at 5:41 AM
I have problems compiling GAMESS on Windows 7 64bit using a PGI v12.3 compiler from Portland Group. I can`t compile libddi.a in ddi file.

This is the compddi.log message showing the error:
pgcc-Fatal-ccompile completed with exit code 1

Unlinking C:\temp\pgcc2a5ySRthMwCXs.s
unset echo
Error compiling: soc_create.o

DDI compilation did not finish correctly, please fix, and try again

rm: No match.
rm: No match.
rm: No match.
rm: No match.
Fri, Apr 06, 2012 10:33:04 PM
0.045u 0.137s 0:01.01 16.8% 0+0k 0+0io 21743pf+0w

Somebody help me please?

My E-mail: arielgonzalez0310@att.net

Unknown, April 7, 2012 at 12:50 PM
Hi ... Unfortunately, I have no experience with Windows machines. Your best bet is to try the google group for GAMESS:

https://groups.google.com/forum/?fromgroups#!forum/gamess

Anonymous, April 17, 2012 at 12:01 AM
Hi
I'm trying to compile GAMESS in Scientific Linux 6.2 64bit with a GPU Nvidia Tesla C2075 and when I configure the file libcchem the log file show at the end the following message:

checking H5Cpp.h usability... no
checking H5Cpp.h presence... no
checking for H5Cpp.h... no

configure: error: *** H5Cpp.h not found

I downloaded the file H5Cpp.h on internet but I don't know where to put it.
I have no experience on Linux...
I need help...

Pakiza Begum, September 11, 2012 at 5:55 AM
Where to get the manual for SIMOMM?? I am trying all possible search, but can't get it...

Mohamed Imran Predhanekar, July 30, 2017 at 7:27 PM
Is there any guide for Windows based CUDA for GAMESS