Thursday, December 30, 2010

HOWTO: RM1 in Gaussian09

In my group we regularly use semi-empirical QM methods when we need to minimize large structures. Lately we've had some good results with Jimmy Stewart's PM6 method (he also go by the, IMO, cool name of Mr. MOPAC). Recently, the 25 year old AM1 model of Dewar et al. has been reparametrized, by the group of Mr. MOPAC, into the RM1 method. Supposedly RM1 beat the [euphemism] out of older parameterizations, such as AM1, PM3 and PM5.
In short: Right now the big players in the field of semi-empirical methods seem to be the well-known PM6 and newcomer RM1. The approximations in the two are very alike with the most notable difference between the two, probably being the introduction of d-orbitals on main group elements in PM6, which are not present in AM1 and it's derivative reparametrizations.

I'm not aware of any program which does RM1 in parallel out-of-the-box. If you do know of any open source or free program capable of this, do leave a comment here! I know that Gaussian 09 does PM6 and AM1 and will let you run on multiple cores too. The down side is that I don't have a clue of how much speed up you get from running semi-empirical calculations in parallel in Gaussian09. From my experience, I've seen good scaling with parallel DFT calculations in Gaussian, but I've never really tested the semi-empirical methods in parallel this way. Step one is of course to get RM1 up and running in Gaussian. This proved to be a near impossible task.

Here I present how to make Gaussian09 do RM1 calculations. This is something I've been trying to figure out for about a week now, and solely using the information found on the RM1 website combined with the sub-par Gaussian09 manual, I've had little luck, until I contacted Mahmoud A. A. Ibrahim, who kindly supplied me with a g09 input-file. These things should be really simple, but when the documentation is erroneous and incomplete, there is not much to do other than old fashioned trial-and-error and guesswork. However, since I felt that my time was more valuable than this,  I more or less gave up until I five minutes later got a reply. So big thanks to Ibrahim for having sorted this out! Below is an outline of  how it is done. I hope the Gaussian-staff will include this in the manual, because it is by no means straightforward to see how and where to get the parameters and the correct names and units. For instance, the GCore parameters are not mentioned in the manual, so it can be hard to specify these correctly, and I believe that some of the entries in the MOPAC to Gaussian parameter table are erroneously paired up.

============ Begin file ============
# opt AM1=(Input,Print,NoGenerate)

Title

0 1
 C                  5.52110000    2.02080000   -0.14150000
... rest of molecule ...
 H                  2.51520000   -7.31660000    0.02560000

 Method=8 CoreType=1
 ****
 H       
 PQN=1 NValence=1 F0ss=0.5138736446 ZetaOverlap=1.0826737000 U=-0.4395468110
 Beta=-0.2118762033 CoreKO=0.9730018200 KON=0,0,0,0.9730018200 KON=1,0,1,1.0589197790
 KON=0,1,1,0.9730018200 KON=2,1,1,1.0589197790 EISol=-0.4395468110 EHeat=0.0830298228
 Alpha=1.623706039
 GCore=0.0071452321,1.6526506619,2.2204505049
 GCore=0.0044844511,1.7971829010,3.6631297957
 GCore=-0.0024774154,0.7854047496,3.0926358381
 ****
 C       
 PQN=2,2 NValence=4 F0ss=0.4796935160 F0sp=0.4165460293 F0pp=0.3723813969 F2pp=0.1879094681
 G1sp=0.1711215396
 ZetaOverlap=1.8501880000,1.7683009000
 U=-1.9008794468,-1.4481913012
 Beta=-0.5681197391,-0.3026706191 DDN=0,1,0.7967571300 DDN=1,1,0.6926111205
 CoreKO=1.0423321900 KON=0,0,0,1.0423321900 KON=1,0,1,0.9808908700 KON=0,1,1,1.0423321900
 KON=2,1,1,0.7558858400 EISol=-4.3315453857 EHeat=0.2723305520 Alpha=1.477897228
 DipHyp=1.5934225837
 GCore=0.0051822600,1.6071441515,1.9728170130
 GCore=0.0008174160,1.9389223038,3.1399608166
 GCore=0.0025838555,1.7534236006,3.0832529699
 GCore=-0.0001879630,2.5202671079,5.2828786928
 ****
 N       
 PQN=2,2 NValence=5 F0ss=0.4809517358 F0sp=0.4855419470 F0pp=0.4603627461 F2pp=0.2692199995
 G1sp=0.5512408181
 ZetaOverlap=2.3744716000,1.9781257000
 U=-2.6037351707,-2.1306270015
 Beta=-0.7670041923,-0.6126744081 DDN=0,1,0.6495619700 DDN=1,1,0.6191441047
 CoreKO=1.0396053400 KON=0,0,0,1.0396053400 KON=1,0,1,0.5151439200 KON=0,1,1,1.0396053400
 KON=2,1,1,0.6231658500 EISol=-7.5368325074 EHeat=0.1800769640 Alpha=1.568600643
 DipHyp=1.2990492003
 GCore=0.0042177292,1.2850311275,2.6054387408
 GCore=0.0016934863,1.2957774179,3.9376355712
 GCore=-0.0015857545,0.5748275884,3.5293247133
 ****
 O       
 PQN=2,2 NValence=6 F0ss=0.5145797792 F0sp=0.5496321127 F0pp=0.4844989583 F2pp=0.2207863333
 G1sp=0.4335139609
 ZetaOverlap=3.1793691000,2.5536191000
 U=-3.5628280133,-2.8624391247
 Beta=-1.0970045571,-1.0712800661 DDN=0,1,0.4886700600 DDN=1,1,0.4796114166
 CoreKO=0.9716666100 KON=0,0,0,0.9716666100 KON=1,0,1,0.4967410600 KON=0,1,1,0.9716666100
 KON=2,1,1,0.5584555600 EISol=-11.4672725099 EHeat=0.0949133088 Alpha=2.207710126
 DipHyp=0.9772838928
 GCore=0.0160375838,1.4612692876,1.7076238079
 GCore=0.0040694547,2.0804240743,2.8677465230
 ****
 F       
 PQN=2,2 NValence=7 F0ss=0.6144822801 F0sp=0.6159711092 F0pp=0.5507178433 F2pp=0.0551275865
 G1sp=0.2202381595
 ZetaOverlap=4.4033791000,2.6484156000
 U=-4.9311603036,-3.9632901345
 Beta=-2.5724529652,-1.2009616000 DDN=0,1,0.3488927400 DDN=1,1,0.4624443630
 CoreKO=0.8136931000 KON=0,0,0,0.8136931000 KON=1,0,1,0.5419872400 KON=0,1,1,0.8136931000
 KON=2,1,1,0.8017335300 EISol=-17.8085651426 EHeat=0.0301031314 Alpha=3.175063812
 DipHyp=0.6977453358
 GCore=0.0279882125,2.0174429443,1.5430182683
 GCore=0.0049208369,2.5202610313,2.7174711546
 ****
 P       
 PQN=3,3 NValence=5 F0ss=0.4072043031 F0sp=0.2088608220 F0pp=0.2745110810 F2pp=0.0308577654
 G1sp=0.1280880722
 ZetaOverlap=2.1224012000,1.7432795000,1.0000000000
 U=-1.5366852349,-1.2635676846
 Beta=-0.2254626127,-0.2184534727 DDN=0,1,1.0106954700 DDN=1,1,0.9598690581
 CoreKO=1.2278848400 KON=0,0,0,1.2278848400 KON=1,0,1,1.2715722400 KON=0,1,1,1.2278848400
 KON=2,1,1,1.6008006600 EISol=-4.5267737772 EHeat=0.1204284617 Alpha=1.010693038
 DipHyp=2.0212746476
 GCore=-0.0285170033,1.7046815287,2.4878293672
 GCore=-0.0113192311,1.9867256080,3.6041106250
 GCore=-0.0033939241,2.5201987249,5.0239839450
 ****
 S       
 PQN=3,3 NValence=6 F0ss=0.4589360160 F0sp=0.3149088537 F0pp=0.2922830364 F2pp=0.1308243369
 G1sp=0.4288413981
 ZetaOverlap=2.1334431000,1.8746065000,1.0000000000
 U=-2.0273776403,-1.7099205406
 Beta=-0.0719958680,-0.3224498447 DDN=0,1,0.9936920700 DDN=1,1,0.8926246952
 CoreKO=1.0894764700 KON=0,0,0,1.0894764700 KON=1,0,1,0.7214224200 KON=0,1,1,1.0894764700
 KON=2,1,1,1.0081994600 EISol=-6.8128155113 EHeat=0.1058151363 Alpha=1.291275251
 DipHyp=1.9872698041
 GCore=-0.0518075719,1.3470435829,1.1221218344
 GCore=-0.0045273966,2.0183359552,2.4470443530
 GCore=-0.0004555529,2.5202571669,3.4026437095
 ****
 Cl      
 PQN=3,3 NValence=7 F0ss=0.5644781272 F0sp=0.4890126782 F0pp=0.3906816863 F2pp=0.4442159843
 G1sp=0.1945765429
 ZetaOverlap=3.8649107000,1.8959314000,1.0000000000
 U=-4.3538053708,-2.8059323918
 Beta=-0.7322047420,-0.4236959083 DDN=0,1,0.4541788400 DDN=1,1,0.8825847052
 CoreKO=0.8857739000 KON=0,0,0,0.8857739000 KON=1,0,1,0.6675936500 KON=0,1,1,0.8857739000
 KON=2,1,1,0.6487473400 EISol=-14.0555179538 EHeat=0.0461985061 Alpha=1.954562896
 DipHyp=0.9083054214
 GCore=0.0089912708,0.8337132813,2.7731689426
 GCore=0.0002006300,1.9877196813,4.7243667328
 ****
 Br      
 PQN=4,4 NValence=7 F0ss=0.6289878820 F0sp=0.5741785343 F0pp=0.3485867906 F2pp=0.2870889303
 G1sp=0.2464182944
 ZetaOverlap=5.7315721000,2.0314758000,1.0000000000
 U=-4.1704597746,-2.7998282113
 Beta=-0.0492954863,-0.3014275181 DDN=0,1,0.2099005300 DDN=1,1,1.0442262465
 CoreKO=0.7949278600 KON=0,0,0,0.7949278600 KON=1,0,1,0.3797804400 KON=0,1,1,0.7949278600
 KON=2,1,1,0.8527529400 EISol=-13.1237877769 EHeat=0.0426129028 Alpha=1.517206895
 DipHyp=0.4197769085
 GCore=0.0685363742,1.1998779275,3.7798245418
 GCore=-0.0643982928,1.2713460218,3.8100223654
 ****
 I       
 PQN=5,5 NValence=7 F0ss=0.7349770009 F0sp=0.2825867563 F0pp=0.2574091259 F2pp=0.0690025699
 G1sp=0.1561143756
 ZetaOverlap=2.5300375000,2.3173868000,1.0000000000
 U=-2.7525236785,-1.8892915650
 Beta=-0.1540958564,-0.1617111472 DDN=0,1,1.2963425100 DDN=1,1,1.1085963335
 CoreKO=0.6802933800 KON=0,0,0,0.6802933800 KON=1,0,1,1.3450521300 KON=0,1,1,0.6802933800
 KON=2,1,1,1.4234289100 EISol=-9.1319620411 EHeat=0.0406639282 Alpha=1.133270597
 DipHyp=2.5925358606
 GCore=-0.0056582787,0.4370267028,3.7794911941
 GCore=0.0041077335,1.6132758519,4.1666344737
 ****


============= End file =============

Do remember, that Gaussian is ultra picky about blank lines, to be sure to have exactly one blank line between sections and your file must be terminated with two or more blank lines.

A short explanation is in order. The method is specified as AM1, because AM1 and RM1 use identical formulas. The Input keyword tells AM1 to read the parameters from the input stream (i.e. the stuff in the bottom part of the file). I'm including the Print and NoGenerate keywords as a safety check. Print tells Gaussian to print the used parameters in the output file. This way we're sure, that our optimization used the right keywords. NoGenerate tells Gaussian to not use the native AM1 parameters if a parameter is missing in the input stream. This way we're likely to get some sort of failure in the calculation if our parameters are not read in correctly.
For those who worry about too much clutter in their input files, it is of course safe to leave out the parameters for elements which are not present in the molecule.


EDIT: To honor the fact, that this website promotes free software, let me point out, that the RM1 method is public domain and the parameters are fully disclosed. It is also included in the free quantum chemistry programs GAMESS and MOPAC. GAMESS is free, while MOPAC is free for academic or non-profit purposes.



Saturday, November 13, 2010

GotoBLAS2: A faster BLAS library

This post is about a little trick I have for speeding up my quantum chemistry calculations. For some reason this software is unknown to most people, and thus I decided do a short advertisement!

The bottleneck in all forms of computational quantum chemistry is the CPU time spent to converge a calculation to a given accuracy. The most expensive part is in most cases the evaluation of two-electron integrals,  but during many calculations, a lot of time is also spent doing linear algebra operations. The computational routines for doing linear algebra has been optimized greatly over the years and today there are quite a few standard linear algebra libraries for doing this. One standard API of linear algebra routines is called BLAS, short for "Basic Linear Algebra Subprograms". Many equations involved in quantum chemistry are formulated into matrix notation which is easily interpreted and turned into fast, parallelized code.

When people want to use a fast BLAS library, often well-known libraries such as the ALTAS or Intel MKL BLAS libraries are used. However, there is a slightly faster BLAS library out there, named GotoBLAS2. This library was started by a Japanese guy named Kazushige Goto (the Japanese pronunciation is something lik "go-toe"), who, like Albert Einstein, worked in a patent office while he began developing his own BLAS library. After a while he was 'discovered' by a university in Texas and soon Goto was deported from Japan. Now he is said to be very rich and working for some large coporation somewhere in the US.

Enough with the history! Tobias Wittwer has made a detailed comparison of common, fast BLAS libraries in this pdf: blas_lapack.pdf


You can download the latest GotoBLAS2 here. Not only is GotoBLAS2 faster  than Intel's MKL library, GotoBLAS2 is also completely free of charge and open source under the BSD license.

GotoBLAS2 will be used in future posts on compiling and installing software, for which I am always using GotoBLAS2. So be sure to use this and make the most of your CPU resources!

Friday, November 12, 2010

Installing OpenBabel and compiling with the C++ API

OpenBabel is a tool making life easier for everyone in the computational chemistry business. You can do tons of cool stuff with OpenBabel! I regularly use it to convert PDB files to input files for Gaussian09 and XYZ files, which are then used to calculate quantum chemistry on proteins and often back to PDB  format again. It can also juggle around with SMILES strings, do all sorts of chem-informatics and whatnot. Everything you and I need.  OpenBabel is the King for these kinds of things. 
In addition OpenBabel has API to Python, Pearl and C++, so you can use all the features of OpenBabel in your own scripts or compiled programs!  I started using it for a project requiring automatic generation of thousands protein fragment XYZ files and automatic handling of input and output files. Many of my colleagues use OpenBabel for stuff like this.
Not only is this an absolutely awesome piece of software, it is also completely free and open source under the GNU GPL. I love this! This post concerns basic installation and use of the C++ API, which is not completely straightforward in some cases.








For the some users, installing and compiling with OpenBabel on a Linux machine can be a daunting task. Recently I helped my dear colleague Kasper Thofte setting up OpenBabel and compiling a piece of C++ code on our local Beowulf Cluster.  Kasper is currently developing some very cool  automatic transition state search methods and he is also developing transition search methods in GAMESS, which is free quantum chemistry software
Since we don't have root access on our cluster, everything has to be installed in our home directories and we cannot use the /usr/lib etc. folders.  These are all the steps we went through to get his C++ code to compile.

First I checked out the latest development code from the SourceForge into my ~/src/ dir. Compiled code goes in the ~/programs/ dir on my account.

cd ~/src
svn co https://openbabel.svn.sourceforge.net/svnroot/openbabel/openbabel/trunk openbabel



Step two is to actually compile the code with cmake and telling it to install to my ~/programs/ dir with the a prefix option.
 
cd openbabel
cmake -DCMAKE_INSTALL_PREFIX=/home/andersx/programs/openbabel .

make install

Now, to make sure that you can access
OpenBabel from the command line as well as having the g++ compilier find the libraries, you have to find a file in your home ~/ directory called .bashrc and insert these two lines:


export PATH=$PATH:/home/andersx/programs/openbabel/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/andersx/programs/openbabel/lib


Now open a new terminal (to refresh the PATH and LD_LIBRARY_PATH variables) and you are ready to compile your OpenBabel code. There is one small catch however. You have to manually tell g++ where your libraries and headers for OpenBabel are located. This looks like this:


g++ -I/home/andersx/programs/openbabel/include/openbabel-2.0 -L/home/andersx/programs/openbabel/lib -lopenbabel OBtest.cpp -o OBtest


There you go! The above took us quite a while to figure out, and this should also work on any type of linux machine with the GNU compilers.

UPDATE: My friend Casper Steinmann just pointed out that he has a detailed blog post on installing and using the Python API for OpenBabel. So you should have no excuses for not using OpenBabel in your workflow: http://www.caspersteinmann.dk/?q=node/4