Friday, September 7, 2012

The least restrictive open source license?

I am working on a program to predict protein chemical shifts and to predict protein structural features from chemical shifts. I want my work to be available to anyone, for any purpose, free of charge. It would be nice if the work remained attributed to the respective authors, and finally I don't want to get sued under any circumstances.
Although widely used, the GNU General Public License is certainly not the least restrictive open source software license. I wouldn't mind having my code used for commercial purposes, for instance.

I also considered something along the lines of what is suggested here:
Steal this code and use it for whatever you want. No support or guarantee is provided or implied - use at your own risk. Attribution would be nice but not essential.
However, after searching around a bit, I settled on the "2-clause Simplified BSD License":
Copyright (c) <YEAR>, <OWNER>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met: 

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer. 
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution. 

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
The only question is why the second half is written entirely in CAPS!

Tuesday, August 28, 2012

Displaying disagreements in protein structures

I recently made a post on how to color a protein structure by disagreements between experimental chemical shifts and chemical shifts predicted from the structure.

Here is another way to display these kinds of errors, which I also find quite nice. Atom radii are scaled proportionally to the error in the predicted chemical shifts.

Picture from http://dx.doi.org/10.1021/ct200913r (paywall)

In the above example, chemical shifts are determined from quantum chemical calculations on a 23-residue protein structure and then compared to the corresponding experimental chemical shifts.
It's clear that the calculated chemical shifts from several side-chain atoms do not reproduce the experimental values as well as the backbone-atom chemical shifts do. This is definitely useful for identifying possible errors in the protein structure.
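The scaling itself takes only a couple of lines of code. Here is a minimal Python sketch of such a mapping from prediction error to display radius - note that the linear form and the two constants are my own arbitrary display choices, not the ones used in the paper:

```python
def display_radius(shift_error_ppm, base_radius=0.3, scale=0.25):
    """Map an absolute chemical shift error (in ppm) to an atom display
    radius, so that atoms with larger prediction errors are drawn bigger.

    base_radius and scale are arbitrary display constants, not values
    from the paper."""
    return base_radius + scale * abs(shift_error_ppm)
```

In PyMOL, radii computed like this could then be written onto the individual atoms (e.g. via the alter command) before rendering.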


References:
Frank et al. J. Chem. Theory Comput., 2012, 8 (4), pp 1480–1492

Friday, August 24, 2012

Getting started with Phaistos // installation

I got an e-mail from a student named Alex at Boston University, asking if I could help him get started with Phaistos. The first barrier for new users (in this case a student) is to download and install Phaistos on a Linux machine.

First, you have to download the latest and greatest revision of Phaistos from sourceforge. This is done via the following command:
svn checkout https://phaistos.svn.sourceforge.net/svnroot/phaistos/trunk phaistos
This creates a directory named "phaistos" and downloads the source code to that directory. Now enter the "phaistos" directory, create a directory named "build", cd into it and run cmake:
cd phaistos
mkdir build
cd build
cmake ..
cmake prepares the compilation step and figures out where your libraries etc. are located. Note that Phaistos requires Boost to be installed, version 1.41 or newer.

If you have Boost installed in a non-standard location and cmake doesn't find the Boost libraries, things are slightly more complicated. On my laptop (running CentOS 6.3), I cannot get cmake to consistently find my Boost via the "-DBOOST_ROOT" option mentioned in the Phaistos manual. Usually these cmake flags should work (in this example, Boost is installed in "/opt/boost_1_41_0"):

cmake -DBoost_DIR=/opt/boost_1_41_0 -DBoost_INCLUDE_DIR=/opt/boost_1_41_0/include ..

If cmake still fails to find Boost, you can explicitly tell it where every Boost library is located. The command is long, so make sure you get everything when you copy/paste (broken into multiple lines with "\" for readability):
export BOOST_ROOT_DIR=/opt/boost_1_41_0 #(i.e. where Boost is installed)
cmake -DBoost_INCLUDE_DIR=$BOOST_ROOT_DIR/include \
  -DBoost_LIBRARY_DIRS=$BOOST_ROOT_DIR/lib \
  -DBoost_PROGRAM_OPTIONS_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_program_options.a \
  -DBoost_SERIALIZATION_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_serialization.a \
  -DBoost_THREAD_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_thread.a \
  -DBoost_REGEX_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_regex.a \
  -DBoost_UNIT_TEST_FRAMEWORK_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_unit_test_framework.a \
  -DBoost_FILESYSTEM_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_filesystem.a \
  -DBoost_SYSTEM_LIBRARY=$BOOST_ROOT_DIR/lib/libboost_system.a \
  -DBoost_INCLUDE_DIR_DEBUG=$BOOST_ROOT_DIR/include \
  -DBoost_LIBRARY_DIRS_DEBUG=$BOOST_ROOT_DIR/lib \
  -DBoost_PROGRAM_OPTIONS_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_program_options.a \
  -DBoost_SERIALIZATION_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_serialization.a \
  -DBoost_THREAD_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_thread.a \
  -DBoost_REGEX_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_regex.a \
  -DBoost_UNIT_TEST_FRAMEWORK_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_unit_test_framework.a \
  -DBoost_FILESYSTEM_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_filesystem.a \
  -DBoost_SYSTEM_LIBRARY_DEBUG=$BOOST_ROOT_DIR/lib/libboost_system.a \
  -DBoost_INCLUDE_DIR_RELEASE=$BOOST_ROOT_DIR/include \
  -DBoost_LIBRARY_DIRS_RELEASE=$BOOST_ROOT_DIR/lib \
  -DBoost_PROGRAM_OPTIONS_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_program_options.a \
  -DBoost_SERIALIZATION_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_serialization.a \
  -DBoost_THREAD_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_thread.a \
  -DBoost_REGEX_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_regex.a \
  -DBoost_UNIT_TEST_FRAMEWORK_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_unit_test_framework.a \
  -DBoost_FILESYSTEM_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_filesystem.a \
  -DBoost_SYSTEM_LIBRARY_RELEASE=$BOOST_ROOT_DIR/lib/libboost_system.a \
  ..
Now you are ready to compile:
make -j4  
After a while you'll get a message saying:
Linking CXX executable ../../../bin/phaistos
[100%] Built target phaistos
From the build directory, you can test if Phaistos is working by typing
bin/phaistos --help
(A very long list of options should appear).

To compile the manual, cd into the "phaistos/build" dir and run:
make manual_pdf
This compiles the manual and puts it in the "phaistos/build/doc" dir. You can also download the latest manual from sourceforge here.

Do post a comment below, if you have questions or additional comments!

Wednesday, June 13, 2012

Guistos - a PHAISTOS GUI

This is what can be considered the user documentation for Guistos - a PHAISTOS GUI. I will upload the code here soon.

Guistos is aimed at people who are new to molecular modeling, for instance bachelor students. It assumes relatively little prior knowledge and only limited Linux skills.
I've put some care into choosing sane default values and good mixes of Monte Carlo moves.

Developers will probably still prefer to use {$FAVOURITE_EDITOR} and edit their text-based input files by hand.
If you want your PHAISTOS module supported in Guistos, send me an e-mail at andersx (a) nano.ku.dk and we can work something out.





General Options

Start from AA/PDB-file:
In order to tell PHAISTOS what the sequence of the protein is, you must supply a .aa-file containing the sequence. You can also choose to have the sequence (but not the structure!) read in from a .pdb-file or .pqr-file, which are also PHAISTOS compatible.

Initialize from PDB structure:
If you supply PHAISTOS with a .pdb-file or .pqr-file, you can "cheat" and start your simulation from the supplied structure. This is useful for things such as refinement of homology models and simulation of protein dynamics.

Short Title:
The title is a short name which is applied to output files from the simulation. Sample files will be put in a subfolder of the Guistos default output path (usually /home/$USER/folds), and the Muninn logfile will also be named with this title.

Iterations (per thread) and Parallel threads:
Iterations is the number of Monte Carlo moves carried out during a simulation; parallel threads is the number of independent parallel simulations.
The total number of iterations in your ensemble will be the product of these two values.
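For example, with made-up numbers:

```python
# The total ensemble size is the product of the two settings:
iterations_per_thread = 5000000
threads = 8
total_iterations = iterations_per_thread * threads
print(total_iterations)  # 40000000
```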


Energy Options

Molecular Mechanics Force Field:
PHAISTOS currently supports two molecular mechanics force fields: the OPLS-AA/L force field and the coarse-grained PROFASI force field. PROFASI is much faster than OPLS-AA/L, while OPLS-AA/L is more accurate.

Spectroscopic Data:

Model:
By supplying a model that predicts spectroscopic data from a given structure, it is possible to score the agreement between a proposed structure and experimental data.
Currently Guistos supports backbone chemical shift data via the CamShift module, and amide proton chemical shift data via the ProCS module (pronounced pro-see-es).

Data File:
CamShift needs a set of experimental data in the NMR-STAR format, while ProCS uses the internal .bcs-format.


Monte Carlo Options

Metropolis-Hastings:
This selection runs a simulation at a constant temperature, which must be supplied by the user. The default value is 300 K.
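For reference, the Metropolis criterion accepts a proposed move with probability min(1, exp(-ΔE/kBT)). Here is a minimal Python sketch of the acceptance test - my own illustration (assuming energies in kcal/mol), not the actual PHAISTOS code:

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def metropolis_accept(delta_e, temperature, rng=random):
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / (kB * T))."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (KB * temperature))
```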

Muninn:
This type of Monte Carlo is a more advanced simulation, where the temperature is dynamically adjusted throughout the simulation in order to sample more efficiently from the interesting low-energy regions of conformational space.
The user must supply an upper and a lower bound on the temperature. Furthermore, two sampling schemes are supported through the Muninn module: 1/k and multicanonical histogram weights.

Simulated Annealing:
This type of Monte Carlo is almost identical to Metropolis-Hastings, except the temperature is lowered at a constant rate from the start to the end of the simulation.
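A constant-rate cooling schedule is just a linear interpolation between a start and an end temperature. A sketch, with arbitrary default temperatures of my own choosing:

```python
def annealing_temperature(step, n_steps, t_start=500.0, t_end=300.0):
    """Linearly interpolate the temperature from t_start (at step 0)
    down to t_end (at the last step of the simulation)."""
    if n_steps < 2:
        return t_end
    return t_start + (t_end - t_start) * step / (n_steps - 1)
```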

Greedy Optimization:
This optimization technique runs a standard Monte Carlo simulation in which a move is only accepted if it lowers the energy of the structure.
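The whole greedy loop fits in a few lines of Python. This is a sketch with hypothetical energy and proposal functions, not the PHAISTOS implementation:

```python
import random

def greedy_minimize(energy, propose, x0, n_steps=1000, rng=random):
    """Greedy optimization: propose a move each step and keep it only
    if it lowers the energy; otherwise keep the current state."""
    x, e = x0, energy(x0)
    for _ in range(n_steps):
        x_new = propose(x, rng)
        e_new = energy(x_new)
        if e_new < e:
            x, e = x_new, e_new
    return x, e
```

Because uphill moves are never accepted, the energy is monotonically non-increasing - fast at finding a nearby minimum, but easily stuck in it.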


Monte Carlo Move Sets

Backbone Moves:
There are three selections for backbone move step sizes: "small", "medium" and "large". All three are mixtures of semi-local, CRISP and DBN moves. Choosing "small" is ideal for things such as refinement and optimization, "medium" is meant for ab initio folding, and "large" is for sampling large parts of conformational space quickly.
For the exact mixes, I suggest looking in the source code, as this may change. Note that if the PROFASI force field is selected, CRISP moves are replaced by semi-local moves, since PROFASI in its current implementation does not allow bond angles and bond lengths to vary (torsion angles are still degrees of freedom).


Side Chain Moves:
The same logic as for backbone moves applies here, only these moves are mixes of BASILISK, BASILISK-local and sidechain-uniform moves.
The sidechain-uniform moves are designed to escape salt bridges more easily. The BASILISK-type moves are able to sample likely side-chain angles depending on the backbone angles of the residue.

Compensate for move bias:
The default mixes of backbone and side chain moves do not sample all parts of conformational space with the same frequency. This is good if you want to find good structures quickly, but not that great if you want to model the average behavior of a protein.
It is possible to compensate for this "move bias" by adding an energy term to your samples, depending on how likely they are to be proposed.
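The principle can be written down in one line: dividing the Boltzmann weight by the proposal probability is equivalent to adding kT·ln p(proposal) to the energy. A sketch of that idea - my own illustration of the principle, not the PHAISTOS implementation:

```python
import math

def bias_corrected_energy(energy, log_proposal_prob, kT=0.596):
    """Add kT * ln p(proposal) to the energy: conformations the move set
    proposes often get a higher effective energy (down-weighted), while
    rarely proposed ones get a lower one, restoring unbiased sampling.

    The default kT is roughly 300 K in kcal/mol; purely illustrative."""
    return energy + kT * log_proposal_prob
```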


Press the "Save Config" button to save your config-file. Then start phaistos with your settings with something like:

phaistos --config-file my_folding.config





Supported PHAISTOS versions:
Currently Guistos is only tested with PHAISTOS 1.0rc1 rev 324.

Wednesday, June 6, 2012

Intel Non-Commercial Software Download

It always takes me quite a while to dig my way through Intel's very messy website. To get to the free non-commercial software follow this link:


There are tons of useful things in there, such as ifort, icc, icpc and MKL (Intel's optimized LAPACK and BLAS implementations and more). You have to register, but pretty much everything is free as long as your work is non-commercial.


 

Another very useful page is the MKL Link Line Advisor:

Pretty much invaluable unless you know the flags by heart (sadly, I do!). Happy `ifort -O3`-ing!

Monday, June 4, 2012

Coloring residues by chemical shift errors



I wrote a quick script to replace the b-factor column in my .pdb files with the error in C-alpha chemical shifts (the difference between the experimental values and the CamShift prediction for that same structure). This was the easy part.
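Since PDB is a fixed-column format (the temperature factor lives in columns 61-66 of an ATOM record), the replacement boils down to plain string slicing. A minimal sketch of that step:

```python
def set_bfactor(atom_line, value):
    """Overwrite the B-factor field (columns 61-66) of a fixed-width
    PDB ATOM/HETATM record with a new value."""
    return atom_line[:60] + ("%6.2f" % value) + atom_line[66:]

# A made-up but correctly formatted ATOM record:
line = ("ATOM      2  CA  MET A   1      "
        "38.263  24.358   5.818  1.00  0.00           C")
print(set_bfactor(line, 3.52)[60:66])  # prints "  3.52"
```

The full script then just loops over the ATOM records of each .pdb file and writes the modified lines back out.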
At first, PyMol would color all the ensemble snapshots with the b-factor values of the first structure. The trick was to load the ensemble object with discrete=1, which loads an individual set of b-factors for each structure. The last things to be adjusted are the minimum and maximum values for the spectrum-coloring.

reinitialize
import glob

def load_structure():

  native_pdb_file = "/home/andersx/color/1PGB.pdb"

  for x in glob.glob("/home/andersx/color/1PGB_opls_colorensemble/col_sample_*.pdb"):
    cmd.load(x, "ensemble",discrete=1)

  cmd.load(native_pdb_file, "native")

  cmd.align("ensemble", "native")
  cmd.hide("all")
  cmd.center("all")
  cmd.h_add()

  cmd.show("sticks", "ensemble")
  cmd.hide("sticks", "ensemble &! n. n+ca+c+o+h")

  cmd.show("lines", "native")
  cmd.hide("lines", "native &! n. n+ca+c+o+h02")

  cmd.spectrum( "b", "blue_red", minimum=0, maximum=8)

  cmd.color("grey", "native")

  cmd.bg_color("white")
  
  
load_structure()


The final result looks something like this:


[Video: the ensemble colored by chemical shift error]

Tuesday, April 17, 2012

Getting music from the interwebz (the truly easy way)

Just found these two neat methods, which I wish to share with the rest of the world. I wish the mainstream channels (iTunes store, etc.) were this easy. Personally, I put my own MP3 library into Spotify and wirelessly sync with my iPhone and iPad. No more virtual machines needed to sync via iTunes. Yay.

Anyways, on to the news here ... The first of these two gems is called GrooveDown. It uses the GrooveShark API and connects to GrooveShark's music library. It's written in Java, so it's also platform independent - a plus for us Linux people. Just search for any artist or song and add it to the download list. GrooveDown will then connect and download the selected songs in MP3 format. It IS as easy as it sounds.

Download GrooveDown here: http://groovedown.me/


 Image courtesy of the GrooveDown FAQ.
 
The second method is to extract audio from YouTube videos. This is super easy. Hop on to a website called http://www.vidtomp3.com/ and simply paste the URL for the YouTube video. Seconds later you are presented with a download link for an MP3-file containing the audio of the YouTube video. I can't see how this could be done any easier.



Fine print: I'm not responsible if you use the above information to break any copyright rules that might apply, neither am I responsible if you turned a video of a boring talk into an excruciatingly boring MP3-podcast.

Saturday, January 7, 2012

How easy it is to set up an SVN repository! Or how easy backup and version control really is ...

This post serves to spread the use of version control and easy backup in science and everyday life. SVN (short for Subversion) is in my opinion the simplest method, mostly because it is extremely easy to set up and use.

All you need for the most basic way to set up and use SVN is a server with SSH access and SVN installed. Most research clusters have this. Our research cluster (cleverly named sunray) already has both, and I will use it to demonstrate how I set up an SVN repository for my latest project: an API for building peptide fragments, setting up NMR calculations and doing the subsequent data analysis.

On 90% of newer Linux distributions one of the two following commands should install everything required for SVN. This is needed on both the server and all clients.
andersx@computer:~# yum install subversion # as root
andersx@computer:~$ sudo apt-get install subversion
Now, let's set up the repository on the remote server. First, create a directory where you want your repository to live. I use a folder called ~/repositories to keep all my repositories as sub-directories. My project in this example is called fragbuilder:
andersx@sunray:~$ mkdir /home/andersx/repositories/fragbuilder
andersx@sunray:~$ svnadmin create /home/andersx/repositories/fragbuilder
The command "svnadmin create" sets up the repository. This is all you ever need to do on the server. Now you can log out and forget about it.

Now hop on to the client computer. My laptop "awesome" is used in this example. First, we need to get a copy of the newly created repository. This is called "checking out" and abbreviated "co":
andersx@awesome:~$ svn co svn+ssh://sunray/home/andersx/repositories/fragbuilder

This creates a directory named fragbuilder, which is under SVN control. Next, I copy all my project files into this folder:
andersx@awesome:~$ cd fragbuilder/
andersx@awesome:~/fragbuilder$ cp -r ../torsome/* . 
# I.e. just copy your project files into the folder

Whenever new files are added to the folder, put them under version control by SVN by adding them to the repository:
andersx@awesome:~/fragbuilder$ svn add *

When you're done tinkering with your project files and want to send the changes back to the server (this is called "to commit", abbreviated "ci"), use the following command. A good habit is to add a comment, so you can remember what was changed in each commit. This is done with the -m "comment" option:
andersx@awesome:~/fragbuilder$ svn ci -m "Created SVN repository"

Next time you want to update your working copy to the latest version from the server, simply update ("up") in the working directory:
andersx@awesome:~/fragbuilder$ svn up

In a perfect world, I commit when I leave my work place, so if my laptop gets stolen, I still have a backup, and I can update what I did at work when I get home.

The last useful command I want to show you today is how to revert to a previous version. Let's say you want to go back to revision 233 of your project:
andersx@awesome:~/fragbuilder$ svn update -r 233

This will revert your working directory to how things were under revision 233.

For the most part, all you need to ever type is svn ci -m "comment" or svn up. This is how easy backup and version control is and should be!