Computational Biochemistry: Chemistry on the Amazon EC2

Friday, February 14, 2014

Chemistry on the Amazon EC2

We are trying out the Amazon EC2 compute cloud for running computations in the Jensen Group. This is a note on how things are going so far.

It was actually extremely easy to set up. Within minutes of creating the Amazon Web Services (AWS) account, I had a free instance of Ubuntu 12.04.3 LTS up and running and was able to SSH into it.
You have access to one free virtual box and 750 free hours per month for the first year, so it is free to get started. My free instance had some Intel processor, 0.5 GB RAM and 8 GB disk space (I think the specs change from time to time).

I copied binaries for PHAISTOS (the program we are looking to run) over and they ran successfully, and things pretty much went without a hitch.
After trying out the free instance, I just saved the image (you can do that via the web interface), and I started every subsequent instance from the same image, so no configuration was needed after the first time.
I mounted a folder located on the university server via SSHFS, which I use to write output data from the instance directly to our server. This way I don't lose data if the instance is terminated, and I don't have to log in to the instance to check output or log files.
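
As a rough sketch of the SSHFS step, the mount could be scripted in Python along these lines. This assumes `sshfs` is installed on the instance; the user name, host name and both directories are hypothetical placeholders, not our actual setup, and the `reconnect` option is just one sensible choice for a long-running job.

```python
import subprocess


def sshfs_command(user, host, remote_dir, mount_point):
    """Build the sshfs command that mounts remote_dir at mount_point."""
    return [
        "sshfs",
        f"{user}@{host}:{remote_dir}",
        mount_point,
        "-o", "reconnect",  # try to re-establish the connection if it drops
    ]


def mount_output_dir(user, host, remote_dir, mount_point):
    """Run sshfs; raises CalledProcessError if the mount fails."""
    subprocess.run(sshfs_command(user, host, remote_dir, mount_point), check=True)


# Hypothetical placeholders, for illustration only.
cmd = sshfs_command("me", "server.example.edu", "/data/ec2_output", "/home/ubuntu/output")
print(" ".join(cmd))
```

Only the command is printed here; calling `mount_output_dir(...)` with real values would perform the mount.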

The biggest problem for me was the vast number of different instance types. You can select everything from memory-optimized to CPU, storage, interconnect or GPU instances, and each of these comes in several different types. This takes a bit of research and there is a lot of fine print. E.g. Amazon doesn't specify the physical core count, but rather a "vCPU" count, which may or may not include hyperthreading (i.e. the vCPU number may be twice the number of physical cores you actually get!).
The price also varies depending on where the data center in which you spawn your instances is located. I picked the N. Virginia data center, which was the cheapest. I don't know why I would pick one of their other data centers. The closest to me is located in Ireland, but it is about 15% more expensive. Asia seems to be even more expensive.
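
One way to check the vCPU caveat from inside a running Linux instance is to compare logical processors with distinct physical cores in `/proc/cpuinfo`. The sketch below is a minimal parser, assuming the guest exposes the usual `processor`, `physical id` and `core id` fields; some virtualized guests hide or alter this topology, so treat the result as a hint rather than proof. The function name and the sample text are made up for illustration.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # A physical core is identified by its (socket, core) pair.
            cores.add((physical_id, value))
    return logical, len(cores)


# Made-up sample: 4 logical CPUs sharing 2 physical cores (hyperthreading).
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

logical, physical = count_cpus(SAMPLE)
print(f"{logical} logical CPUs on {physical} physical cores")
```

On an actual instance you would pass in the contents of `/proc/cpuinfo` instead of `SAMPLE`. If the logical count is twice the physical count, the vCPU number includes hyperthreads.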

Managing payment is also surprisingly easy. I had my own free account, which I used in the beginning. Jan Jensen created an account using the university billing account number. From there we used the Consolidated Billing option to link my account, so that my bill is sent to Jan's account.

Our current project is pretty much only CPU-intensive and barely requires any storage or memory, so naturally I had to benchmark the instance types that are CPU optimized.

I tested out the largest (by CPU count) instances I could launch in the General Purpose (m3 tier), Compute Optimized (c3 tier) and Compute Optimized, previous generation (c1 tier) categories. These are the m3.2xlarge, c3.2xlarge and c1.xlarge instances.

In short these machines are:

name = core count (processor type) ~ hourly price (geographical location of server)

m3.2xlarge = 4 physical cores (Intel E5-2670 @ 2.60 GHz) ~ $0.90/hour (N. Virginia)
c3.2xlarge = 4 physical cores (E5-2680v2 @ 2.80 GHz) ~ $0.60/hour (N. Virginia)
c1.xlarge = 8 physical cores* (E5-2650 @ 2.00 GHz) ~ $0.58/hour (N. Virginia)

*The c1.xlarge didn't support hyperthreading, from what I could gather. The m3.2xlarge is more expensive because it has faster disks and more RAM. Initially I thought the m3.2xlarge had 8 physical cores, but it turns out I was merely fooled by the "vCPU" number and several pages of fine print in the pricing list.
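
To put the list above on a common footing, the hourly price can be divided by the physical core count. The snippet below is only a sketch using the three prices and core counts quoted above; the dictionary and function names are made up for illustration. It ignores per-core speed and hyperthreading, which is why the benchmark is still needed.

```python
# (physical cores, hourly price in USD), as listed above for N. Virginia.
INSTANCES = {
    "m3.2xlarge": (4, 0.90),
    "c3.2xlarge": (4, 0.60),
    "c1.xlarge": (8, 0.58),
}


def price_per_core_hour(cores, hourly_price):
    """Hourly price divided by the number of physical cores."""
    return hourly_price / cores


for name, (cores, price) in INSTANCES.items():
    print(f"{name}: ${price_per_core_hour(cores, price):.4f} per physical core-hour")
```

This gives $0.2250, $0.1500 and $0.0725 per physical core-hour for the m3.2xlarge, c3.2xlarge and c1.xlarge respectively.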

As a test, I launched a Metropolis-Hastings simulation in PHAISTOS, starting from the native structure of Protein G with the PROFASI force field at 300 K, using the same seed (666) in all the tests, and noted the iteration speed as a function of the number of cores used.

The maximum number of total iterations (all threads, collectively) per day for the three instances was comparable (see below), maxing out at around 500-600 million per day.

A slight win for the quad-core c3.2xlarge instance when it is hyperthreading with 8 threads.

There is no real benefit to spawning more than 8 concurrent threads either.

What is probably more important is the throughput for each USD you spend. Again, the c3.2xlarge wins (when hyperthreading with 8 threads) and is the cheapest for our purpose.
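
Throughput per USD is simply the iterations per day divided by the cost of running the instance for a day. A minimal sketch follows; the function name is made up, and the 600 million iterations per day in the example is an illustrative round number from the top of the range quoted above, not a measured value for any particular instance.

```python
def iterations_per_usd(iterations_per_day, hourly_price_usd):
    """Iterations obtained for each USD spent on an always-on instance."""
    daily_cost = 24 * hourly_price_usd
    return iterations_per_day / daily_cost


# Illustrative only: 600 million iterations/day at $0.60/hour.
example = iterations_per_usd(600e6, 0.60)
print(f"{example / 1e6:.1f} million iterations per USD")
```

Plugging in the measured iterations per day and the hourly price for each instance type gives the comparison directly.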

Comments:

Adrian, February 14, 2014 at 4:50 PM
Hi Jan

Not fully official yet, but AWS is going to provide an Amber-gpu instance that anyone in the world can run (and pay for). Easy and very fast...

gyg3s, April 29, 2014 at 9:13 AM
Hi Anders

Presumably the same could be done for GAMESS (US)?

(The only problem I would anticipate is the Fortran part of the process).

hellogetmyblogback, April 29, 2014 at 9:49 AM
For pretty much any program (including GAMESS) you would do exactly the same. Just compile the program as you would on any other platform, and run it just as you would on your Linux box.

gyg3s, April 29, 2014 at 3:32 PM
Thanks for that, Anders. Will look into it.
