Compute Cluster: DSC00179 © Jordan Thevenow-Harrison / CC
A few months ago I was trying to track down information about Amazon's Elastic Compute Cloud (EC2), a pay-as-you-go virtual clustering service. At the time, Declan Butler had emailed me asking about the feasibility of running bioinformatics applications on EC2. My investigation turned up very little then; today, however, Andrew Perry has posted an analysis of the feasibility of EC2 for running mpiBLAST. If you're into bioinformatics clusters (of course), go read it right away. If you've considered a cluster and balked at the expense, this may be a solution.
The downside is that Amazon's limited beta for EC2 is now full. Hey Amazon, bioinformatics is a growing market; it might pay to help Andrew out with some account space?
As an aside, I have tried to make this post a little more interesting by adding an image of a happy-looking guy next to a cluster. I'm not much of a photographer, so I used an attribution, share-alike Creative Commons licensed image I found through Flickr's Creative Commons Search. I think I'm doing the right thing here: the attribution is provided under the image (those should be links, but the img_assist module is escaping them; will fix soon), and clicking on the image takes you back to the original at Flickr. So if you, the image owner, or anyone else finds something wrong with the way I have used this image, please let me know.


Comments
mpiBLAST in EC2
I've got mpiBLAST working well inside EC2. If you'd like to try it out, you will find this document helpful.
http://mpiblast.pbwiki.com/AmazonEC2
I will be at BioIT World at the end of April, and happy to discuss this topic with others.
Mike Cariaso * Bioinformatics Software * http://www.cariaso.com
mpiblast and EC2
I'm an mpiBLAST developer and a heavy mpiBLAST user. I'm also moderately active with EC2. Despite a lot of interest in bringing the two together, I haven't yet found an opportunity. If anyone reading this is considering such a project, I'd be interested in comparing notes or collaborating.
As for MPI on EC2, like most parallel apps it's a matter of the right topology for the right application. Communication between nodes is going to have fairly high latency compared to more traditional dedicated clusters, but for some apps that's perfectly acceptable. Free transfers into and out of S3 are potentially a big win.
I would like to imagine that we could someday host a shared version of the major blastable DBs in S3, to alleviate the maintenance and transfer costs. For the moment it's probably inappropriate, since it's designed more for total replacement than for incremental growth.
--
Mike Cariaso * Bioinformatics Software * http://www.cariaso.com
Collaboration
I still don't have an EC2 account, but it looks like all the heavy lifting will have been done by the time I get a chance to play with EC2 ... check out Peter Skomoroch's extensive tutorial "On-Demand MPI Cluster with Python and EC2 (part 1 of 3)", if you haven't caught it already.
Andrew Perry -- http://pansapiens.blogspot.com/
That tutorial is quite
That tutorial is quite full-on, a bit too much for me at the moment. I thought I remembered someone offering to transfer their beta EC2 account to you in the comments of your post?
EC2 MPI quickstart
Check out the second part of the tutorial I just posted on Data Wrangling. It is a bit less lengthy and should let you get a cluster running in a few minutes using the public Amazon EC2 image published based on the first post.
There are some Python scripts available on my blog to configure the MPI cluster on EC2, and it looks like you can hack them a bit to configure an EC2 BLAST cluster based on my tutorial:
MPI Cluster with Python and Amazon EC2 (part 2 of 3)
For message-intensive code, I expect performance will be a bit worse than people with research clusters are used to, considering the effective 250 Mb/s interconnect. I'll be doing some benchmarking in the coming weeks. The advantage of EC2 will be for startups and people who otherwise can't get access to, or afford to build, a permanent cluster. As EC2 moves out of beta, I would guess that there may be some high-performance options if Amazon finds enough demand for them.
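To put the 250 Mb/s figure in perspective, here is a rough back-of-the-envelope sketch in Python, not a benchmark. It assumes "Mb/s" means megabits per second and treats the link as an ideal pipe with an optional fixed latency; the function name, the payload sizes, and the latency parameter are all made up for illustration.

```python
def transfer_seconds(size_bytes, bandwidth_mbps=250.0, latency_s=0.0):
    """Estimate the time to move size_bytes over one link.

    Assumptions (illustrative only): bandwidth_mbps is in megabits per
    second, the link runs at full rate with no protocol overhead, and
    latency_s is a fixed per-message delay.
    """
    bits = size_bytes * 8
    return latency_s + bits / (bandwidth_mbps * 1_000_000)


if __name__ == "__main__":
    # Hypothetical payload sizes, not measurements from EC2.
    for label, size in [("1 MB message", 10**6), ("1 GB database chunk", 10**9)]:
        print(f"{label}: {transfer_seconds(size):.3f} s")
```

Under these assumptions a 1 GB transfer takes about 32 seconds per link, which is tolerable for shipping database fragments once per job but adds up quickly for code that exchanges large messages constantly.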
-Pete
EC2
I've heard the Amazon folks talk about EC2 a bit and also attended a workshop during Mindcamp. From what I can gather, S3 would be perfect for a small startup biotech: it scales well, it's cheap, and so on. EC2 might work for a company that does computation in bursts, especially using apps like BLAST. For a company that is constantly crunching data, though, I am not yet convinced EC2 is the right solution over something like Sun's on-demand offering, which is more suited to number-crunching apps (as noted in the article, I would not run MPI apps on EC2 based on current knowledge).
My Blog: http://mndoci.com