SnowFlock: Parallel Cloud Computing Made Agile

H. Andres Lagar-Cavilla

Intel/SDI/LCS Seminar - Intel Pittsburgh and Carnegie Mellon University, June 2008

Cloud computing represents an excellent opportunity to execute emerging high-performance scientific and engineering applications. In particular, bioinformatics workloads, which are characterized by access to large datasets and by embarrassing parallelism, are especially amenable to this model. Many researchers would like to test new algorithms or complete their jobs using as many processors as possible, something they cannot always obtain easily or quickly. Encapsulating their applications in VMs and submitting them to shared compute clusters gives them access to vast computing resources beyond their usual reach, without the need to learn new software tools or even rebuild their applications. Further, access to large datasets (e.g., genomic or phylogenetic databases) that are too unwieldy or dynamic to cache locally can be simplified by co-locating them with a compute cluster. Because they are embarrassingly parallel, many bioinformatics applications can shrink their completion times to the order of seconds, providing quasi-interactive response times to external requests arriving via, e.g., a web server interface.

Achieving such speedups requires an agility in spawning new computing elements that current virtualization-based compute clusters lack. Today, when a cluster user requests new VMs, they are provisioned by booting a copy of her VM from scratch, resuming one from saved state on secondary storage, or live-migrating an idle VM from a host where it had been consolidated. These primitives do not scale gracefully, and they typically take longer than the work performed by a single worker thread. In this talk, I will present a new primitive, VM cloning, which replicates a running VM to a large number of hosts in sub-second time, with a runtime overhead that is not noticeable for most workloads. This allows applications in a shared virtualized cluster to scale in an agile manner and to shrink the runtimes of easily parallelizable jobs to mere seconds. Although conceived in a bioinformatics context, this work applies to many other fields with large-scale parallel applications, such as finance, rendering, and search.