Feature - Reaching for sky computing
Sometimes, a single cloud isn’t enough. Sometimes, you need the whole sky.
That’s why a number of researchers are developing tools to federate clouds, an architectural concept dubbed “sky computing” in a paper published in the September/October 2009 issue of IEEE Internet Computing.
“Sky computing is a tongue-in-cheek term for what happens when you have multiple clouds,” explained Kate Keahey of the University of Chicago, who co-authored the paper alongside University of Florida researchers Mauricio Tsugawa, Andréa Matsunaga, and José Fortes.
“[In the paper] we talked about standards and cloud markets and various mechanics that might lead to sky computing over multiple clouds, and then that idea was picked up by many projects,” Keahey added.
Among those inspired by the concept was Pierre Riteau, a doctoral student in the computer science department of the Université de Rennes 1.
“I thought it was interesting research,” Riteau said. After a lengthy correspondence, Riteau and Keahey collaborated to explore the concept further.
One cloud isn’t enough
Instead of using well-known commercial clouds such as Amazon’s EC2, some research institutions and projects have opted to deploy their own private clouds using open source cloud computing platforms such as Nimbus. Some of these are “open science clouds,” which freely provide computational resources for researchers, following in the footsteps of infrastructures like Open Science Grid and the European Grid Infrastructure.
The problem is that Nimbus Infrastructure-as-a-Service alone cannot create clouds that span more than one physical computer cluster.
“Infrastructure-as-a-Service software is typically designed to leverage the fact that you are running in the protected environment of a local area network, whereas to combine resources across providers you have to go out to the wild west of the internet,” Keahey explained.
That means that private and open clouds are limited by the size of the cluster on which they are deployed – and those clusters are usually not very large.
“It’s maybe dozens of machines in each site,” said Riteau. “What we wanted to do with our experiment is to go at a larger scale.”
That doesn’t just mean deploying on larger clusters, or on multiple identical clusters. When their computational jobs spike, many researchers will want to elastically overflow computations to another cloud their collaboration maintains, to an open science cloud, or to a commercial cloud. That calls for a solution compatible with many different cloud architectures.
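The overflow idea can be illustrated with a minimal Python sketch. It is not the researchers' code: the Cloud class, the place_jobs function, the cloud names, and the slot counts are all hypothetical, and the sketch assumes each job needs exactly one free slot and that clouds are listed in order of preference.

```python
from dataclasses import dataclass


@dataclass
class Cloud:
    """Hypothetical description of one cloud: a name and its free job slots."""
    name: str
    free_slots: int


def place_jobs(num_jobs, clouds):
    """Fill clouds in preference order, spilling leftover jobs to the next one.

    Returns (placement, unplaced): a dict mapping cloud name to the number of
    jobs assigned there, and the count of jobs that fit nowhere.
    """
    placement = {}
    remaining = num_jobs
    for cloud in clouds:
        if remaining == 0:
            break
        taken = min(remaining, cloud.free_slots)
        if taken:
            placement[cloud.name] = taken
        remaining -= taken
    return placement, remaining


if __name__ == "__main__":
    # Illustrative numbers only: a small private cloud, then two fallbacks.
    sky = [Cloud("private", 30), Cloud("open-science", 15), Cloud("commercial", 100)]
    print(place_jobs(50, sky))
```

In this toy run the private cloud fills first, and only the jobs that do not fit spill over to the open science cloud and then to the commercial one.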
To create a heterogeneous environment where they could test the sky computing concept, Riteau and Keahey used resources from Grid’5000, a French cyberinfrastructure for large-scale parallel and distributed computing research established in 2003, and FutureGrid, an American cyberinfrastructure with similar goals currently in the process of launching.
“I’m interested in blazing pathways between those two projects, because we can learn from Grid’5000, and they can collaborate with us,” said Keahey, who is one of the FutureGrid principal investigators.
“Making both of them work together was not that complicated,” Riteau said. “The biggest challenge was being able to make it go to a large scale, because when you switch from about 30 machines to 1000, you have a lot of issues that appear, and for this we had to improve some parts of the system.”
Their system combined several components: Nimbus IaaS and sky computing tools for cloud management, including virtual machine provisioning and contextualization services; ViNe, to allow the clouds to communicate with each other; and the open source software Hadoop, for fault-tolerant execution of BLAST, a widely used bioinformatics application. ViNe, or Virtual Network, is a program developed by Fortes and Tsugawa, two of Keahey’s co-authors on the 2009 sky computing paper. At present, it is not open source, but someday it will be, said Riteau. All of these components are held together by scripts Riteau wrote to call each service, which he also plans to release as open source.
In fact, some of Riteau’s work has already contributed to the world’s open source code bases.
“A lot of small improvements to make [Nimbus] scale have been integrated, and actually through a lot of these changes I was asked to become a committer on the project,” Riteau said.
At the June Open Grid Forum meeting, Riteau had the opportunity to demonstrate the system to conference attendees. In his demo, he deployed three virtual clusters on Grid’5000 and three virtual clusters on FutureGrid, which he in turn used to run BLAST computations.
“A lot of people were really interested, and also impressed by the scale of the experiment,” he said.
This experiment is only the very beginning for Riteau, Keahey, and their colleagues.
“We are continuing to work on this,” Riteau said. “For example currently we are running experiments to measure exactly performance and the scalability of the system, and we are going to publish these results soon.” They also plan to make the system automatically expand or shrink the virtual clusters depending on the jobs that are running and the deadlines the users select.
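One simple way to think about deadline-driven resizing is sketched below in Python. This is an assumption-laden illustration, not the team's design: the function target_cluster_size, its parameters, and the default limits are hypothetical, and the sketch assumes work divides evenly across nodes.

```python
import math


def target_cluster_size(pending_node_hours, hours_to_deadline,
                        min_nodes=1, max_nodes=1000):
    """Estimate how many nodes a virtual cluster needs to meet a deadline.

    pending_node_hours: remaining work, in hours on a single node (assumed
        to parallelize perfectly, which real jobs rarely do).
    hours_to_deadline: time left before the user-selected deadline.
    The result is clamped to the range [min_nodes, max_nodes].
    """
    if hours_to_deadline <= 0:
        # Deadline already passed: ask for everything that is allowed.
        return max_nodes
    needed = math.ceil(pending_node_hours / hours_to_deadline)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # 120 node-hours of work due in 4 hours needs about 30 nodes.
    print(target_cluster_size(120, 4))
```

A controller could call such a function periodically and grow or shrink the cluster toward the returned size. A real policy would also have to account for virtual machine startup time and cost.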
Said Keahey, “The fact is that cloud computing created new patterns and we are having to figure out now how to build tools that will take advantage of those patterns.”
—Miriam Boon, iSGTW