Gaia will be the most accurate optical astronomy satellite built to date. Image courtesy ESA - C. Carreau
“Only one line of code was changed in order to run it on the cloud.”
The system works as follows: worker nodes fetch a job description from the database, retrieve the data, process it, and send the results to intermediate servers. These intermediate servers run dedicated algorithms and update the data for the next iteration. The process continues until the data converges. The nature of the AGIS process makes it a good candidate for cloud computing because:
* The amount of data increases over the 5-year mission.
* Iterative processing results in 6-month Data Reduction Cycles.
* At current estimates, AGIS will run for 2 weeks every 6 months.
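The worker-and-server loop described above can be illustrated with a small sketch. It is a toy example, not the AGIS code: the function names, the `gain` parameter, the tolerance, and the sample data are all hypothetical, and the "processing" is reduced to averaging residuals so that the iterate-until-converged pattern stays visible.

```python
from statistics import mean

def process_job(chunk, estimate):
    """Worker step (hypothetical): mean residual of one data chunk
    against the current estimate."""
    return mean(x - estimate for x in chunk)

def update_estimate(estimate, residuals, gain=0.5):
    """Intermediate-server step (hypothetical): merge the workers'
    results into the estimate used by the next iteration."""
    return estimate + gain * mean(residuals)

def run_until_converged(chunks, estimate=0.0, tol=1e-9, max_iter=1000):
    """Repeat worker and server steps until the estimate stops changing."""
    for iteration in range(1, max_iter + 1):
        # In a real grid each chunk would go to a different worker node.
        residuals = [process_job(chunk, estimate) for chunk in chunks]
        new_estimate = update_estimate(estimate, residuals)
        if abs(new_estimate - estimate) < tol:
            return new_estimate, iteration
        estimate = new_estimate
    raise RuntimeError("did not converge within max_iter iterations")

# Made-up data split into two equally sized chunks.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
estimate, iterations = run_until_converged(chunks)
print(f"converged to {estimate:.6f} after {iterations} iterations")
```

Because every iteration depends on the previous one but the chunks within an iteration are independent, the work inside each iteration is what spreads across many nodes.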
In order to port to the cloud, Amazon Machine Images (AMIs) were configured for the Oracle database, the grid and the AGIS software. The result is an Oracle grid running inside an Amazon cloud; a full relational database rather than a cloud database service. Five cloud storage volumes of 100 gigabytes each were attached to the database virtual machine (VM). Another VM was configured with the grid and the AGIS software. Only 1 line of code was changed in order to run it on the cloud.
To process 5 years of data for 2 million stars, 24 iterations of 100 minutes each were run, which amounts to 40 hours on a grid of 20 Amazon Elastic Compute Cloud (EC2) high-CPU instances.
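The arithmetic behind the 40-hour figure can be checked in a few lines. The iteration count, iteration length, and node count come from the test described above; the instance-hours total is a derived figure that the article does not state, and no pricing is assumed.

```python
ITERATIONS = 24              # iterations in the test run
MINUTES_PER_ITERATION = 100  # duration of each iteration
NODES = 20                   # EC2 high-CPU instances in the grid

# Wall-clock time for the whole run, in hours.
wall_clock_hours = ITERATIONS * MINUTES_PER_ITERATION / 60

# Derived (not stated in the article): total instance-hours consumed,
# assuming all nodes run for the full duration.
instance_hours = wall_clock_hours * NODES

print(f"wall-clock hours: {wall_clock_hours:g}")
print(f"instance-hours:   {instance_hours:g}")
```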
For the full billion-star project, 100 million primary stars and 6 years of data will be analyzed, which will require a total of 16,200 hours on a 20-node EC2 cluster. The estimated cost of the cloud-based solution is less than half that of an in-house solution, even before the additional electricity and system administration costs of the in-house solution are taken into account.
A second test ran 120 High-CPU Extra Large VMs. Each VM ran 12 threads, so 1,440 processes were working in parallel. The test exposed performance problems caused by SQL queries and lock contention at the database, which were then resolved; these problems could not have been found on the current cluster. Thus, the cloud allowed us to find and solve performance and scalability problems before going to production.
Using cloud computing, we can scale up to massive capacities (both processing and storage) in a matter of minutes without having to invest in new infrastructure, train new personnel or license new software. We can provide peak-load capacity without incurring the higher costs of building larger data centers and maintaining the servers and networks.
—Alfonso Olias - The Server Labs, at the European Space Agency (ESA).