Technology - Taming the data deluge: the new open source iRODS data grid system
The digital data we all love is growing explosively. In 2006, humanity produced 161 exabytes of digital data—that’s 161 billion billion bytes.
This deluge of data is bringing with it unprecedented challenges in organizing and accessing digital information. To meet these challenges, the Data-Intensive Computing Environments group at the San Diego Supercomputer Center has released version 1.0 of iRODS, the Integrated Rule-Oriented Data System, a powerful new open-source approach to managing digital data.
“iRODS is an innovative data grid system that incorporates and moves beyond ten years of experience in developing the widely used Storage Resource Broker technology,” said Reagan Moore, director of the DICE group at SDSC.
“iRODS equips users to handle the full range of distributed data management needs, from extracting descriptive metadata and managing their data to moving it efficiently, sharing data securely with collaborators, publishing it in digital libraries, and finally archiving data for long-term preservation.”
What’s in a name?
The most powerful new feature, for which the Integrated Rule-Oriented Data System is named, is an innovative “rule engine” that lets users easily accomplish complex data management tasks. Users can automate the enforcement of data management policies, or “virtualize” them, by applying rules that control the execution of all data access and manipulation operations. Rather than requiring these actions or workflows to be hard-coded into the software, the user-friendly rules let any group easily customize the iRODS system for its specific data management needs.
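To make the rule-oriented idea concrete, here is a minimal sketch in Python of how policies expressed as rules might be applied whenever a data operation runs. It is only an illustration of the general pattern: it is not the iRODS rule language or API, and every name in it (`Rule`, `RuleEngine`, `DataGrid`, the `"put"` event, the `"tape"` resource) is a hypothetical stand-in chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# All names below are hypothetical; this is NOT the iRODS rule language or API.
Context = Dict[str, Any]


@dataclass
class Rule:
    event: str                            # operation the rule applies to, e.g. "put"
    condition: Callable[[Context], bool]  # decides whether the rule fires
    action: Callable[[Context], None]     # the policy applied when it fires


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, ctx: Context) -> None:
        # Apply every matching rule, in the order the rules were registered.
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                rule.action(ctx)


class DataGrid:
    """Toy in-memory store whose operations are routed through the rule engine."""

    def __init__(self, engine: RuleEngine) -> None:
        self.engine = engine
        self.store: Dict[str, Context] = {}

    def put(self, path: str, data: bytes, user: str) -> Context:
        ctx: Context = {
            "path": path,
            "data": data,
            "user": user,
            "replicas": ["primary"],  # assumed default storage resource
            "metadata": {},
        }
        self.engine.fire("put", ctx)  # policies run as part of the operation
        self.store[path] = ctx
        return ctx


engine = RuleEngine()

# Policy 1 (assumed): anything stored under /archive/ gets a second replica.
engine.add(Rule(
    event="put",
    condition=lambda c: c["path"].startswith("/archive/"),
    action=lambda c: c["replicas"].append("tape"),
))

# Policy 2 (assumed): record the file size as descriptive metadata on every put.
engine.add(Rule(
    event="put",
    condition=lambda c: True,
    action=lambda c: c["metadata"].update(size=len(c["data"])),
))

grid = DataGrid(engine)
archived = grid.put("/archive/a.dat", b"abc", "alice")
scratch = grid.put("/scratch/b.dat", b"hello", "bob")
```

In this sketch, changing a policy means adding or removing a `Rule` rather than editing `DataGrid.put`, which is the kind of customization without hard-coding that the rule engine is described as providing.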
iRODS is also designed to be flexible, growing seamlessly from small to very large needs.
“You can start using it as a single user who only needs to manage a small stand-alone data collection,” said Arcot Rajasekar, who leads the iRODS development team.
“The same system lets you grow into a very large federated collaborative system that can span dozens of sites around the world, with hundreds or thousands of users and numerous data collections containing millions of files and petabytes of data.”
At SDSC alone, iRODS and its predecessor SRB technology are already managing one petabyte of data and two hundred million files for 5,000 users.
Currently, the iRODS team is working with partners to help a number of projects apply the technology, including the National Archives and Records Administration, the Ocean Observatories Initiative, the National Science Digital Library, the Temporal Dynamics of Learning Center, the UC Humanities, Arts and Social Sciences grid, the Testbed for the Redlining Archives of California’s Exclusionary Spaces project, and numerous others.
Get it started
To help users get started with iRODS, the DICE group is offering several tutorials and workshops in the US and internationally. Following on the very popular Society of American Archivists workshop at SDSC last summer, there will be two SAA sessions this summer, with additional tutorials in the US, Europe, and Asia.
iRODS is funded by the U.S. National Archives and Records Administration and the National Science Foundation. Other collaborators include the French Institut National de Physique Nucléaire et de Physique des Particules (IN2P3), the UK e-Science Data Management Group at Rutherford Appleton Laboratory, and the High Energy Accelerator Research Organization, KEK, in Japan.
- Paul Tooby, San Diego Supercomputer Center