
Back to Basics: What is a data grid?

Image courtesy of gerard79.

Most of our readers are familiar with grid computing, a cost-effective way to distribute a high volume of computations across computers separated by large geographical distances. In this column, Reagan Moore takes us back to basics to explain what data grids are, and how they differ from the grid computing we’ve come to know and love.

Sometimes, storing data on a single server makes sense. It is a simple way to ensure that people always know where to store data, and where to access it. But in more complex situations, this model can break down for a variety of reasons.

Perhaps the rate at which remotely recorded data is uploaded, combined with the rate at which users want to access and download that data, is more than the server and its network can handle. Or perhaps you want a way to reliably and seamlessly access data owned and stored by multiple institutions, without having to worry about how recently a single server was synchronized with the other institutions’ servers.

Groups that need to manage distributed data, stored across multiple locations and multiple types of storage devices, can use data grids to impose a common access method, common naming, common authentication, and common management policies. They can use the data grid to integrate archives with disk caches used for data access, automate enforcement of management policies and administrative functions, build a collaboration environment, and publish data in a digital library.
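As a rough illustration of what common naming means in practice, the sketch below maps one location-independent logical name to several physical copies held at different sites and on different storage types. It is a toy model, not the interface of any real data grid: the class names, site names, paths, and the "prefer the disk cache" choice are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """One physical copy of a data object (all fields are illustrative)."""
    site: str          # institution holding the copy
    storage_type: str  # e.g. "disk-cache" or "tape-archive"
    physical_path: str  # where the copy lives on that storage system


@dataclass
class LogicalCatalog:
    """Maps location-independent logical names to their physical replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        # Several replicas may share one logical name.
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str, prefer: str = "disk-cache") -> Replica:
        """Return a replica of the preferred storage type, else the first one."""
        replicas = self.entries[logical_name]
        for replica in replicas:
            if replica.storage_type == prefer:
                return replica
        return replicas[0]


# Hypothetical collection: one archived copy and one cached copy.
catalog = LogicalCatalog()
catalog.register("/grid/seismology/run42.dat",
                 Replica("site-a", "tape-archive", "/tape/vol7/000142"))
catalog.register("/grid/seismology/run42.dat",
                 Replica("site-b", "disk-cache", "/cache/run42.dat"))

chosen = catalog.resolve("/grid/seismology/run42.dat")
print(chosen.site, chosen.physical_path)
```

Users only ever see the logical name; which institution or device serves the request is decided behind the catalog, which is what allows archives and disk caches to be integrated under one access method.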

The original funding for the development of data grids came from the US Defense Advanced Research Projects Agency. An initial application was the creation of a distributed patent digital library for the US Patent and Trademark Office. Within the US National Science Foundation’s Partnerships for Advanced Computational Infrastructure (PACI) program, which was launched in March 1997, data grids were built for major national-scale research projects in seismology, neuroinformatics, digital library initiatives, astronomy, environmental science, and oceanography.

Since those early days, data grid capabilities have evolved from simple organization of distributed collections to include enforcement of management policies, automation of administrative functions, and validation of assessment criteria. The new capabilities are enabled through the integration of a distributed rule engine into the data management infrastructure.

Within that infrastructure, each policy is mapped to a computer-actionable rule that controls the execution of a data management procedure. At each storage location, a rule engine applies the procedures. The procedures, in turn, are mapped to workflows composed from standard functions, called micro-services. The results of the execution of each procedure are stored as persistent state information in a metadata catalog, tracking the data’s provenance. The state information can be queried to verify assessment criteria.
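The chain described above (policy, rule, workflow of micro-services, persistent state, assessment query) can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not how iRODS or any other system implements it: the event name "on_ingest", the two micro-services, the backup site, and the in-memory list standing in for the metadata catalog are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List

# A micro-service is modeled as a small function that updates a shared context.
MicroService = Callable[[dict], None]


def compute_checksum(ctx: dict) -> None:
    # Record a SHA-256 digest of the object's bytes.
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


def replicate(ctx: dict) -> None:
    # Pretend to copy the object to a second, hypothetical storage site.
    ctx["replicas"] = [ctx["site"], "backup-site"]


@dataclass
class Rule:
    """A policy expressed as: on this event, run this chain of micro-services."""
    policy: str
    event: str
    workflow: List[MicroService]


class RuleEngine:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules
        # Stand-in for the metadata catalog holding persistent state.
        self.catalog: List[Dict[str, object]] = []

    def fire(self, event: str, ctx: dict) -> None:
        """Run every rule registered for the event and record the outcome."""
        for rule in self.rules:
            if rule.event != event:
                continue
            for service in rule.workflow:
                service(ctx)
            self.catalog.append({
                "object": ctx["name"],
                "policy": rule.policy,
                "checksum": ctx.get("checksum"),
                "replicas": list(ctx.get("replicas", [])),
            })

    def verify(self, name: str, min_replicas: int) -> bool:
        """Assessment query: does recorded state show enough replicas?"""
        return any(
            record["object"] == name and len(record["replicas"]) >= min_replicas
            for record in self.catalog
        )


engine = RuleEngine([
    Rule("keep two checksummed copies", "on_ingest", [compute_checksum, replicate]),
])
engine.fire("on_ingest", {"name": "run42.dat", "site": "site-a", "data": b"example bytes"})
print(engine.verify("run42.dat", min_replicas=2))
```

The point of the design is that the assessment step never inspects the storage systems directly. It only queries the state that the procedures recorded, so a policy can be checked long after the workflow ran.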

This policy-based data management approach makes it possible to build generic infrastructure that can support each stage of the data life cycle. The policies and procedures required for simple distributed data management (traditional data grid applications) can be augmented with the policies required for data publication in a digital library or data preservation in an archive. Furthermore, the same data management infrastructure can be used to re-purpose a collection for a new use, by applying the policies required by the new user community.

One example of a policy-based data management system is the open-source integrated Rule-Oriented Data System (iRODS).

The generic capabilities of policy-based data management systems are being demonstrated in international collaborations that include groups in Japan, Taiwan, Australia, Europe, and the United States. In Asia, the iRODS technology has been adopted by the T2K neutrino data grid in Japan and the Taiwan Digital Archives Remote Backup system, and Academia Sinica in Taiwan is developing gLite-iRODS interoperability.

For more information about iRODS, please visit the website http://irods.diceresearch.org.
