The Earth and space sciences, such as geology, oceanography, and astronomy, generate vast quantities of data. Yet without the right tools, scientists either drown in this sea of big Earth data or leave it sitting in an archive, barely used.
The vision of the EarthServer project is to offer researchers ‘big Earth data at your fingertips’, so that they can access and manipulate enormous data sets with just a few mouse clicks.
“The project was the result of a ‘push’ and a ‘pull’,” says project coordinator Peter Baumann, professor of computer science at Jacobs University in Bremen, Germany. “On the demand side there was a need for new concepts to handle the wave of data crashing down on us. On the supply side we had a data cube technology that is well suited to this domain.” A data cube is a three-dimensional (or higher-dimensional) array of values, commonly used to represent a time series of images.
EarthServer built advanced data cubes and custom web portals that let researchers extract and visualize Earth sciences data as 3D cubes, 2D maps, or 1D diagrams. The British Geological Survey, for example, used EarthServer technology to drill down through different layers of the Earth in 3D.
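To make the 3D/2D/1D distinction concrete, here is a minimal sketch in plain Python. It assumes a small cube stored as nested lists and indexed as cube[t][y][x] (time, row, column). The function names and the synthetic values are hypothetical illustrations, not EarthServer's API. Trimming every axis keeps a smaller cube, fixing the time axis yields a 2D map, and fixing both spatial axes yields a 1D time series.

```python
from typing import List

# Hypothetical layout: a cube indexed as cube[t][y][x] (time, row, column).
Cube = List[List[List[float]]]


def make_cube(nt: int, ny: int, nx: int) -> Cube:
    """Build a synthetic cube whose value encodes its position: 100*t + 10*y + x."""
    return [[[100 * t + 10 * y + x for x in range(nx)] for y in range(ny)]
            for t in range(nt)]


def trim(cube: Cube, t: range, y: range, x: range) -> Cube:
    """3D subset: keep a sub-range on every axis, so the result is still a cube."""
    return [[[cube[ti][yi][xi] for xi in x] for yi in y] for ti in t]


def map_at(cube: Cube, t: int) -> List[List[float]]:
    """2D slice: fix the time axis and return a copy of that single map."""
    return [row[:] for row in cube[t]]


def time_series(cube: Cube, y: int, x: int) -> List[float]:
    """1D slice: fix both spatial axes and return the values over time."""
    return [plane[y][x] for plane in cube]


cube = make_cube(nt=3, ny=4, nx=5)
print(time_series(cube, y=2, x=3))  # prints [23, 123, 223]
```

A real data cube service works on far larger arrays stored in tiles, but the three kinds of result mirror the 3D cubes, 2D maps, and 1D diagrams mentioned above.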
“For the user, data cubes hide the unnecessary complexity of the data,” says Baumann. “As a user, I don’t want to see a million files: I want to see a few data cubes.”
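Baumann's contrast between “a million files” and “a few data cubes” can be illustrated with a minimal sketch. It assumes that each file holds one 2D image tagged with an acquisition date, simulated here with an in-memory dictionary. All names and values are hypothetical. The function first checks that every image shares the same grid, then sorts the images by date and stacks them, so many separate files become one cube with a time axis.

```python
from datetime import date
from typing import Dict, List, Tuple

Grid = List[List[float]]


def shape(grid: Grid) -> Tuple[int, int]:
    """(rows, columns) of a rectangular grid; raises ValueError if rows differ in length."""
    widths = {len(row) for row in grid}
    if len(widths) > 1:
        raise ValueError("ragged grid")
    return len(grid), widths.pop() if widths else 0


def stack_into_cube(files: Dict[date, Grid]) -> Tuple[List[date], List[Grid]]:
    """Stack per-date 2D grids into one time-ordered cube indexed as cube[t][y][x].

    Returns the sorted time axis and the cube. Raises ValueError if the
    grids do not all have the same shape.
    """
    times = sorted(files)
    shapes = {shape(files[t]) for t in times}
    if len(shapes) > 1:
        raise ValueError(f"inconsistent grid shapes: {shapes}")
    return times, [files[t] for t in times]


# Three hypothetical single-date images, deliberately listed out of order.
files = {
    date(2015, 5, 3): [[3.0, 3.5], [3.1, 3.6]],
    date(2015, 5, 1): [[1.0, 1.5], [1.1, 1.6]],
    date(2015, 5, 2): [[2.0, 2.5], [2.1, 2.6]],
}
times, cube = stack_into_cube(files)
```

After stacking, the user handles one object with a time axis instead of three separate images.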
Data in the Earth sciences typically takes the form of sensor recordings, images, simulation outputs, and statistical measurements, often with an associated time dimension. These data items usually form regular or irregular grids of values with space/time coordinates. EarthServer makes these arrays available as data cubes.
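The difference between regular and irregular grids matters when a query is phrased in coordinates rather than in array indices. The sketch below is a minimal example. It assumes a regular axis described by an origin and a fixed cell size, and an irregular axis given as a sorted list of coordinates. Both axes and all names are hypothetical. On a regular axis, finding a cell takes a single division. An irregular axis needs a search, done here with the standard bisect module.

```python
import bisect
import math
from typing import Sequence


def regular_index(coord: float, origin: float, step: float) -> int:
    """Index of the cell containing `coord` on a regular axis (origin + i*step).

    No bounds check is done; a coordinate past the end yields an out-of-range index.
    """
    return math.floor((coord - origin) / step)


def irregular_index(coord: float, axis: Sequence[float]) -> int:
    """Index of the last axis coordinate <= `coord` on a sorted, irregular axis."""
    i = bisect.bisect_right(axis, coord) - 1
    if i < 0:
        raise ValueError("coordinate lies before the start of the axis")
    return i


# Hypothetical longitude axis: a regular 0.5-degree grid starting at 10.0 degrees east.
lon_index = regular_index(11.2, origin=10.0, step=0.5)

# Hypothetical observation times (days since start) that are not evenly spaced.
obs_days = [0, 1, 3, 7, 15]
time_index = irregular_index(4, obs_days)
```

Here lon_index is 2 (the cell covering 11.0 to 11.5) and time_index is 2 (the observation on day 3).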
Besides ease of use, data cubes also make it possible to integrate data from different disciplines, so scientists can combine measurement data with data generated by simulations.
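Once measurements and simulation outputs sit on the same grid, combining them becomes a cell-by-cell operation. The sketch below is a minimal example. It assumes two 2D grids that are already aligned, with missing measurements marked as None. The data and names are hypothetical. It computes a model-minus-observation difference map and leaves a gap wherever a value is missing.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def difference_map(model: Grid, observed: Grid) -> Grid:
    """Cell-wise model minus observation; None wherever either value is missing."""
    if len(model) != len(observed) or any(
        len(m) != len(o) for m, o in zip(model, observed)
    ):
        raise ValueError("grids must be aligned to the same shape")
    return [
        [None if m is None or o is None else m - o for m, o in zip(mrow, orow)]
        for mrow, orow in zip(model, observed)
    ]


# Hypothetical simulated and measured temperatures on the same 2x2 grid.
simulated = [[15.0, 16.5], [14.0, 13.5]]
measured = [[14.5, None], [14.0, 12.0]]
bias = difference_map(simulated, measured)  # [[0.5, None], [0.0, 1.5]]
```

The hard part in practice is aligning the data onto a common grid in the first place. The sketch simply assumes that step has already been done.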
To handle big Earth data efficiently, EarthServer needed to extend existing technologies and standards. The SQL database query language, for example, was designed mainly for tables of alphanumeric data rather than for large multi-dimensional arrays.
To enable data cubes, the project built on rasdaman, a new kind of database management system specialized in multi-dimensional gridded data, also known as rasters or arrays. Rasdaman enables flexible, fast extraction of data from big Earth data arrays of any size.
“Essentially, we have married the SQL database language with image processing,” says Baumann. “This is now becoming part of the ISO SQL standard.”
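Rasdaman's own query language is not reproduced here. As a loose analogy for what “marrying SQL with image processing” means, the sketch below uses Python's built-in sqlite3 module. It registers a hypothetical avg_cells function that averages a window inside an image stored in a binary column, so an ordinary SQL SELECT can run an array operation on every row. The table, images, and function are illustrative assumptions. The sketch does not use rasdaman's syntax or implementation, nor that of the ISO SQL array extension.

```python
import sqlite3
from array import array
from typing import Iterable


def to_blob(values: Iterable[float]) -> bytes:
    """Pack numbers into a binary blob of doubles (native byte order)."""
    return array("d", values).tobytes()


def avg_cells(blob: bytes, width: int, x0: int, y0: int, x1: int, y1: int) -> float:
    """Mean of the window [x0, x1) x [y0, y1) of a row-major 2D array stored in `blob`."""
    grid = array("d")
    grid.frombytes(blob)
    cells = [grid[y * width + x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(cells) / len(cells)


conn = sqlite3.connect(":memory:")
# Register the (hypothetical) array function so SQL can call it with six arguments.
conn.create_function("avg_cells", 6, avg_cells)
conn.execute("CREATE TABLE scenes (name TEXT, width INTEGER, data BLOB)")
# Two hypothetical 4x4 images with synthetic pixel values.
conn.execute("INSERT INTO scenes VALUES (?, ?, ?)", ("scene_a", 4, to_blob(range(16))))
conn.execute("INSERT INTO scenes VALUES (?, ?, ?)",
             ("scene_b", 4, to_blob(100 + i for i in range(16))))

# Ordinary SQL, with an array operation evaluated on every row.
rows = conn.execute(
    "SELECT name, avg_cells(data, width, 0, 0, 2, 2) FROM scenes ORDER BY name"
).fetchall()
```

An array database goes much further, with native array types, array-aware optimization, and tiled storage. Even so, the query shape is similar: familiar SQL on the outside, with array operations evaluated inside.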
EarthServer’s researchers also developed a ‘semantic parallelization’ technology that splits a single database query into multiple sub-queries, which are sent to other database servers for processing.
This method enables EarthServer to distribute a single incoming query over more than 1,000 cloud nodes and to answer queries on hundreds of terabytes of data in less than a second.
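The article does not describe how EarthServer forms its sub-queries, so the sketch below shows only the general split-and-merge idea under simple assumptions. The query is a mean over a 1D array. It is split into contiguous index ranges, and each range is handled by a worker thread standing in for a cloud node. The partial (sum, count) results are then merged. All names are hypothetical. Because of Python's global interpreter lock, the threads illustrate the decomposition but do not give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_range(start: int, stop: int, parts: int) -> List[Tuple[int, int]]:
    """Split the half-open range [start, stop) into up to `parts` contiguous, non-empty chunks."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        if hi > lo:
            chunks.append((lo, hi))
        lo = hi
    return chunks


def partial_sum_count(data: Sequence[float], lo: int, hi: int) -> Tuple[float, int]:
    """Sub-query run by one simulated node: sum and count of its slice."""
    chunk = data[lo:hi]
    return sum(chunk), len(chunk)


def distributed_mean(data: Sequence[float], nodes: int) -> float:
    """Split 'mean of the array' into per-node sub-queries, then merge the partial results."""
    chunks = split_range(0, len(data), nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(lambda c: partial_sum_count(data, *c), chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


mean = distributed_mean(list(range(1, 101)), nodes=8)  # 50.5
```

The nodes return (sum, count) pairs rather than per-chunk means, because averaging unequal chunk means would give the wrong answer. Choosing partial results that merge correctly is the core design decision in this kind of query splitting.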
EarthServer-1, which ran for 36 months from September 2011 and received €4 million (about $4.4 million) in EU funding, involved a range of multinational partners. Building on the success of this first phase, EarthServer successfully applied to the European Commission for funding to support its next phase, EarthServer-2.
This phase kicked off in May 2015 and will concentrate on the ‘data cube’ paradigm, as well as on handling even larger data volumes. “The plan is to focus on the fusion of data from different domains and to be able to resolve a query on a petabyte within a second,” says Baumann. “That would mean that a user could view the data on screen and manipulate it interactively.” EarthServer-2 is now working on the next frontier, open-source 4D visualization.
This article is reproduced from the European Commission website (© European Union, 1995-2014). It has been edited to conform with the iSGTW style guide, and is also available in French, Spanish, German, Polish, and Italian.
--iSGTW is becoming the Science Node. Watch for our new branding and website this September.