Feature - Editing, analyzing, annotating, publishing: TextGrid takes the a, b, c to D-Grid
A multitude of avid users are waiting to be introduced to grid technologies: scholars are semantically encoding and annotating texts, text scientists are researching the genesis of literature, digitization projects all over the globe are venturing to digitally capture our cultural heritage...
Such projects can span years, often with scholars working in isolation on a single text, and together they have already accumulated terabytes of data.
Centuries of literature with just a few clicks
TextGrid is one of the early initiatives to make this data available to the wider community.
Part of the German national grid initiative D-Grid, TextGrid establishes a virtual research environment for scholarly texts, synthesizing methods and technologies from the humanities and the grid community.
Using grid technologies, TextGrid is working to create a corpus of raw data on a scale inconceivable thus far. In addition to encouraging re-use and preservation of data, this opens up new and dynamic approaches to analysis: scholars can query centuries of literature with just a few clicks, something virtually impossible without grid technology.
Tools, texts, research and resources
Researchers of language and literature have been working with scholarly texts for many decades. The assets and insights accrued over that time are a valuable piece of our cultural heritage, and the methods and traditions used to prepare and analyze these texts are extremely rich.
However, such scholars often work in isolation, building dedicated tools and tailoring them to a specific research question.
Using grid technology, these same scholars can re-use and collaboratively implement their services, thereby enhancing the quality and efficiency of their work.
TextGrid builds on this concept, creating grid-based utilities and a service-oriented architecture that offers its users an entirely new way of implementing and sharing their tools. As part of this, TextGrid has already created a semantic service registry and is working to define interoperability standards in cooperation with international partners.
What do you do with a million books?
Gregory Crane asks, “What do you do with a million books?” [1] A humanities grid for the integration of data and services is the basis for a host of creative answers. Morpho-syntactic text analysis, text mining and sub-symbolic means for text classification are just some of the techniques that can contribute to and benefit from such an environment.
The humanities are huge and diverse, with exciting applications that address a range of users. Alongside TextGrid are large-scale infrastructure initiatives such as DARIAH (Digital Research Infrastructure for the Arts and Humanities) and CLARIN (Common Language Resources and Technology Infrastructure), as well as grass-roots projects such as Interedition.
The e-humanities are rapidly warming to the potential of grid technology, and while the humanities were not among the earliest grid innovators, they can now both benefit from and contribute to the grid community. Their experience with metadata and interoperability, preservation and persistent identification, repositories, knowledge management and more may prove invaluable as grids continue to develop, moving from low-level resource management to higher-level services.
TextGrid aims to release a beta version during 2008 and to be in use in regular university classes by the end of the year.
- Andreas Aschenbrenner, Goettingen State and University Library
[1] Gregory Crane, “What Do You Do with a Million Books?” In: D-Lib Magazine, vol. 12, no. 3, March 2006.