
CLARIN: A project that speaks to you

Wee-Ta-Ra-Sha-Ro, Head Chief of the Wichita. Painted by George Catlin in 1834. Image courtesy

The creation story of the Wichita people tells of a creator, “Man-never-known-on-Earth,” who formed the world, land, water and the first man and woman: “Man-with-the-Power-to-Carry-Light” and “Bright-Shining-Woman.” This couple brought to the Earth light, corn-growing, deer-hunting, game-playing and prayer, before becoming the morning star and the moon.

While the story itself is preserved in print for posterity (e.g., in George Dorsey’s 1904 book The Mythology of the Wichita), fewer than 10 people today can tell it in the Wichita language, and nearly all of them are elders living on tribal lands in Oklahoma, USA.

It’s a pattern repeated around the world; many languages are endangered or dying. Preserving these languages is vital for groups seeking to revitalize and maintain their culture.

Linguists have been recording and documenting endangered languages for as long as there has been recording equipment, or about 120 years. What has been lacking — until now — is a central place to search and access these data stores, which are scattered around the world. To remedy this, the CLARIN project is studying and preparing to provide comprehensive language research and preservation tools.

CLARIN, or Common Language Resources and Technology Infrastructure, began preparing its infrastructure in 2008. At the end of 2010, it expects to move into the construction phase. Its goal is seamless access to language archives and applications; by providing that access, CLARIN hopes to become an invaluable tool for helping to document and understand our languages, and therefore understand ourselves.

The newest edition of UNESCO’s Atlas of the World’s Languages in Danger totes up 6,000 world languages — and counts 2,500 as endangered and 200 as lost. The interactive atlas ranks the 2,500 endangered languages by five levels of vitality: unsafe, definitely endangered, severely endangered, critically endangered and extinct. Image courtesy UNESCO

An advantage to all

Many sectors of society will benefit, say CLARIN’s creators.

For instance, an educator or government official reviewing educational policy could search stored archives of children’s recordings in her country. Using this information, she could then compare indicators of linguistic sophistication (breadth of vocabulary, for example) among children of the same age from different regions in her country, or perhaps compare the language skills of boys and girls within the same age group.

Similarly, a historian researching a given politician could determine the frequency with which he used a certain word or phrase in a given month, year or decade. This kind of data could illuminate the germination of a political idea or movement.

Or a dictionary writer could clarify and expand a word’s meaning based upon the syntax and phrases commonly associated with that entry.

And a teacher seeking to expand his students’ horizons could show them language systems radically different from their own. One example is Kuuk Thaayorre, spoken by Aboriginal people of Far North Queensland, Australia, a language that contains no words for left and right. Cardinal directions (north, south, east and west) do the job instead. Consequently, its speakers have a heightened spatial awareness, states linguistic researcher Lera Boroditsky of Stanford University, in an article on the website Edge:

“ . . . you have to say things like ‘There's an ant on your southeast leg’ or ‘Move the cup to the north-northwest a little bit.’ One obvious consequence of speaking such a language is that you have to stay oriented at all times, or else you cannot speak properly. The normal greeting in Kuuk Thaayorre is ‘Where are you going?’ and the answer should be something like ‘South-southeast, in the middle distance. . . ’ ”

Most likely you and I, in the absence of a compass, wouldn’t be able to get past “Hello.”

Unusual challenges

Creating such a repository means overcoming a variety of challenges. “The needs of our users, as well as the needs of our sources, present some interesting problems,” says Martin Wynne, a member of CLARIN. For example, patient confidentiality must be preserved, and intellectual property rights respected. Consequently, sign-on to the CLARIN infrastructure will offer differing levels of access: data from medical patients or children will be restricted, and recorded songs might be offered only to academics, and not to commercial musicians.

More unusually, some data must be removed once the source dies.

The reason?

Upon the death of a Pitjantjatjara-speaking Aborigine in central Australia (near Uluru, or “Ayers Rock”), for example, anything associated with that person, such as photographs or recordings, becomes taboo for a prolonged mourning period lasting months or even years. Even the person’s name is not spoken; instead, the word “Kuminjay” is substituted, in what anthropologists term “avoidance language.”

As a result, “We’ll have an ethical obligation to (temporarily) cut access to recordings of that person,” says CLARIN’s Peter Wittenburg.

Like a jigsaw puzzle

Besides the ethical considerations, the team needs to make sure that sources drawn upon by the CLARIN catalogue are reliable and persistent. A PhD student using CLARIN as a source for his thesis needs to trust that cited resources remain in place.

Wynne, Wittenburg and Daan Broeder of CLARIN recently visited the CERN IT department to observe how the Worldwide LHC Computing Grid and Enabling Grids for E-sciencE had approached security, monitoring and the provision of highly available services.

“We are at the stage of designing the architecture,” says Broeder. “It is like a jigsaw puzzle: some pieces are already defined and in place. We are now looking for the missing pieces. To the extent we can, we’d like to find preformed puzzle pieces that would be a good fit, to save us from making and cutting our own.”

—Danielle Venton, EGEE

From UNESCO’s Atlas of the World’s Languages in Danger:

It is impossible to estimate the total number of languages that have disappeared over human history. Linguists have calculated the numbers of extinct languages for certain regions, such as, for instance, Europe and Asia Minor (75 languages) or the United States (115 languages lost in the last five centuries, of some 280 spoken at the time of Columbus). Some examples of recently extinct languages are:
•    Manx (Isle of Man) – 1974, with the death of Ned Maddrell
•    Aasax (Tanzania) – 1976
•    Ubykh (Turkey) – 1992, with the death of Tevfik Esenç
•    Eyak (United States, Alaska) – 2008, with the death of Marie Smith Jones
