Today (8 May 2013) sees the launch of a new online repository, created to allow researchers to share publications and supporting data more easily and thus facilitate open collaboration. Called Zenodo, the repository has been designed to help researchers at institutions of all sizes share results in a wide variety of formats across all fields of science.
The repository has been created through the European Commission’s OpenAIREplus project and is hosted at CERN, near Geneva, Switzerland. OpenAIREplus was launched at the end of 2011 as a companion project to OpenAIRE; both are funded through the European Commission’s 7th Framework Programme (FP7).
Open access pilot scheme takes off
The European Commission (EC) is currently running a pilot scheme, whereby research funded under FP7 in a number of specific fields is required to be made available in open access repositories. OpenAIRE aims to support the implementation of open access in Europe and link research publications back to information about the mechanisms through which the research was funded. “Now, in OpenAIREplus we’ve added data, too,” explains Tim Smith, group leader for collaboration and information services within the CERN IT department. “Not only do we want to know about publications that have come out of EC-funded research, but we now also want to know what datasets have come out of the work and where they are located.” He adds: “Data is currently much worse served than publications… there are very few places out there for researchers to put their datasets.”
The requirement to make publications freely available via open access journals and repositories is set to extend to all fields of research under the EC’s next funding programme, Horizon 2020, potentially making Zenodo a vital tool for researchers across the continent. “Of course, people also really want to share information with one another; it’s not only a top-down requirement from the European Commission,” says Lars Holm Nielsen, a software engineer based at CERN, who has been working to create the repository. “With Zenodo, we’re really trying to ensure that we cater to the needs of ‘the long tail’,” he adds. “Researchers at large institutions, such as those running around here at CERN using LHC data, are already taken care of, but researchers at smaller institutions don’t necessarily have a place to go to where they can deposit their research and their data.”
Much more than a library
With Zenodo, files of all types can be uploaded, and digital object identifiers (DOIs) are assigned to all publicly available uploads to make them citable. And while researchers are encouraged to share publications and associated data as openly as possible, flexible licensing options are available, so not everything uploaded has to be shared under a Creative Commons licence.
“However, the killer feature of Zenodo is definitely the community collections,” says Nielsen. “Anyone can create one of these collections and then they are able to either accept or reject whatever people try to put in there,” he explains. “Researchers can upload files to Zenodo and there’s minimal validation of what goes in there, but these community collections essentially allow everyone to create and curate the content and this solves the issue of us otherwise having to validate everything that’s uploaded.”
Additionally, through its links to the OpenAIRE and OpenAIREplus projects, Zenodo provides a simple way for researchers to report back to the EC on their work, and there are plans to extend this to further funding agencies in the future. Other upcoming features include automatic metadata extraction from uploads and support for a wide range of authentication methods, such as Google, Twitter, ORCID and OpenAIRE.
Florida Estrella is deputy director of the European Middleware Initiative (EMI). She is using Zenodo to preserve EMI’s software documents and keep them available in the future. “Zenodo provides us with an excellent opportunity to create and preserve a snapshot of what EMI has produced over three years and make it available to researchers,” she says. “Science has entered the age of open.”
Like all other data uploaded to Zenodo, these software documents will be stored at the CERN Data Centre. “Of course, the amount of disk space Zenodo will require is a drop in the ocean compared to the data produced by the experiments on the LHC,” says Nielsen, who points out that CERN currently stores more than 100 petabytes of data from the Large Hadron Collider, and produces roughly 25 petabytes per year. He adds: “The data storage system at CERN is also reliable and backed up every night; each file has several replicates, so even if a disk goes down, things are still safe and accessible for researchers.”
Smith agrees that hosting the data at CERN makes the repository a highly attractive proposition for researchers: “As well as providing backing and know-how, the fact that the data is in the CERN Computer Centre means that data is not going to be accidentally deleted overnight… and we also know how to run high-demand services over a long time.”
Pilot flying high
“Although Zenodo is being launched by a European project, it’s not locked in sync with the project,” says Smith. “The idea is that Zenodo is created, fed, and nourished during the project’s lifetime, and then, as part of the sustainability plan of the project, we have to work out how it’s going to last. We want to make it clear to people from the very beginning that this isn’t just a piece of software from a project that’s going to disappear when the project ends.”
“Zenodo is still a pilot, but it’s a pilot you can trust,” says Smith. “It’s being done by an organization you can trust and that’s interested in setting something up for the future.”