The Research Data Alliance (RDA) seeks to build the social and technical bridges that enable open sharing and reuse of data, so as to address cross-border and cross-disciplinary challenges faced by researchers. This September, the RDA will be hosting its Fourth Plenary Meeting in Amsterdam, The Netherlands. Ahead of the event, iSGTW speaks to Gary Berg-Cross, general secretary of the Spatial Ontology Community of Practice and a member of the US advisory committee for RDA. Berg-Cross is also co-chair of the RDA working group on data foundation and terminology. His fellow co-chairs are Peter Wittenburg of the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands, and Raphael Ritz of the Computing Center of the Max Planck Society in Garching, Germany.
How do working groups such as your own enable RDA to achieve its goals?
The working and interest groups are really the heart of the Research Data Alliance and their results will be key to its success. Interest groups are essentially about problem areas people are encountering and working groups are about finding solutions related to particular topics within an 18-month period. The work done by these groups reflects the overall goal of the RDA, which is to achieve open access to and reuse of research data without barriers.
By the time of the plenary meeting in Amsterdam, the first RDA working groups will have been around for roughly 18 months. As such, several of the initial working groups — including the one I co-chair — will be ready to present their products. So, it’s an exciting time and a next step for the RDA.
Can you tell iSGTW readers a little more about your working group?
Simply put, the goals of our working group on data foundation and terminology are to aid socio-technical communication within the RDA, between its various working and interest groups, and to stimulate harmonization of basic concepts in the data science domain. We want to facilitate conversation by providing specific definitions of key data management and infrastructure terms and concepts used across different disciplines. Simple misunderstandings about what terms mean can dramatically slow the progress of groups in sharing and reusing data, as well as hinder adoption of products. As such, we’re developing a common conceptual model made up of these defined terms that we can all coordinate around. This will support conversations between groups, provide sound practice for the RDA community, and establish a reference model for broad discussions about how to organize data.
What were the driving factors behind your decision to establish this working group?
It’s vital for sharing and reuse that we are able to properly define the concepts that are emerging in the areas of data science and cyberinfrastructure. There’s some existing work in this area that’s come out of the EUDAT data federation project: they and federation projects such as DataONE are confronted with the huge heterogeneity of data organizations and have started developing some useful initial conceptualizations to simplify data integration. These have fed into the work we’ve been doing through the RDA.
Why, in your view, is it actually important that researchers share data with one another?
Sharing and reuse of data is something that’s been talked about for a very long time, but modern scientific methods mean that there are some very exciting opportunities today. We’re now in the era of big data, which means that there’s lots of very collaborative, data-intensive ‘big science’ going on. This has been spoken about as a new paradigm for science, in which hypothesis-driven science is supplemented with discovery methods working on data. Consequently, the value of data today, especially across disciplines, is greater than ever before — this is true for science, engineering, education, and maybe even commerce, too.
Data sharing is also vital for tackling big issues like climate change that draw on research from a range of fields. Equally, it’s important that data is shared and data organizations are harmonized to help train and empower new generations of researchers, to ensure that data-driven research is reproducible, and to protect against scientific misconduct. Another good reason is that you don’t want data that is expensive to produce to be regenerated unnecessarily. Basically, there are just so many reasons why data sharing is important!
As an attendee of the RDA Fourth Plenary Meeting, what are you looking forward to most at the event?
The event is a tremendously important opportunity for finding out about the community’s progress and the status of the products being worked on. It’s a wonderful, diverse, and hardworking community that’s full of great people and ideas. And there are always fascinating keynote speakers at the events, too. As the RDA events move around the world, one gets a fresh, local perspective on the various issues the RDA is tackling.
Additionally, some of the people I work with are discussing a new interest group on data ‘fabrics’ at the event. This will be all about having an integrated vision of how to generate data in a structured and self-documenting way that is compliant with the basic data model we are working on. The goal is to look at how essential components, such as data repositories, registries, and automatic policies, can be specified in a more consistent way, so as to facilitate a wide variety of connection patterns. I think it’s going to be very exciting!
The RDA Fourth Plenary Meeting will take place from 22 to 24 September in Amsterdam, The Netherlands.