Research Report - Turning the microscope inwards: Studying scientific software ecosystems
Almost every workflow that generates scientific results today involves software: from the configuration and control of instruments, to statistical analysis, simulation, and visualization. Creating and maintaining software is therefore a significant activity in scientific laboratories, including science and engineering virtual organizations. Our research group at Carnegie Mellon University is examining scientific software as an ecosystem, seeking to understand the circumstances in which software is created and shared. The goal of this Open Science Grid project is to identify effective practices and provide input to science funding policy.

Toward this end, the OSG/CMU Scientific Software Ecosystem Workshop was held 16-17 February 2010 in Los Angeles at the facilities of the LIGO collaboration. The event was attended by 18 researchers from 12 organizations in the US and Europe, including many OSG VOs, and was funded through a National Science Foundation Office of Cyberinfrastructure VOSS grant.

The participants identified five areas of concern in the software ecosystem: software reuse; sustaining software beyond project funding cycles; managing the tension between creativity and standards; the role of software in scientific reproducibility; and concerns about the software policies of scientific funding agencies. The research report details these; below we highlight just a few.

Some locations in the software ecosystem are occupied by only a single software package (or no package at all), while others (such as job wrappers) are occupied by multiple partial re-implementations. Why is this so? The workshop participants pointed to the tension between simplicity and functionality, and to how it evolves over time. Initially, a new entrant compares their current requirements to existing packages, finds that those packages seem like overkill, and so begins a new, simple, tailored implementation in-house.
Over time, however, their requirements grow in complexity, especially as they move from prototyping into implementation and scaling. They therefore extend their in-house code, increasing its capabilities and complexity. Eventually the initially simple solution becomes as feature-rich and complex as the existing 'heavyweight' frameworks (although perhaps less well designed). At this point there is an incentive to justify the time spent, in part by releasing the new framework for others to use, adding yet another similar product to an already crowded field.

In other areas, needed software is not being written, or at least not being made available to others. The participants argued that scientific culture does not provide adequate incentives for researchers to make scientific contributions through software work. While scientific openness and critique are clearly desirable, scientist-developers perceive software release as a commitment to ongoing maintenance and support, and these tasks are sometimes difficult to justify within the current academic career landscape.

Reproducibility remains a key aim of scientific software work, yet the multiple layers of software involved in modern scientific work are complex, especially when multiple underlying computing platforms are involved. Some participants argued that drawing attention and funding to this issue requires communicating that it is not just a software issue but a matter of basic scientific method and experimental design, just as important as understanding other sources of variance, such as the potential impact of radiation.

The participants also discussed the need to move software from inside a project to a sustainable lifetime beyond the end of that project's funding. They highlighted a need to share stories of success and failure in this regard, so that others seeking the same outcome have resources and experienced developers to turn to.
The importance of this was underscored by the recent announcement of the UK Software Sustainability Institute, funded with GBP 4.2 million to pursue just this question; there is no equivalent in the US.

Next, our group at CMU plans to work backward from published scientific papers to map all the software used to produce each paper. We hope to describe how the software was produced or selected, and to better understand attitudes towards sharing different components. For more information, you can read the workshop discussion papers and results, available online.

—James Howison and James Herbsleb, Institute for Software Research at Carnegie Mellon University

ACKNOWLEDGEMENT: This material is based upon work supported by the National Science Foundation under Grant No. 0943168. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.