15 June 2023

Transparent and reproducible research with open source software

FOSS (Free and Open Source Software) is an integral part of science. Data analyses are written in Python, image data are analysed with Fiji and files are shared via Nextcloud installations on institute servers. Open knowledge requires open code, or so one would think. However, there are often big differences between the ideal and reality: especially in the natural sciences, code is written and used by scientists, but not always shared publicly under open licences.

At the Prototype Fund, we regularly fund projects related to scientific research. For example, the 13th funding round includes gget, a tool that facilitates combined access to genomic databases; Tauritron, a tool for energy system studies; BrightSkyPlus, a web application for weather data; and Boundary Agents, a visualisation of the history of BIPOC in early modern Europe. We are taking this diversity as an opportunity to look more closely at the relationship between FOSS and research.

In research in general and the natural sciences in particular, code is becoming increasingly important. New analytical methods are generating ever larger amounts of data that can only be meaningfully interpreted with software. Software is thus becoming a fundamental part of scientific work. Unlike a description of the experimental methods used, however, code does not usually have to be published as part of the results. The decision to publish usually lies solely with the researcher who wrote the code, and many decide against it.

“There is a lot of pressure to ‘make the software look nice’ to the utmost before a release, for fear that the code will be criticised or not used, but in my experience this is very rarely the case,” says Simon Danisch, developer of Makie, a data visualisation ecosystem for the Julia programming language, which was funded by the Prototype Fund in round 7. Although Simon Danisch does not work directly in academia himself, he is particularly interested in software development in this field. “Scientific applications are often more fascinating than in other fields, in my opinion, and the chance to work with intelligent people who are not driven by money is higher than outside academia, which I find very positive.”

Open source software is one way out of the reproducibility crisis

For science, too, publishing code under free licences is a benefit that goes beyond ethical ideals. “Publishing code enables other researchers to reproduce their results, which is essential for the scientific process,” Stefanie Lück says on the matter. As a Python developer, she has been working on free software for around 15 years and has developed, among other things, the software BluVision, which can be used to analyse the interaction of plants and pathogens.

The reproducibility of experiments is a persistent issue in the research landscape. In biology in particular, researchers repeatedly struggle to reproduce, and thus verify, published results. There are many reasons for this. Often, the complex biology of living systems makes it difficult to replicate experiments exactly. There are also areas, however, such as the analysis of genomic data, where results could easily be verified if all parts of the analysis were available. Transparent and reproducible research therefore requires that both the data sets and the code used to analyse them are freely published. This strengthens the entire process of scientific discovery.

Simply publishing code in a repository is not enough, however. Functional code needs to be preserved and maintained. “A big problem is that not all researchers publish code, and even if they do, maintenance and updates are usually done by individuals,” says Dr Elisabeth Kugler. She is a biologist and develops open source software for analysing microscopy data.

Software needs support

Maintaining software once it is written can be challenging, but publishing it is worthwhile nonetheless, says Stefanie Lück: “In terms of publishing open source scientific software, the thought of potential support requests should not necessarily be an obstacle. It is important to plan accordingly by estimating the support effort and allocating resources accordingly. However, the benefits of publishing scientific FOSS, such as increased transparency, reproducibility and collaboration, may outweigh the potential support effort required.” At the same time, she often sees funding as a problem in the scientific sector. “The development and publication of open source scientific software is not always adequately funded. In many cases, the development is driven by researchers who are passionate about their work and willing to dedicate their time and resources to the project.”

Dr Kugler also sees the problem of lack of funding: “Unfortunately, time and funds are very limited, which means that often only the ‘bare essentials’ can be done, such as producing code to answer a specific question – without documentation, data, help or support.”

There are, however, structures within the research landscape that specifically promote open source software. The Journal of Open Source Software enables scientists not only to publish their code under a free licence, but also to link it to a citable publication. And CERN in Geneva has built the Zenodo open access repository, a database in which open source software is tagged with a digital object identifier (DOI). This enables other scientists to reference code and correctly cite the work of others – and citations are of central importance in scientific careers.

Funding open source software through the usual academic grant schemes can be difficult, even though institutions such as the Federal Institute for Materials Research and Testing repeatedly emphasise its importance. There are other options, however: funding programmes such as the Prototype Fund can be a way for scientists to finance their development work. Often, the tools that researchers develop for their own work are also helpful and valuable to other people in society. Scientific code under free licences is public interest tech.

The Prototype Fund supports community-oriented software prototypes with up to €47,500 for 6 months. The funding programme also offers coaching on various topics and many opportunities to network with other software developers. The next application phase starts on 1 August 2023. You can find all the information at prototypefund.de/en/apply/.
