VBIO
News from the Biosciences

The Future of Bioscience Databases – First the Small Ones, and Then…?

Over the past 25 years, we have all grown accustomed to having almost unrestricted access to information. The term “googling” has been in the German Duden dictionary since 2004. More recently, the launch of generative artificial intelligence marked yet another information revolution. Almost overnight, platforms such as ChatGPT and Perplexity fundamentally changed the way we access knowledge – and “AI” was instantly adopted worldwide. After all, it’s free – except for the vast amount of personal data it consumes.

But what would happen if googling and the now ubiquitous AI tools suddenly came at a price? If browsing the internet were only possible for a fee? Such a scenario is currently unfolding in the realm of specialised scientific databases. Developments surrounding VEuPathDB, a database widely used in infection biology, serve as just one example.

Markus Engstler, recently elected President of the VBIO, comments:

The natural sciences rely on access to a wide array of specialised data collections. These data have, in most cases, been generated experimentally over many years, carefully curated, and meticulously archived in publicly accessible databases. This work was often done behind the scenes, with few researchers aware of who maintained the databases.

Examples of such indispensable resources include the major biological databases: collections of genetic and biomolecular data, built up since the dawn of molecular biology over 50 years ago and accessible via institutions such as the US National Center for Biotechnology Information (NCBI). The well-known Protein Data Bank holds all available structural data on proteins. It is only thanks to the datasets curated there over decades that ground-breaking AI systems like AlphaFold could be trained – a scientific achievement that was recently honoured with the Nobel Prize.

Another vital component of scientific infrastructure is literature databases such as PubMed or Medline. Since their early days, when they were distributed on physical storage media (and, indeed, for a fee), these repositories have undergone enormous technological progress. Today, they provide real-time access to an immense body of literature, including links to original datasets and metadata.

Who runs these repositories? Interestingly, the United States has, for decades, taken a leading role in developing and maintaining such databases – investing vast sums to do so. Many of the world’s most important scientific, particularly biological, databases are hosted by US institutions.

However, the willingness of the USA to continue this funding is not unlimited. While there hasn’t yet been a total collapse, the first significant cutbacks are already being felt.

A very recent example: the important database VEuPathDB, which had been funded by the US National Institutes of Health and is used by researchers working on eukaryotic pathogens such as protozoa and fungi, suddenly lost its financial support in 2023. David Roos, its founder and long-time curator – himself a highly decorated basic researcher – then appealed directly to scientists and funding bodies around the world.

The outcome: just a few weeks ago, a “voluntary payment model” was introduced to secure the short-term survival of the database. This translates into costs of up to several thousand euros per year for an average-sized laboratory. But where is this money supposed to come from? At best, institutional research budgets are stagnant.

The core issue: In order to ensure sustainable financing, funding agencies worldwide must rethink their strategies and better safeguard access to scientific databases. This is not yet happening at a sufficient level.

What’s also becoming increasingly clear is that essential databases – especially the seemingly small ones – must be organised in a more redundant and decentralised manner. 

It must not be possible for political influence to trigger a collapse in scientific research.

Recent months have demonstrated just how quickly political frameworks can shift – with potentially massive consequences for the support of scientific infrastructure. We must factor this in going forward.

Database operators have always collected data on user access, and these records can often be traced back to the requesting laboratory. Therefore, we must prevent these databases from falling under the control of totalitarian states simply because those regimes are willing to pay.

Today, it is above all European countries and the European Union that are called upon to step up their support. Only then can free access to all scientific data for all researchers be preserved.

Hence, the appeal must be: All scientists must become aware of how fragile open data access truly is.

Funding bodies must also be prepared to support the “small” databases and allocate resources to guarantee free access.

What is needed is a coordinated, international effort in which major players – such as the USA and the European Union – work together with other partners to ensure the long-term preservation of our ever-growing scientific data repositories.

The required funding is minimal by comparison, but it must be invested now: once a database is shut down, its restoration becomes extremely costly, if it can be restored at all.

And there’s another, perhaps even more crucial, reason: If a scientific field loses its specialised databases, its productivity is immediately and drastically reduced. This very quickly leads to the departure of young researchers seeking promising and productive disciplines. A field without functioning data infrastructure loses its appeal and will, in the medium term, disappear.

This scenario is no longer hypothetical:

If molecular eukaryotic infection biologists and parasitologists don’t pay for access to their data, VEuPathDB will be shut down in the summer of 2025.

And the temperature is rising…


Would you like to get involved?

Then write to: praesident@vbio.de