The idea of creating a shared online repository that would make all data from publicly funded research available for anyone to investigate and use sounds like a laudable and ambitious plan. But how exactly would a European open science cloud (EOSC) work in practice? On 28 and 29 November, data experts, policymakers and scientists gathered in Brussels, Belgium, to discuss the way forward. Horizon went along and here are nine things we learned.
- Open science is a fundamental change in the way science is done
So said the EOSC’s lead architect, Donatella Castelli of the National Research Council of Italy (CNR-ISTI). She was speaking at the opening session of the event organised by the EOSCpilot project, which has been set up to support the development of the EOSC. By using digital technologies and new collaborative tools to share data and services, science can become truly open, done not only by professionals but also by amateurs – the so-called citizen scientists. However, changing the way science is done also means changing the legal and policy context in which it operates. One key ingredient of open science is data, which is currently governed by national law, creating a lot of complexity for open science experimenters. The other key ingredient is technology, which means creating rules for the interchange of data and resources across borders and between researchers.
- There is a lot of data out there
We may think that we are swamped by data now, but there is a lot more data to come. While the storage on our phones and laptops is measured in megabytes and gigabytes, big data is petabyte-scale. A petabyte is 1024 terabytes, or more than a quadrillion bytes, and the ITER fusion reactor project in Europe will soon be generating 2 petabytes of scientific data per day in the drive to create a new, sustainable power source. The challenge for open data scientists is not only to store so much raw data but also to find a way of extracting the nuggets of valuable information. More data across multiple domains means an enormous task ahead for a project that aims to link it all together.
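To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (1024 terabytes per petabyte, 2 petabytes per day) and assumes, purely for illustration, that data is generated at that rate every day of the year.

```python
# Binary (1024-based) storage units, as used in the article.
TERABYTE = 1024 ** 4          # bytes in one terabyte
PETABYTE = 1024 * TERABYTE    # 1024 terabytes = 2**50 bytes

DAILY_PETABYTES = 2           # figure quoted above for ITER

bytes_per_day = DAILY_PETABYTES * PETABYTE
# Assumption for illustration only: the same output on all 365 days.
petabytes_per_year = DAILY_PETABYTES * 365

print(f"1 petabyte = {PETABYTE:,} bytes")
print(f"2 petabytes per day = {bytes_per_day:,} bytes per day")
print(f"One year at that rate = {petabytes_per_year} petabytes")
```

A single petabyte comes to 1,125,899,906,842,624 bytes, which is where the 'more than a quadrillion' figure comes from.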
- Science demonstrators are showing where the problems lie
The EOSC is currently in a pilot phase. Creating science demonstrators – mini open science clouds in specific disciplines such as Earth sciences and high-energy physics – and developing experiments brings all kinds of issues to the surface, which can then feed into the design phase. It’s 'a bit like a requirements study,' said the EOSCpilot’s leader and Horizon interviewee Dr Juan Bicarregui. The science demonstration projects hint at the potential of open science and highlight the key technical, organisational and policy challenges of building an international, community-wide cloud network to share resources. It's an example of the evolutionary nature of the development of the EOSC, which has been described as a 'learning by doing' exercise.
- The EOSC could be used for everything from cancer research to archaeology
- How to make data FAIR was the question on everybody’s lips
Sharing is FAIR-ing. FAIR data is findable, accessible, interoperable and re-usable, and the FAIR principle guides the handling of sensitive data and the development of metadata. Metadata is the construction glue that holds the data together and, to be FAIR, data must be tagged, managed, filed and connected in a consistent manner, now and into the future. The FAIR principle acts as a compass for getting to a working EOSC but it’s a huge challenge. To illustrate the scale of the problem, just one domain (seismology) conducting open science for better earthquake prediction creates a billion pieces of metadata per year – how is this to be managed and made FAIR?
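As a loose illustration of what consistent tagging might involve, the sketch below models a metadata record with one field for each letter of FAIR and checks which fields are left empty. The field names, the `MetadataRecord` class and the `missing_fair_fields` helper are all hypothetical and greatly simplified; they are not taken from the EOSC or from any metadata standard. The last lines turn the billion-records-a-year figure into a per-second rate.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record: field names are illustrative only,
# chosen to mirror the four letters of FAIR.
@dataclass
class MetadataRecord:
    identifier: str   # Findable: a persistent, unique identifier
    access_url: str   # Accessible: where the data can be retrieved
    data_format: str  # Interoperable: a standard, documented format
    licence: str      # Re-usable: clear terms of use


REQUIRED_FIELDS = ("identifier", "access_url", "data_format", "licence")


def missing_fair_fields(record: MetadataRecord) -> list[str]:
    """Return the names of the fields that are empty or blank."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name).strip()]


record = MetadataRecord(
    identifier="example-id-0001",             # placeholder value
    access_url="https://example.org/data/1",  # placeholder URL
    data_format="CSV",
    licence="",                               # deliberately left blank
)
print(missing_fair_fields(record))  # ['licence']

# Scale: a billion metadata records a year, expressed per second.
RECORDS_PER_YEAR = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
print(round(RECORDS_PER_YEAR / SECONDS_PER_YEAR, 1))  # 31.7
```

At roughly 32 new records every second from a single discipline, checks of this kind would need to be automated rather than done by hand.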
- The world’s biggest data repository will not own any data
It is a misunderstanding to think that all of the data goes into a big pot in the middle. It doesn’t. Ownership always stays at the local level, so the EOSC will not own any data. As well as joining up different data infrastructures into one big network and plugging any gaps, the EOSC’s role is to help researchers by providing services. These include large-scale computer processing power and a set of rules underpinning open data. The question many delegates had was how much procedural involvement the EOSC should have. Most people thought a light-touch approach was the best way to ensure scientific engagement and progress.
- New legislation is set to change the way data is handled
The EU’s General Data Protection Regulation (GDPR) is coming in 2018 and it will change the way data is handled in Europe. It will enable a smoother, unified data handling regime in the EU, while giving people back control of their personal data. The new regime governs data protection, data collection and data use, and any company, regardless of location, that wants to trade in the EU will be subject to the law. Companies need to be compliant with the new regulation when it takes effect in May 2018, after a two-year transition period, so with 11 chapters and 99 articles, the GDPR has many IT managers preoccupied at present. For the EOSC, it means dealing with one piece of legislation rather than 28, a big advantage for a pan-European project.
- Open science cloud researchers are humans too
If you’ve ever been locked out of your social media accounts or email, you know how frustrating that experience can be. Something as basic as a failed login attempt means many researchers are discouraged at the first hurdle and lack trust in the system. Human factors such as ease of use, added value and, critically, identity management that makes logging in as frictionless as possible are part of the goal of making the EOSC human-centric, as Silvana Muscella, chair of the High Level Expert Group that oversees the whole initiative, urged in her closing keynote address. Trust and familiarity are cornerstones of engagement and, while the EOSC is unlikely to be as easy to use as Facebook, the user experience should not be a barrier to adoption.
- There’s more money on the way
The EU would like to see the European Open Science Cloud become a reality by 2020. In total, around €272m of the Horizon 2020 budget for 2018-2020 will go towards open science. So far, 70 scientific institutions have endorsed the EOSC Declaration in support of that goal.