Computational Science and Data Science

Hans Fangohr

Towards the European Open Science Cloud

Brief report from 2-day meeting on European Open Science Cloud, and subjective summary of current status (from the first European Open Science Cloud Stakeholder forum, 28 and 29 November 2017 in Brussels).

Picture of (real) cloud

The European Open Science Cloud

The European Open Science Cloud (EOSC) is envisaged by the European Commission as an infrastructure that helps realise Open Science by making data and knowledge easily available.

The general idea is to federate existing and yet-to-appear services (data and computation hosting, authentication, indexing, collaborative tools, data and service catalogues, ...), to help researchers discover, navigate, use and re-use, and combine them. To start with, important stakeholders of EOSC are service providers such as EGI (academic cloud and grids in Europe), Eudat (data services), OpenAire (open publication services), and Geant (network services). The core role of EOSC is to coordinate, foster interoperability, develop glue services, and, generally speaking, steer the efforts toward the needs of researchers.

A core idea is the FAIR data principle: data needs to be

  • F: Findable
  • A: Accessible
  • I: Interoperable
  • R: Reusable

The interpretation of data is (or at least should be, in our reading) quite broad here: it includes metadata and provenance, data models, software, tools and, generally, the knowledge required to make sense of the data.

The EOSCpilot project claimed that "The European Open Science Cloud will change the way we do Science." - that's rather bold, but there certainly is strong potential.

The period up to 2017 has been used for exploration of the EOSC vision, and an EOSC work programme 2018-2020 has been formulated - with a number of calls - to start to turn the vision into reality.

The meeting

The 2-day meeting brought together over 300 people with a keen interest in the vision of the European Open Science Cloud (EOSC). The programme was a mixture of presentations from EOSC pilot projects, existing pan-European service providers and projects, and panel sessions where questions and contributions from the audience were invited.

Left: Logo of Square meeting centre in Brussels; Right: plenary of event

One specific service that we discussed is a JupyterHub & Binder service, hosted by EGI and currently being deployed, that will give all European researchers access to a wide array of software and data within the comfortable Jupyter notebook environment. Among other things, this service will lower the barrier for sharing live notebooks (e.g. the computation and data analysis logbooks of a scientific paper), bundled together with all the required data and software, hence fostering reproducibility.

The state of the EOSC

Our impression is that - while the vision is clear - the design and realisation of this vision have no clear shape yet: many ideas and components are floating around, some fully or half realised, but at the same time many questions remain unanswered or have not even been asked yet.

This state of affairs is not surprising: the vision is grand and has the potential to disrupt (positively) the way in which research is carried out today, and the actors are human beings, scientists, institutes, funding bodies and states, all with their own priorities and constraints.

Furthermore, there are real technical challenges in putting this "Cloud" together, and there are also cultural challenges in moving more and more research activities towards Open Science. Existing metrics for academics and research institutions do not generally incentivise open science, which makes a change of behaviour difficult.

Better Software for Better Research, Open Source and Open Science

The European Open Science Cloud should be very welcome to all who have been practising or arguing for open source, open (publication) access, open data, and generally open science to improve the quality and effectiveness of research (and thus the societal value of the investment).

Communities we work with, such as the OpenDreamKit project, SageMath, the UK's Software Sustainability Institute, Southampton's Computational Modelling group and the UK's Centre for Doctoral Training in Next Generation Computational Modelling, and the Jupyter ecosystem, have been pushing for such a vision through their work in the past decade.

Hans' new(-ish) place of work - the European X-Ray Free Electron Laser facility - has a data policy in place that makes all data sets recorded at the facility publicly accessible after an embargo period of three years.

All of these steps and contributions have helped to convince the European Commission to aim for Open Science, and to draft the European Open Science Cloud as an infrastructure to enable this. We encourage everybody involved to take pride in their contribution. Going further, we suggest that our communities should embrace the opportunities of this vision and get involved to influence and shape the EOSC.

Particularly at a time when, in some parts of the world, precious scientific data is being removed or hidden (rather than made more accessible), Open Science and the European Open Science Cloud initiative are a breath of fresh air.

Results from Neutron and Photon science EOSC pilot project

Next steps and challenges ahead

No doubt we are facing a period of exploration of EOSC component and policy designs, duplicate implementation of similar functionality, incompatible interfaces, and work that - in hindsight - will appear to have been avoidable or inefficient.

While at first sight a waste of resources, these are typical symptoms of the initial phase of large projects and significant innovations: transportation of goods and people by aircraft has included attempts to fly by attaching feathers to humans, using hot air balloons, and attaching wings to a bicycle as some of the stages of its development. Very little of this would be considered effective now: today, we have somewhat efficient and safe aeroplanes and helicopters, geographically spread airports and a sophisticated system of regulations and flow of money to sustain an international infrastructure for air traffic. This could not have been planned or predicted 100 years ago.

For the EOSC development - staying with the air traffic analogy - we are maybe not quite 100 years back, but clearly at an early stage of a project so complex that no single person or institution can fully comprehend and anticipate all issues and intricacies that may emerge. It's likely we will go through multiple iterations and refactorings of whatever first implementation of the EOSC emerges in the coming years.

We are looking forward to the opportunities of the EOSC and the work towards the realisation (of the first version)!

Hans Fangohr & Nicolas Thiery

See also an extended report at