We have often stressed the importance of open data in complying with the strategies proposed by the European Union for achieving the objectives set for 2020. Today we focus on the role of open data in European research and, more specifically, on the ICTS of the Doñana Biological Reserve, an ecosystem and biodiversity research infrastructure headquartered in Seville.
The importance of open data is beyond doubt: it is an essential tool for development and progress in general. In research in particular, open policies are driving great advances by producing greater synergies in less time.
Before getting into the subject, it is worth briefly reviewing the context in which these applications take place, and the infrastructures and initiatives that play an important role in it.
EGI – European Grid Infrastructure
EGI is the European Grid Infrastructure, named for the grid computing techniques it uses to carry out its work. Its main objective is to facilitate access to computational resources through a network of interconnected centres in several countries of the European Union. In this way, international scientific collaboration is facilitated and strengthened.
This federation brings together two types of members: organisations representing national e-infrastructures (NGIs) and European Intergovernmental Research Organisations (EIROs).
EGI offers a wide range of services to its partners, ranging from support and consulting to marketing, but its main function is to create single access points for all its researchers. This homogenizes the sources and prevents duplication.
This international platform works in the same way in the organizations corresponding to each country. In the case of Spain, ES-NGI is a collaborative environment for Spanish researchers to work together.
ESFRI – European Strategy Forum on Research Infrastructures
ESFRI stands for the European Strategy Forum on Research Infrastructures. It is a strategic instrument for developing Europe’s scientific integration and strengthening its international reach.
The purpose of this institution, in addition to supporting the scientific community, is to ensure that planning is framed within the strategic objectives set by the European Union, so that it responds to the needs of citizens.
Each year, ESFRI publishes a roadmap summarizing the results achieved and giving an overview of the status of projects. Its most recent roadmap (2018) lists a total of 18 projects underway, divided into five categories: energy, environment, health and food, physical sciences and engineering, and cultural and social innovation.
In the last year, ESFRI modified and refined its definitions, models and methods, so the current methodology comprises the following phases: concept development, design, preparation, implementation, operation and conclusion.
LifeWatch, the union of science and open data
Among the projects on this list, it is worth highlighting LifeWatch ERIC, the e-infrastructure for biodiversity and ecosystem research.
LifeWatch is a consortium of European infrastructures led by Spain (its central base is in Seville), in which Belgium, Slovenia, Greece, Italy, the Netherlands and Portugal participate, with Slovakia as an observer country.
The aim of this project is to put an end to the limitations affecting scientific research and to cover the need for more and more varied data. To achieve these purposes, tools such as Big Data analysis, semantic resources and also open and FAIR data are used.
FAIR is an acronym formed from “findable”, “accessible”, “interoperable” and “reusable”, which together spell the word “fair”.
Although open data and FAIR data are very similar concepts, they are not exactly the same, since FAIR data need not be open. The requirement is that they be accessible, which may mean accessible to a particular group or to anyone (in which case they would be open).
For example, data typically start out accessible only to the group of people working with them. They then pass through the hands of more people who help to refine the dataset and, finally, if that has been agreed, they are made accessible to everyone and become open data.
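The progression just described can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `Access` levels, the `Dataset` class and the dataset name are hypothetical and are not taken from LifeWatch or from any FAIR tooling. It models only the "accessible" dimension, showing that data can be accessible at every step while becoming open only at the last one.

```python
from dataclasses import dataclass
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from most to least restricted."""
    WORKING_GROUP = 1   # only the people working with the data
    COLLABORATORS = 2   # a wider circle that helps refine the dataset
    PUBLIC = 3          # anyone: at this point the data are open


@dataclass
class Dataset:
    name: str
    access: Access = Access.WORKING_GROUP

    def widen(self) -> None:
        """Move one step towards public access, stopping at PUBLIC."""
        if self.access < Access.PUBLIC:
            self.access = Access(self.access + 1)

    @property
    def is_open(self) -> bool:
        """Data count as open only when anyone can access them."""
        return self.access is Access.PUBLIC


# Walk a (hypothetical) dataset through the whole progression.
ds = Dataset("example-observations")
history = [ds.access.name]
while not ds.is_open:
    ds.widen()
    history.append(ds.access.name)

print(" -> ".join(history))
```

Using an ordered enumeration keeps the rule explicit: access can only widen one step at a time, and "open" is simply the last level rather than a separate property.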
Because LifeWatch operates in several countries of the Union with Spain as the coordinating centre, actions can be carried out in areas that are not confined to a single country, offering a more comprehensive view of the continent.
The importance of open data in research
As we have been pointing out throughout the article, open data are central to the development of projects such as LifeWatch, as they make it possible to share information with others and to create a true scientific community that gives back to its members. These are the advantages of open data:
- More opportunities for synergies, so that efforts are combined to achieve objectives in less time.
- Duplicated projects or lines of research are avoided, since researchers know immediately what their colleagues are working on, even if they are in another country.
- Less use of information that may be wrong or obsolete.
- Collaboration between researchers is encouraged and strengthened, regardless of the research center where they work.
- In short, resources are optimized to obtain results more efficiently than before.
Reaching this level of open data use in research required some preliminary steps, which we detail below:
- Development of international standards, since without them collaboration within the community would be impossible, as data would be processed in a heterogeneous rather than unified manner.
- Public investment to provide universities and research groups with the infrastructures and tools needed to work together and to harness the potential of open data properly.
- Fostering solidarity between the different groups of researchers, overcoming the fear of sharing the results of their own work.
Open Data Preservation Platform
Taking all these aspects into account, it was decided to create the ICTS-RBD Open Data Preservation Platform so that researchers could manage the complete life cycle of the data, which we now examine in more detail.
The life cycle of the data refers to all the phases through which they pass, from planning to consumption by third parties. Each stage needs to be understood in order to offer specific support for it:
- Data management planning.
- Acquisition of data, either through sensors or from external repositories.
- Data storage.
- Retrieval of data stored in heterogeneous sources.
- Publication of the data in open data portals, following the established standards.
- Data consumption.
- Data preservation.
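To make the ordering of these stages explicit, here is a minimal Python sketch. The `Stage` enumeration and the `next_stage` helper are names we introduce purely for illustration; they are not part of the platform's code. The order simply follows the list above.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Life-cycle stages, in the order listed in the article."""
    PLANNING = "Data management planning"
    ACQUISITION = "Data acquisition"
    STORAGE = "Data storage"
    RETRIEVAL = "Retrieval of stored data"
    PUBLICATION = "Publication in open data portals"
    CONSUMPTION = "Data consumption"
    PRESERVATION = "Data preservation"


# Iterating over an Enum yields its members in definition order.
STAGES = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    position = STAGES.index(stage)
    if position + 1 < len(STAGES):
        return STAGES[position + 1]
    return None


# Print the whole cycle, numbered from the first stage.
for number, stage in enumerate(STAGES, start=1):
    print(f"{number}. {stage.value}")
```

A platform that supports the full cycle needs something to handle each of these seven steps, which is what the modules described next are for.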
Originally, the Open Data Preservation Platform had six modules, to which two more had to be added for it to be complete: authentication and authorization. The final structure of the platform was therefore as follows: planning, acquisition, open data portal, consumption, preservation, storage, authentication and authorization.
These are the solutions that were used for each of the phases of the cycle:
- Planning: Solution based on DMPTool, extended to allow the use of ontologies (adding semantics to the data management plan, or DMP), integration of associated metadata, and so on.
- Acquisition: Python solutions for monitoring and remote control of sensors, calibration modules, connection to already defined data (held in external or remote repositories), and so on.
- Storage and retrieval: Solution built on OneData that allows information from heterogeneous sources to be obtained in a centralized and uniform way.
- Publication: Portal based on Invenio, allowing exploitation as open data and assigning a DOI (Digital Object Identifier) to each dataset.
- Consumption: Development environments for researchers based on Jupyter Notebook.
- Preservation: Open Source tools such as Bacula on physical media (disks, SAN/NAS, tapes, etc.).
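As a quick reference, the list above can be condensed into a simple lookup table. The sketch below is our own summary, not code from the platform: the `SOLUTIONS` dictionary and the `solution_for` function are hypothetical names, and the acquisition entry is labelled generically because the article names only the language used, not a specific product.

```python
# Phase -> technology, as named in the list above.
SOLUTIONS = {
    "planning": "DMPTool",
    "acquisition": "Python (custom sensor tooling)",  # no product named in the article
    "storage and retrieval": "OneData",
    "publication": "Invenio",
    "consumption": "Jupyter Notebook",
    "preservation": "Bacula",
}


def solution_for(phase: str) -> str:
    """Look up the technology behind a phase, ignoring case and outer spaces."""
    key = phase.strip().lower()
    if key not in SOLUTIONS:
        raise KeyError(f"unknown phase: {phase!r}")
    return SOLUTIONS[key]


for phase, tool in SOLUTIONS.items():
    print(f"{phase}: {tool}")
```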
What was initially going to be an open data portal ended up being an Open Science Platform, which served as the coordinating element and entry point to the remaining modules.
This project was completed in November 2015 and deployed at the Unique Scientific and Technological Facility of the Doñana Biological Reserve (ICTS-RBD), where it is available to the ESFRI-LifeWatch research network.
Within this project, led by Telefónica, Viafirma developed five modules of the platform, collaborating with Adevice, which was in charge of the data acquisition module, and with Aeonium, a technology-based start-up in charge of developing the Open Science Platform.