The EU rules for high-value datasets have changed – how are European countries keeping up?

Eline Lincklaen Arriëns
13 Apr 2023

The European Commission is striving to make the EU a data-driven global powerhouse. To achieve this ambition, it recognizes the huge importance of high-value datasets and is mandating their publication by all EU Member States under an open license.

In January this year, the European Commission (EC) published a list of high-value datasets that EU Member States must make available free of charge by June 2024. These datasets are a specific category of open data: data that can be accessed, used, and freely shared for reuse, subject at most to a requirement to attribute the original source. High-value datasets are publicly owned datasets that the EC has identified as offering major benefits for society, the environment, and the economy.

What type of data is considered high-value?

The EC has classified six categories of open data as high-value datasets:

  • Statistics
  • Earth observation and environment
  • Meteorological
  • Geospatial
  • Companies and company ownership
  • Mobility

These categories of publicly available data are considered particularly useful for creating value-added services and applications for our society and economy. For example, the EC states that datasets such as meteorological observations, radar data, and data on air quality, soil contamination, and noise levels can support research and digital innovation, as well as better-informed policymaking, especially when addressing climate change and its impacts.

What are high-value datasets being used for?

The Open Data Maturity (ODM) Report 2022 from data.europa.eu highlighted several use cases involving open Earth observation and environment, meteorological, geospatial, mobility, and statistical data. These include:

Environmental and economic impact

High-value datasets can help monitor forest fires across Europe. Data on forest fires comes from multiple sources, such as satellite imagery from the Copernicus program and national open data portals, and includes Earth observation and environment data as well as meteorological and geospatial data. As climate change continues to influence forest fires, it is increasingly important to monitor and assess the situation using every useful resource. For example, datasets for monitoring and assessing forest fires are used by the European Forest Fire Information System (EFFIS), a service that lets users view the current situation on a map, read a curated list of news stories about fires, view long-term fire weather forecasts, and access a detailed statistical portal on forest fires. Making these high-value datasets available gives organizations access to more data, which can support existing services and help create new tools for preventing forest fires proactively and supporting relief efforts.

The drive towards a greener and more sustainable (European) economy encompasses many aspects, including transportation. Modern and efficient transportation can significantly reduce individuals' carbon footprints, and mobility high-value datasets can help monitor and support the EU's transition towards greener mobility. One example of such mobility data is the progress of railway electrification in Member States. Eurostat highlights how these datasets provide information on transport infrastructure and show the extent to which passenger and freight lines have been converted to electric lines, which play a key role in the push towards greener mobility.

Social and political impact

High-value datasets can help measure and address income inequality across Europe. EU institutions acknowledge that income inequality indicators derived from statistical data are highly informative measurements that can help reduce income inequality. One example of a statistical dataset is the 'yearly inequality rate'. It provides insights into income inequality and its impacts on individuals, communities, and society, such as the concentration of earnings in a given population over time and across factors including gender, age, and region. Eurostat not only produces and shares these datasets but also assesses how income is distributed among individuals: it ranks them from lowest to highest earners and then divides the population into equal-sized groups (such as quintiles or deciles) to show how income is spread. Policymakers can use this information to design and enforce measures that reduce inequality and support minority groups and those in the lowest income groups.
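To make the ranking-and-segmenting step concrete, here is a minimal Python sketch that sorts incomes, splits them into five equal-sized groups (quintiles), and computes each group's share of total income, plus the ratio of the top fifth's share to the bottom fifth's (the income quintile share ratio, S80/S20, which Eurostat publishes). It is a simplification under stated assumptions: it uses plain, unweighted individual incomes, whereas Eurostat's official figures are based on equivalised disposable household income and survey weights. The function names and sample incomes are hypothetical.

```python
def quintile_income_shares(incomes):
    """Rank incomes from lowest to highest, split them into five equal-sized
    groups and return each group's share of total income (bottom group first).

    Simplified sketch: unweighted individual incomes, not Eurostat's
    equivalised disposable household income with survey weights.
    """
    ranked = sorted(incomes)
    n = len(ranked)
    total = sum(ranked)
    if n < 5 or total <= 0:
        raise ValueError("need at least five incomes and a positive total")
    shares = []
    for q in range(5):
        # Integer boundaries so every person falls into exactly one group.
        start, end = q * n // 5, (q + 1) * n // 5
        shares.append(sum(ranked[start:end]) / total)
    return shares


def s80_s20_ratio(incomes):
    """Income share of the top 20% divided by that of the bottom 20%."""
    shares = quintile_income_shares(incomes)
    if shares[0] == 0:
        raise ValueError("bottom quintile has zero income; ratio undefined")
    return shares[-1] / shares[0]


# Hypothetical sample incomes, for illustration only.
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print([round(s, 3) for s in quintile_income_shares(incomes)])
print(round(s80_s20_ratio(incomes), 2))  # 6.33
```

With ten illustrative incomes from 10 to 100, each quintile holds two people, and the top fifth receives about 6.33 times as much income as the bottom fifth.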

Geospatial data can support environmental and economic activities, for instance by contributing to smart cities, because it links information about properties to an exact point on Earth, as with satellite imagery and census datasets tied to a specific geographic location. Three of the six thematic categories of high-value datasets are relevant here: geospatial (e.g., administrative units, geographical names, addresses, buildings, cadastral parcels); mobility (e.g., transport networks, including their geographical positions and links with cross-border networks); and Earth observation and environment (e.g., space-based or remotely sensed datasets and ground-based or in-situ datasets). These datasets can support services such as EVapp, which locates victims of cardiac arrest in Belgium and identifies nearby first aiders, or Digital Forest Dryads, which helps protect forests against illegal deforestation in Romania and other EU countries.
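Services that match people to places rely on this kind of location data. As a hedged illustration only (not EVapp's actual implementation, which the article does not describe), the sketch below computes great-circle distances with the haversine formula and returns registered first aiders within a given radius of an incident, nearest first. The `Responder` type, the function names, and the coordinates are hypothetical, and a real service would likely use a spatial index rather than a linear scan.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


@dataclass
class Responder:
    """Hypothetical record for a registered first aider."""
    name: str
    lat: float  # degrees, WGS84
    lon: float  # degrees, WGS84


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def responders_within(lat, lon, responders, radius_km):
    """Return responders within radius_km of (lat, lon), nearest first.

    Linear scan for clarity; a production system would use a spatial index.
    """
    nearby = []
    for r in responders:
        d = haversine_km(lat, lon, r.lat, r.lon)
        if d <= radius_km:
            nearby.append((d, r))
    nearby.sort(key=lambda pair: pair[0])
    return [r for _, r in nearby]


# Illustrative coordinates: an incident in central Brussels, one responder
# a few hundred metres away and one in Antwerp, roughly 40 km north.
responders = [
    Responder("A", 50.8503, 4.3517),
    Responder("B", 51.2194, 4.4025),
]
print([r.name for r in responders_within(50.8467, 4.3525, responders, 1.0)])
```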

The EC deadline is set for June 2024

As stated, the Implementing Regulation governing the free availability of high-value datasets was officially published in January this year, and Member States have until June 2024 to make these datasets available for reuse free of charge, in machine-readable format and via application programming interfaces (APIs). The ODM Report 2022 from data.europa.eu highlighted the steps being taken ahead of the Implementing Regulation and reported a good level of preparedness. When the ODM 2022 survey results were gathered, 96% of the 27 EU Member States were already working on identifying high-value data domains to prioritize for publication, and 85% of the EU27 were already preparing to monitor and measure the level of reuse of high-value datasets.

With the Implementing Regulation now published and compliance required from June 2024, future Open Data Maturity assessments will track progress in applying the regulation from an organizational, technical, and legal perspective, and will also aim to assess the level of compliance.

What best practices are there for publishing high-value datasets?

The ODM Report 2022 revealed some of the preparatory steps countries have been taking. These steps offer a valuable guide for all countries seeking best practices for publishing high-value datasets, and include:

  • Preparing in advance: Several countries started work on high-value datasets before the Implementing Regulation was published: in the ODM 2022, 96% of the EU27 stated that they were already identifying high-value datasets, and 93% confirmed that they were preparing public bodies holding high-value datasets to flag those datasets in their metadata. For example, in Poland, the Chancellery of the Prime Minister started a consultation with all Polish ministries, subordinate units, and Statistics Poland on the draft Implementing Regulation. Similarly, in Austria, a Task Force on Public Sector Information and Open Data has been set up within the Federal Ministry for Digital and Economic Affairs to implement Directive (EU) 2019/1024 on open data and the re-use of public sector information and to determine high-value datasets. Starting internal preparation early makes it easier to put in place the expertise and resources needed to meet the requirements of the EC's Implementing Regulation.
  • Highlighting high-value datasets: Making high-value datasets more visible on national open data portals is a key practice. This is planned, for instance, on the Bulgarian open data portal, where high-value datasets will be assigned to a dedicated category that can also be selected through filters in the general datasets section. Similarly, in Finland, the national data portal team has designed an icon to highlight high-value datasets and help users distinguish them from other open data. Highlighting high-value datasets on portals helps keep track of the datasets identified so far and supports further collection of such datasets within the community of data providers and (re)users.
  • Monitoring and showcasing (re)use: The ODM Report 2022 encourages a standardized way of gathering and cataloging open data reuse cases, and 85% of the EU27 stated that they were preparing to monitor the reuse of high-value datasets. The Czech Republic, for example, links individual datasets, including those labelled as high-value, directly to the list of reuse examples on its national open data portal (which will also be available as open data). As a result, anyone reading the metadata of a dataset labelled as high-value will be able to see examples of its practical reuse. This helps communicate the (potential) impact of such datasets to a wider audience and stimulates further reuse and impact creation through open data.
  • Ensuring interoperability and metadata quality: In the ODM 2022, 63% of Member States responded that they were preparing to ensure the interoperability of high-value datasets with datasets available from other countries. In Germany, for example, a property has been implemented in version 2.0 of DCAT-AP.de to make it easier to reference high-value datasets. Sweden, meanwhile, has introduced an interoperability framework for frequently used high-value datasets. Data quality and interoperability are key to unlocking the full potential of data sharing, all the more so for datasets with a high impact on our society and economy.
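Several of the practices above (flagging high-value datasets in metadata, letting portal users filter by category, and linking datasets to reuse examples) can be sketched as a small data model. This is a hypothetical illustration, not the schema of any national portal: the `hvd_category` field, the category slugs, and the helper function are assumptions. In practice, catalogues based on DCAT-AP express such metadata in RDF rather than in Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical slugs for the six thematic categories named in the article.
HVD_CATEGORIES = {
    "statistics",
    "earth-observation-and-environment",
    "meteorological",
    "geospatial",
    "companies-and-company-ownership",
    "mobility",
}


@dataclass
class DatasetRecord:
    identifier: str
    title: str
    # Hypothetical metadata flag: None means the dataset is not high-value.
    hvd_category: Optional[str] = None
    # Links to showcased reuse examples, in the spirit of the Czech approach.
    reuse_examples: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject labels outside the six categories so the flag stays consistent.
        if self.hvd_category is not None and self.hvd_category not in HVD_CATEGORIES:
            raise ValueError(f"unknown high-value category: {self.hvd_category}")

    @property
    def is_high_value(self) -> bool:
        return self.hvd_category is not None


def filter_high_value(records, category=None):
    """Portal-style filter: every high-value dataset, or only one category."""
    return [
        r for r in records
        if r.is_high_value and (category is None or r.hvd_category == category)
    ]


catalogue = [
    DatasetRecord("d1", "Air quality measurements", "earth-observation-and-environment",
                  ["https://example.org/reuse/air-quality-app"]),
    DatasetRecord("d2", "Municipal budget"),
    DatasetRecord("d3", "Railway network", "mobility"),
]
for record in filter_high_value(catalogue):
    print(record.identifier, record.hvd_category, record.reuse_examples)
```

Validating the category at construction time is one way to keep labels consistent across publishers, which is the interoperability concern raised above; a shared controlled vocabulary would play that role in a real catalogue.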

Aligning with EC priorities and sharing best practices

The high-value datasets identified by the EC closely align with its overarching priorities for 2019-2024. For example, geospatial and Earth observation and environment data have clear links to supporting the EU Green Deal, while statistics and data on companies and company ownership can contribute to realizing an economy that works for people. The annual Open Data Maturity assessment is helping European countries push forward with these priorities.

Its purpose is to raise awareness of the state of open data practices in Europe and to help countries do more and do better. This is achieved first and foremost by sharing information across countries, as exemplified by the ODM Report 2022. Sharing both best practices and the challenges encountered will help drive more effective implementation of the Implementing Regulation on high-value datasets and create greater impact.

Authors

Eline Lincklaen Arriëns

Senior Consultant and Expert on European data ecosystems, Capgemini Invent NL
“Digital technologies are crucial in addressing global challenges, including climate change and environmental degradation. Capgemini aims to support clients in accelerating their digital transition in a manner that is sustainable to their organization, society, and the environment, and in line with EU priorities such as the EU Green Deal.”