EU Data spaces for vertical integration

Key takeaways:

  • EU data strategy aims to empower society to make better decisions.

  • Vertical integration via data spaces is planned.

  • Swarm can reduce fragmentation, lessen imbalances in market power, and provide the scalable infrastructure and means needed to empower individuals.

  • Swarm can enable connection across multiple data spaces.

  • Datasets can be built up from the individual level to higher levels, based on consent and while retaining trust.

The European data strategy addresses the quickly developing data economy (European Commission, 2020b). It acknowledges that, on the one hand, citizens will generate an ever-increasing amount of personal data and, on the other hand, non-personal industrial and public data will be generated in large volume by other sources. The European Commission bases its vision on European values and fundamental rights, and on the conviction that human beings should remain central.

The goal of the strategy is for society to reap the benefits of growth and innovation, while the EU becomes a role model "for a society empowered by data to make better decisions".

To acquire a leading role in the data economy, the strategy finds that the areas of connectivity, storage, computing power, and cybersecurity must be tackled, as well as governance structures for handling data. At the same time, the fundamental values that are the foundation of European societies must be respected. The EU plans to strike a balance between allowing wide use of data and limiting its flow, while still "preserving high privacy, security, safety and ethical standards".

The availability of data is recognised as crucial for training artificial intelligence systems and for many other services offered by startups and SMEs. Pools of quality data must be made available for use and reuse, and these should be able to support Big Data analytics and machine learning. The EU data strategy presumes that computing will move from centralised data centres closer to the user, so-called edge computing.

A common data space is to be created, facilitating the flow of data from around the world while satisfying European standards on data protection. As each domain has its own specifics, sectoral data spaces will be needed in strategic areas such as health, mobility and manufacturing.

Several problems stand in the way of the EU realising this strategy, and Swarm addresses them to varying degrees.

Fragmentation between states and availability of data. Swarm can be regarded as one common cloud without borders, in which siloed data does not exist. If individuals hold their data under their own control, the availability of data should increase substantially, as there is no one to stop users from sharing data if they wish to.

Imbalances in market power. These stem from concentrated cloud services and from imbalances in access to and use of data. The problem is probably more pronounced in the B2B and B2C sectors, as public services are bound to share data to some degree. The solution is to make the person the center of integration, giving them access to and control of the data and the power to share it with others. Swarm offers a very compelling option for this kind of personal data storage, as it needs minimal setup on the side of the users and is pay-as-you-need and "infinitely" scalable. A common structuring of data still has to be agreed for complete interoperability to be achieved, but access to the data can be done in the same, standard way for all involved.

Data governance. As Swarm is part of blockchain network(s), smart contracts and advanced governance mechanisms, for example via Decentralized Autonomous Organisations (DAOs), are more readily within reach than for traditional cloud solutions. Some research still has to be done in this area, but much of it is ongoing and the area is under active development.

Data infrastructure and technologies. Currently, EU-based cloud providers capture only a small share of the market, and the EU depends on external providers to a large degree. Swarm removes this dependence: if the nodes are properly spread between organisations, the network ensures that the availability of infrastructure cannot be limited by any one provider. Since cloud uptake is low in the European public sector, the option of a self-sovereign, low-cost cloud could lead to more adoption, and possibly to digital services that are more cost-efficient and that can scale to deploy AI technologies. Data portability between clouds would also be less of a concern. As Swarm is designed to run on commodity hardware, scalability is relatively easy to achieve.

Empowering individuals to exercise their rights. The European Commission mentions several problems in this space: actually exercising one's rights is not straightforward and is burdensome, and tools are needed for consent management, personal information management apps, personal data cooperatives or trusts, etc. Several initiatives tackle the problem of empowering individuals with tools, MyData among them. Although Swarm does not address individuals directly, it does enable apps to be run on top of it, removing the possibility of any further data lock-in. Fairdrop is one such app: it allows for personal data storage on Swarm and has support for consent receipt storage.
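To make the idea of a stored consent receipt more concrete, the following is a minimal sketch in Python. It is not Fairdrop's actual format or API: the `ConsentReceipt` fields and the helper functions are hypothetical names chosen for illustration, and the only assumption is that a receipt is a small structured record whose integrity can later be checked against a digest recorded when it was issued.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ConsentReceipt:
    """Hypothetical consent receipt; field names are illustrative only."""
    subject_id: str    # the individual granting consent
    recipient_id: str  # the party receiving access
    purpose: str       # what the data may be used for
    data_ref: str      # opaque reference to the stored data
    issued_at: str     # ISO 8601 timestamp supplied by the caller


def receipt_digest(receipt: ConsentReceipt) -> str:
    """Return a SHA-256 hex digest over a canonical JSON encoding."""
    canonical = json.dumps(asdict(receipt), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_receipt(receipt: ConsentReceipt, expected_digest: str) -> bool:
    """Check that a receipt still matches the digest recorded at issue time."""
    return receipt_digest(receipt) == expected_digest
```

Under these assumptions, the individual and the recipient could each keep the digest, so that either side can later show which purpose was agreed to without relying on a central registry.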

Cybersecurity. The issue of data security is still present, but with decentralised storage an attacker has no central point to attack in order to obtain data on many subjects at once. Each subject would have to be targeted individually, raising the cost and lowering the feasibility of such attacks.

There are many advantages to using Swarm as the underlying storage for EU data spaces. It is always on, always available, fault-tolerant and "infinitely" scalable. The EU lists several data spaces for different verticals, but also a personal data space. Swarm will be the place to connect personal data with data from other data spaces, including IoT and company data. Curated data made available on data spaces will become one of the building blocks of a thriving data economy.

Everyone can access different kinds of data in the same way, which facilitates interoperability. Datasets can be built up from the level of individuals to form larger “virtual” datasets for a specific purpose. For example, municipal, city, national and finally international datasets can be built up this way, crossing borders seamlessly when appropriate. Storage providers do not control the data, so owners retain sovereignty over it. Yet it is relatively easy to join the network and contribute resources in the form of nodes. Individuals could control their data directly, or they could entrust it to a “trust” acting on their behalf.
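The build-up of “virtual” datasets from individual records can be sketched as follows. This is an illustration only, not part of Swarm or FairOS: the `PersonalRecord` type, its fields, and the two functions are hypothetical, and the sketch assumes that each record carries the set of purposes its owner has consented to.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalRecord:
    """Hypothetical individual-level record with consent attached."""
    owner: str
    municipality: str
    country: str
    value: float
    consented_purposes: frozenset  # purposes the owner has agreed to


def build_virtual_dataset(records, purpose):
    """Keep only the records whose owner consented to the given purpose."""
    return [r for r in records if purpose in r.consented_purposes]


def roll_up(records, level):
    """Aggregate records to a higher level: 'municipality' or 'country'."""
    if level not in ("municipality", "country"):
        raise ValueError(f"unsupported level: {level}")
    groups = defaultdict(list)
    for r in records:
        groups[getattr(r, level)].append(r.value)
    # Only aggregates leave this function; individual values do not.
    return {
        key: {"count": len(vals), "mean": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }
```

Filtering by purpose first and aggregating second means a record without matching consent never enters the higher-level dataset, and the same filtered set can be rolled up to the municipal or national level as needed.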

The process could start with public data, where the costs of storage and of providing availability should be lower than when holding the data in private clouds. Concretely, the Open Data Directive makes more data from the public sector available. This data can be put into Swarm rather seamlessly, if it is available in a properly structured way, and can be shared with service providers. Once the peer-to-peer cloud has proven itself, the move to storing sensitive and personal data will be easier.

The FairOS layer will be leveraged to make this kind of storage widely available and easy to use. Besides storage, a compute layer will also be added, offering options to manage resources with the Swarm cloud underlying it. The paper provides a description of the FairOS-dfs layer.