Synthetic data: How to manage sensitive data in a GDPR-compliant manner

By Omar Ali Fdal

Since the GDPR became applicable four years ago, the way companies must handle personal data has changed drastically. Synthetic data holds great promise for navigating this paradigm shift.

There is still a long way to go when it comes to data protection, as the ever-increasing GDPR fines show. At the same time, data handling and the strategy behind it remain integral to any successful company. Whether you work in finance or insurance, healthcare or telco, it is hard to imagine business operations without big data. So how can businesses strike a balance between stricter privacy regulations and data innovation? Synthetic data could be the answer.

A look back on GDPR's first four years

Let's take a look at the latest updates in GDPR enforcement.

Fines for GDPR non-compliance are rising rapidly: from a single fine of €400,000 in July 2018 to 332 fines (totaling more than €130 million) in July 2020, and 1,030 fines (totaling more than €1.6 billion) in March 2022.

In the future, the EU and national bodies will regulate data-driven business models even more closely, through digital strategies and the forthcoming ePrivacy Regulation.

According to the GDPR Enforcement Tracker, Luxembourg is the country with the highest fines, followed by France, Ireland, and Italy. Non-compliance with general data processing principles resulted in the highest fines, followed by an insufficient legal basis for data processing, inadequate fulfillment of information obligations, and insufficient technical and organizational measures to ensure data security. The three sectors with the most fines are industry and commerce; media, telecommunications, and broadcasting; and transportation and energy.

The European Court of Justice declared the Privacy Shield invalid in the Schrems II case. This forced even Google to rethink how Google Analytics processes data, as shown by the ban issued by the French data protection authority CNIL and a similar decision by the Austrian data protection authority. It is still unclear how things will unfold under the new transatlantic data transfer framework that the EU and the US agreed on in principle in March 2022.

Companies operating in Europe have also been fined large sums, mostly in industry and commerce (233 fines totaling more than €796 million), followed by media, telecommunications, and broadcasting (177 fines totaling more than €613 million). It should come as no surprise, then, that the most heavily fined companies are large corporations such as Amazon Europe, WhatsApp Ireland, and Google LLC.

Data privacy legislation will continue to become more stringent. At the same time, every aspect of business relies increasingly on data-intensive technologies. Today, many companies' success depends on their ability to develop advanced AI and deep learning models, which are built on data.

Despite best efforts, data can be misused if it falls into the wrong hands. When massive amounts of personal data are stored, a cyberattack or data breach can quickly become fatal for a business.

How can synthetic data help?

The question is: how can an organization harness the value of its data without jeopardizing its customers' trust or risking severe GDPR penalties? One possible solution is synthetic data.

Synthetic data is artificially generated data that serves various purposes, including machine learning. It is generated to retain the statistical distribution of the original dataset and is of comparable quality. The result is a highly useful dataset that can replace the original data in behavioral, predictive, or transactional analysis.

Synthetic data can replace real data or complement it. It can also be used to train machine learning applications and improve AI projects.
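The idea that synthetic data retains the statistical distribution of the original can be illustrated with a deliberately simple Python sketch (standard library only). It assumes purely numeric columns, fits a single multivariate Gaussian to them, and samples brand-new rows from that model. This is a much cruder model than commercial generators use and is not the Statice method; all function names are hypothetical.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate the mean vector and sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    cols = [[row[j] for row in rows] for j in range(d)]
    means = [statistics.fmean(col) for col in cols]
    cov = [
        [
            sum((cols[i][k] - means[i]) * (cols[j][k] - means[j]) for k in range(n)) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return means, cov


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlation(cov, i, j):
    """Pearson correlation between columns i and j, read off a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def sample_synthetic(rows, n_samples, seed=0):
    """Hypothetical helper: draw new rows from a multivariate Gaussian fitted to `rows`.

    Only the fitted means and covariances are used, so output rows are sampled,
    not copied from the input. Assumes all columns are numeric.
    """
    rng = random.Random(seed)
    means, cov = fit_gaussian(rows)
    low = cholesky(cov)
    d = len(means)
    synthetic = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normals
        # x = mean + L z is distributed with the fitted mean and covariance
        synthetic.append([means[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up "real" data: age and a monthly income that rises with age
    real = []
    for _ in range(2000):
        age = rng.uniform(20, 65)
        real.append([age, 1000 + 40 * age + rng.gauss(0, 300)])
    synthetic = sample_synthetic(real, 2000)
    for name, data in (("real", real), ("synthetic", synthetic)):
        means, cov = fit_gaussian(data)
        print(f"{name}: mean age {means[0]:.1f}, mean income {means[1]:.0f}, "
              f"corr {correlation(cov, 0, 1):.2f}")
```

Run as a script, it prints the mean age, mean income, and age-income correlation for both datasets, which should be close. A single Gaussian only preserves means and covariances: it does not reproduce the uniform shape or the 20 to 65 bounds of the made-up age column, which is why production tools rely on richer generative models.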

Because synthetic records have no one-to-one relationship with the original records, synthetic data reduces the risk that a real person can be re-identified. Solutions such as the Statice software also add privacy mechanisms that reduce the risk of privacy attacks on synthetic data. With these mechanisms in place, enterprises can meet the GDPR requirements for data anonymization. By using anonymized synthetic data, they protect their customers' privacy and avoid the risk of violating the rules for processing personal data.
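To check that synthetic records really have no one-to-one counterpart in the original data, a common starting point is each synthetic row's distance to its closest real record. The Python sketch below is illustrative only and is not how the Statice software works: it assumes numeric columns, standardizes them with the real data's statistics, compares rows by brute-force Euclidean distance, and counts exact copies. The function names and the tolerance are hypothetical.

```python
import math
import statistics


def standardize(real, other):
    """Scale both datasets using the real data's per-column mean and standard deviation."""
    d = len(real[0])
    means = [statistics.fmean([r[j] for r in real]) for j in range(d)]
    # Fall back to 1.0 for constant columns to avoid division by zero
    sds = [statistics.stdev([r[j] for r in real]) or 1.0 for j in range(d)]

    def scale(rows):
        return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]

    return scale(real), scale(other)


def distances_to_closest_record(real, synthetic):
    """Euclidean distance from each synthetic row to its nearest real row (brute force, O(n*m))."""
    real_z, syn_z = standardize(real, synthetic)
    return [min(math.dist(s, r) for r in real_z) for s in syn_z]


def copy_report(real, synthetic, tol=1e-9):
    """Summarize how close synthetic rows come to real ones; any exact copy is a red flag."""
    dcr = distances_to_closest_record(real, synthetic)
    return {
        "exact_copies": sum(d <= tol for d in dcr),
        "min_dcr": min(dcr),
        "median_dcr": statistics.median(dcr),
    }


if __name__ == "__main__":
    real = [[30, 2100.0], [45, 2950.0], [52, 3010.0], [61, 3500.0]]
    synthetic = [[45, 2950.0], [38, 2500.0], [57, 3300.0]]  # the first row leaks a real record
    print(copy_report(real, synthetic))
```

In this made-up example the first synthetic row duplicates a real record, so the report shows one exact copy and a minimum distance of zero; a release process would treat that as a failure.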

Synthetic data implementation success factors

Although the technology may seem complicated, it can be integrated quickly into existing data pipelines. An on-premises deployment is the obvious choice, since it can run on local systems or in the corporate cloud.

With this approach, the synthetic data generation models are trained where the actual data resides, so sensitive data never has to be moved. Data protection officers will appreciate this approach because it keeps data safe while making the solution itself simple to understand.

However, the success of synthetic data integration requires teams to plan and take into account a variety of factors:

  • Involve all stakeholders from the beginning. Business, data and analytics, IT, and legal teams must all have a voice in the project and understand each other's concerns and requirements.
  • Engage data protection stakeholders. For your data to be safe and to comply with the GDPR requirements for data anonymization, you must test the robustness of the synthetic data and document that the re-identification risk stays below an acceptable threshold.
  • Identify all relevant use cases that relate to the big picture, but start small and decide how you will measure the success of your synthetic data integration.
  • Synthetic data is an emerging topic, and adoption is still in its infancy. When you choose a synthetic data vendor, invest time in training and building your team's capabilities so that it understands what this technology is, its limits, and its true potential.
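Testing robustness and documenting re-identification risk can take many forms. One simple test compares synthetic data against two equally sized random splits of the real data: a training set used to fit the generator and a holdout set it never saw. If synthetic rows are systematically closer to training rows, the generator is likely reproducing real individuals. The Python sketch below assumes numeric, already-scaled rows; the 0.6 threshold and all names are hypothetical choices, not a limit defined by the GDPR.

```python
import math
import random


def nearest_distance(point, rows):
    """Euclidean distance from `point` to the closest row in `rows` (brute force)."""
    return min(math.dist(point, r) for r in rows)


def closer_to_training_share(train, holdout, synthetic):
    """Fraction of synthetic rows whose nearest real neighbour is in `train` rather than `holdout`.

    Assumes `train` (used to fit the generator) and `holdout` (never shown to it) are equally
    sized random splits of the same data. A generator that does not memorize training rows
    should land near 0.5; values close to 1.0 indicate leakage.
    """
    closer = sum(nearest_distance(s, train) < nearest_distance(s, holdout) for s in synthetic)
    return closer / len(synthetic)


def reidentification_check(train, holdout, synthetic, threshold=0.6):
    """Return a small, loggable record of the test; the 0.6 threshold is a hypothetical choice."""
    share = closer_to_training_share(train, holdout, synthetic)
    return {"closer_to_training_share": round(share, 3), "threshold": threshold,
            "passes": share <= threshold}


if __name__ == "__main__":
    rng = random.Random(7)
    population = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
    train, holdout = population[:500], population[500:]
    # A well-behaved generator: fresh draws from the same distribution
    fresh = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
    # A leaky generator: training rows with a tiny bit of noise added
    leaky = [[x + rng.gauss(0, 1e-3), y + rng.gauss(0, 1e-3)] for x, y in train]
    print("fresh:", reidentification_check(train, holdout, fresh))
    print("leaky:", reidentification_check(train, holdout, leaky))
```

The fresh generator should land near 0.5 and pass, while the leaky one, which merely perturbs training rows, scores close to 1.0 and fails. The returned dictionary is small enough to log as part of the documentation that data protection stakeholders need.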
