Privacy technology, data regulations, and ethical AI: our interview with Behrang Raji

By Elise Devaux, May 11, 2021 - 5 minutes read

In this new entry of our data interview series, we talked to Behrang Raji, data protection and technology specialist, about privacy, data regulations, and ethical AI. Note that the opinions expressed in this article are those of the author and do not necessarily reflect the views of HmbBfDI.

Can you tell us about yourself, your role within the Hamburg State Commissioner for Data Protection and Freedom of Information, the role of this entity, and your work on data protection and technology?


I am an officer at the Hamburg Data Protection and Freedom of Information Commissioner (HmbBfDI). Within this data protection supervisory authority, I monitor the companies that fall under my department for compliance with the GDPR and enforce it with the means at my disposal.

Independent supervisory authorities with effective enforcement resources also play a prominent role for society in the area of data protection law. Especially during the pandemic, it has become clear how important digitization is for social participation. At this interface, we strive on a daily basis to ensure that the digital transformation proceeds in a way that respects fundamental rights and data privacy.



We often get the question of what "anonymization" means. The GDPR makes a distinction between anonymized and pseudonymized data. But from a technical point of view, true anonymization is extremely hard to achieve. How can enterprises know if they are complying with the legal definitions?


Anonymization itself is not defined in the GDPR, nor is the term anonymous data. The GDPR follows a binary system: either we are talking about personal data or about anonymous data. The GDPR does not apply to anonymous data. Nevertheless, it does apply to the way out of its scope, that is, to the process of anonymizing personal data.

Regardless of the question of whether it is even possible to achieve true anonymity in times of AI and Big Data, it would be desirable if there were legally clearer specifications. For example, we also speak legally of anonymous data if re-identification would require a disproportionate effort in terms of time, costs, and manpower, and the risk of re-identification is thus negligible.

Given the technical possibilities, the question arises as to what can still be considered disproportionate and whose effort must be taken into account: that of the controller, or that of anybody? In this respect, companies are indeed faced with the great challenge of having to decide for themselves whether the threshold of anonymity has actually been reached when anonymizing data.




More and more public data authorities are exploring the uses of Privacy Enhancing Technologies (PETs). Can you tell us about how Germany has been evaluating synthetic data and other PETs? Which legal developments should be made around synthetic data, in your opinion?


In its report, the Data Ethics Commission set up by the German government highlighted some significant advantages and thus recognized the possibilities of such synthetic data for data subjects and controllers. Nevertheless, the German legislator has not yet made satisfactory use of the possibilities under Art. 89 GDPR to create privileges for research in this area.

Overall, practicable standards for anonymization must be established for data subjects, but also for controllers. In addition, certain freedoms for research in areas of PETs must be possible. This applies, for example, to systems for privacy-friendly training of AI systems, by means of so-called GANs, possibly taking into account differential privacy approaches, etc. This requires a regulatory framework that takes sufficient account of research and the fundamental rights of the data subjects. 
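As a rough illustration of what a "differential privacy approach" can look like, here is a minimal sketch of the classic Laplace mechanism applied to a simple count query. It is not taken from the interview and it is not how a GAN would be trained privately (that typically involves adding noise during training instead). The data, the function names `laplace_noise` and `private_count`, and the choice of `epsilon` are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values matching `predicate`."""
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy data: ages of eight people.
ages = [23, 35, 41, 29, 52, 67, 38, 45]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable

noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller `epsilon` means more noise and stronger protection for any single individual, at the cost of a less accurate answer.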




Where do you see future legal regulations impacting machine learning development? And which pitfalls must companies avoid when developing new applications today if they wish to remain compliant tomorrow?


Recently, the European Commission presented its draft regulation on artificial intelligence, the so-called "Artificial Intelligence Act", or AIA. Once adopted, it would be a regulation binding on all member states. It will be several years before a modified version of this draft actually comes into force.

However, from my point of view, some key points have been set. AI regulations will be general and sector-specific, and risk-based. Companies will face heavy fines of up to EUR 30 million or 6% of total annual global revenue for the previous fiscal year if they fail to comply with the obligations of the AIA. Regulatory developments in this area already need to be taken very seriously, as other sets of regulations in addition to the AIA will include provisions that normatively constrain smart technology products and services.
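To make the order of magnitude concrete, the fine ceiling mentioned above can be sketched as simple arithmetic. This assumes the two figures are read as "whichever is higher", which is an assumption about the draft and not something stated in the interview; the function name `max_aia_fine_eur` and the revenue figures are hypothetical.

```python
FIXED_CAP_EUR = 30_000_000   # fixed amount named in the interview
REVENUE_SHARE_PERCENT = 6    # share of annual global revenue named in the interview


def max_aia_fine_eur(annual_global_revenue_eur):
    """Upper bound of the fine, ASSUMING the higher of the two figures applies."""
    revenue_based = annual_global_revenue_eur * REVENUE_SHARE_PERCENT / 100
    return max(FIXED_CAP_EUR, revenue_based)


# Hypothetical companies with EUR 100 million and EUR 1 billion in revenue.
small = max_aia_fine_eur(100_000_000)
large = max_aia_fine_eur(1_000_000_000)
print(f"EUR {small:,.0f}")
print(f"EUR {large:,.0f}")
```

Under this reading, the revenue-based figure only becomes the relevant one once annual global revenue exceeds EUR 500 million.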

Furthermore, regardless of transparency requirements under data protection law, companies should already take the importance and implementation of transparency requirements into account when developing their systems.     


We are starting to hear about “ethical AI”, with individual privacy at the center of AI developments. What do you think of it? What component do AI developments need to include to be considered ethical?


Our increasingly digital lives must be accompanied by digital ethics that enable us humans to make reflective decisions about what we actually want out of what is technically possible. That said, the relationship between law and ethics is very complex.

However, taking into account that the two fields are not independent of each other, AI would be ethical if it is used for legitimate purposes and tries to take constitutional fundamental values into account by design. This includes, for example, a certain level of transparency regarding how it works, and criteria-driven training with a sufficient amount of meaningful data to avoid discrimination.

Discriminatory algorithms are a problem that is getting more and more attention in public discussions. Is it possible to implement fairness in machine learning systems and what role do you think synthetic data plays in this context?

Following on from the previous question, I think machine learning fairness by design is very difficult to implement.

First, the many different definitions of fairness complicate the measurement of algorithmic fairness. We humans function fundamentally differently from machines. For example, our General Equal Treatment Act requires that, as an employer in an application process, I may not base my decision on proscribed characteristics such as ethnicity or gender. Through this obligation, people condition themselves, as it were, because it at least reduces racist decisions in the application process. If one tried to implement this in a machine by not letting it process certain data in the first place, this blindness would not make the system non-discriminatory. This is due to proxies that are processed by the AI system, i.e., information that is representative of a protected characteristic because it correlates very strongly with it.
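The proxy effect described above can be reproduced with a toy example. The sketch below uses invented hiring records in which a hypothetical `district` attribute correlates strongly with a hypothetical protected `group`. The "blind" model never reads the group, yet its decisions still differ sharply between groups. All names and numbers are made up for illustration.

```python
from collections import defaultdict


def make_records():
    """Hypothetical historical hiring data as (group, district, hired) tuples."""
    spec = [
        # group, district, applicants, hired
        ("A", "north", 90, 63),
        ("A", "south", 10, 7),
        ("B", "north", 10, 2),
        ("B", "south", 90, 18),
    ]
    records = []
    for group, district, applicants, hired in spec:
        records += [(group, district, True)] * hired
        records += [(group, district, False)] * (applicants - hired)
    return records


def fit_blind_model(records):
    """Learn the past hire rate per district; the group column is never read."""
    totals, hires = defaultdict(int), defaultdict(int)
    for _group, district, hired in records:
        totals[district] += 1
        hires[district] += hired
    return {d: hires[d] / totals[d] for d in totals}


def predict(model, district, threshold=0.5):
    # Recommend a candidate if their district's past hire rate is high enough.
    return model[district] >= threshold


def selection_rate_by_group(records, model):
    """Share of each group that the blind model would recommend."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, district, _hired in records:
        totals[group] += 1
        selected[group] += predict(model, district)
    return {g: selected[g] / totals[g] for g in totals}


records = make_records()
model = fit_blind_model(records)
rates = selection_rate_by_group(records, model)
print("hire rate learned per district:", model)
print("selection rate per group:", rates)
```

In this invented data set, the district stands in for the group: the model recommends 90% of group A and 10% of group B although the protected attribute was withheld from it.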

Ultimately, transparency and concrete validation procedures will be approaches to identify and eliminate discrimination. Especially in the training process, a huge amount of data, some of it personal, is often required. In the training process, synthetic data, i.e., anonymous data, could be very helpful in making AI systems more privacy- and consumer-friendly.
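One possible shape for such a validation procedure is a per-group report of fairness metrics. The sketch below is an assumption about what a check could look like, not a procedure described in the interview. It computes two common measures, the selection rate and the true-positive rate, on invented labels and predictions, and shows that they can disagree, which is one reason the choice of fairness definition matters.

```python
def rate(flags):
    """Share of truthy entries, or NaN for an empty selection."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else float("nan")


def fairness_report(groups, y_true, y_pred):
    """Per-group selection rate and true-positive rate."""
    report = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        report[g] = {
            # Share of the group that receives a positive decision.
            "selection_rate": rate(y_pred[i] for i in idx),
            # Share of truly positive cases that receive a positive decision.
            "true_positive_rate": rate(y_pred[i] for i in idx if y_true[i]),
        }
    return report


def gap(report, metric):
    """Largest difference between any two groups for the given metric."""
    values = [entry[metric] for entry in report.values()]
    return max(values) - min(values)


# Hypothetical labels (1 = suitable) and model decisions (1 = accepted).
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

report = fairness_report(groups, y_true, y_pred)
print("selection rate gap:", gap(report, "selection_rate"))
print("true-positive rate gap:", gap(report, "true_positive_rate"))
```

On this toy data both groups are accepted at the same rate, yet suitable candidates in group B are accepted only half as often as those in group A, so the system passes one definition of fairness and fails another.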


Thank you, Behrang! You can learn more about HmbBfDI on their website or follow the German Federal Foundation for Data Protection on Twitter.

