In this new entry of our data interview series, we talked to Behrang Raji, a data protection and technology specialist, about privacy, data regulations, and ethical AI. Note that the opinions expressed in this article are those of the interviewee and do not necessarily reflect the views of the HmbBfDI.
I am an officer at the Hamburg Commissioner for Data Protection and Freedom of Information (HmbBfDI). Within the supervisory authority for data protection, I monitor the companies assigned to my department for compliance with the GDPR and enforce it with the means at my disposal.
From a societal perspective, independent supervisory authorities with effective enforcement powers play a prominent role in data protection law. The pandemic in particular has made clear how important digitization is for social participation. At this interface, we work daily to ensure that this digital transformation process unfolds in a way that respects fundamental rights and data privacy.
Anonymization itself is not defined in the GDPR, nor is the term anonymous data. The GDPR follows a binary system: either we are talking about personal data or anonymous data. The GDPR does not apply to anonymous data. Nevertheless, it does apply to the way out of its scope: anonymizing personal data is itself a processing operation that must comply with the GDPR.
Regardless of whether true anonymity is even achievable in times of AI and Big Data, clearer legal specifications would be desirable. For example, we also speak legally of anonymous data if re-identification would require a disproportionate effort in terms of time, cost, and manpower, making the risk of re-identification negligible.
Given today's technical possibilities, the question arises as to what can still be considered disproportionate, and whose effort must be taken into account: that of the controller, or that of any third party? In this respect, companies indeed face the great challenge of having to decide for themselves whether the threshold of anonymity has actually been reached when anonymizing data.
In its report, the Data Ethics Commission set up by the German government highlighted some significant advantages and thus recognized the potential of such synthetic data for data subjects and controllers. Nevertheless, the possibilities under Art. 89 GDPR to privilege research in this area have not been satisfactorily addressed by the German legislator.
Overall, practicable standards for anonymization must be established for data subjects, but also for controllers. In addition, certain freedoms for research in areas of PETs must be possible. This applies, for example, to the privacy-friendly training of AI systems by means of so-called generative adversarial networks (GANs), possibly combined with differential privacy approaches (a rough sketch of this idea follows below). This requires a regulatory framework that takes sufficient account of both research and the fundamental rights of data subjects.
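To make the combination of GANs and differential privacy mentioned above more concrete, here is a minimal, self-contained PyTorch sketch of the idea: a tiny GAN learns to produce synthetic tabular records, while the discriminator is updated with DP-SGD-style per-sample gradient clipping and Gaussian noise, so that no single training record dominates what the generator learns. Everything here (architecture, sizes, hyperparameters, the stand-in dataset) is an illustrative assumption, not a reference implementation; a real system would use a vetted library such as Opacus and track the privacy budget with an accountant.

```python
# Illustrative sketch only: a tiny GAN whose discriminator is trained with
# DP-SGD-style per-sample clipping and Gaussian noise. All names, sizes, and
# hyperparameters are assumptions; no privacy accounting is done here.
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM, LATENT = 8, 16                # width of a hypothetical tabular record, noise dim
CLIP, SIGMA, LR = 1.0, 1.0, 1e-3   # clipping norm, noise multiplier, learning rate

G = nn.Sequential(nn.Linear(LATENT, 32), nn.ReLU(), nn.Linear(32, DIM))
D = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=LR)
opt_d = torch.optim.Adam(D.parameters(), lr=LR)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(256, DIM)  # stand-in for a sensitive dataset

for step in range(200):
    batch = real_data[torch.randint(0, 256, (32,))]
    fake = G(torch.randn(32, LATENT)).detach()

    # Discriminator update: clip each sample's gradient, sum, add noise.
    # (Strictly, only the real samples need DP protection; clipping the fake
    # ones as well is simply conservative.)
    summed = [torch.zeros_like(p) for p in D.parameters()]
    for x, y in [(b, 1.0) for b in batch] + [(f, 0.0) for f in fake]:
        D.zero_grad()
        loss = bce(D(x.unsqueeze(0)), torch.tensor([[y]]))
        loss.backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in D.parameters()))
        scale = torch.clamp(CLIP / (norm + 1e-6), max=1.0)
        for s, p in zip(summed, D.parameters()):
            s.add_(p.grad * scale)
    for s, p in zip(summed, D.parameters()):
        noise = torch.randn_like(s) * SIGMA * CLIP  # calibrated Gaussian noise
        p.grad = (s + noise) / 64                   # average over 32 real + 32 fake
    opt_d.step()

    # Generator update: only ever sees the noisily trained discriminator.
    opt_g.zero_grad()
    g_loss = bce(D(G(torch.randn(32, LATENT))), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()

synthetic = G(torch.randn(5, LATENT))  # synthetic records after training
```

After training, the generator produces synthetic records from random noise; whether those records count as anonymous in the legal sense still depends on the privacy parameters chosen and on the kind of re-identification risk assessment discussed above.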
Recently, the European Commission presented its draft regulation on artificial intelligence, the so-called "Artificial Intelligence Act", or AIA. The draft is designed as a regulation that is binding on all member states. It will likely be several years before a modified version of this draft actually comes into force.
However, from my point of view, some key points have already been set. AI regulation will be both general and sector-specific, and risk-based. Companies will face heavy fines of up to EUR 30 million or 6% of total annual global turnover for the preceding fiscal year, whichever is higher, if they fail to comply with the obligations of the AIA; for a company with EUR 1 billion in annual turnover, for example, the 6% ceiling of EUR 60 million would apply. Regulatory developments in this area already need to be taken very seriously, as other sets of regulations in addition to the AIA will include provisions that normatively constrain smart technology products and services.
Furthermore, regardless of transparency requirements under data protection law, companies should already factor transparency requirements and their implementation into the development of their systems.
Our ever more intensive life in the digital sphere must be accompanied by a digital ethics that enables us humans to make reflective decisions about what we actually want, given what is technically possible. That said, the relationship between law and ethics is very complex.
However, taking into account that the two fields are not independent of each other, AI could be considered ethical if it is used for legitimate purposes and takes fundamental constitutional values into account by design. This includes, for example, a certain level of transparency about how it works, and criteria-driven training with a sufficient amount of meaningful data to avoid discrimination.
Following on from the previous question, I think machine learning fairness by design is very difficult to implement.
First, the many different definitions of fairness make measuring algorithmic fairness difficult. We humans function fundamentally differently than machines. For example, our General Equal Treatment Act requires that, as an employer in an application process, I may not base my decision on proscribed characteristics such as ethnicity or gender. Through this obligation, people condition themselves, as it were, which at least reduces racist decisions in the application process. If one wanted to implement this by machine, simply by not letting the system process certain data in the first place, this blindness would not make the system non-discriminatory. The reason is proxies processed by the AI system, i.e., information that stands in for a protected characteristic because it correlates very strongly with it, as the sketch below illustrates.
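To illustrate the proxy problem, here is a small NumPy sketch with entirely made-up toy data and feature names: the protected attribute is never given to the "model", yet a single correlated feature (a numeric stand-in for a postal code) is enough to reconstruct it almost perfectly.

```python
# Toy illustration of the proxy problem: removing a protected attribute from
# the inputs does not help if a correlated feature remains. All data and
# feature names are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical protected attribute (e.g., group membership), never shown to
# the model directly.
group = rng.integers(0, 2, n)

# A proxy feature that correlates strongly with the group, e.g., a numeric
# encoding of postal code in a residentially segregated city.
postal_code = group * 2.0 + rng.normal(0, 0.5, n)

# An innocuous feature, uncorrelated with the group.
experience_years = rng.normal(10, 3, n)

# How well does the proxy alone recover the protected attribute? A trivial
# threshold classifier stands in for any learned model.
guess = (postal_code > 1.0).astype(int)
print("group recovered from proxy:", (guess == group).mean())  # roughly 0.98
print("corr(postal_code, group):",
      np.corrcoef(postal_code, group)[0, 1])                   # roughly 0.9
print("corr(experience, group):",
      np.corrcoef(experience_years, group)[0, 1])              # roughly 0.0
```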
Ultimately, transparency and concrete validation procedures will be the approaches for identifying and eliminating discrimination; one simple example of such a check follows below. The training process in particular often requires a huge amount of data, some of it personal. Here, synthetic data, i.e., anonymous data, could be very helpful in making AI systems more privacy- and consumer-friendly.
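As one concrete example of such a validation procedure, the following sketch (again with hypothetical toy data) measures demographic parity, i.e., whether a model's positive-decision rate differs between groups. A large gap is a signal to investigate further, not yet proof of unlawful discrimination.

```python
# Toy fairness audit: compare positive-decision rates across groups
# (demographic parity). Scores, threshold, and data are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

group = rng.integers(0, 2, n)                  # protected attribute, used for auditing only
score = rng.normal(0.5 + 0.1 * group, 0.2, n)  # model scores, deliberately biased toward group 1
decision = score > 0.55                        # e.g., a hiring or credit decision

rate_a = decision[group == 0].mean()
rate_b = decision[group == 1].mean()
print(f"positive rate, group A: {rate_a:.2f}")                       # around 0.40
print(f"positive rate, group B: {rate_b:.2f}")                       # around 0.60
print(f"demographic parity difference: {abs(rate_a - rate_b):.2f}")  # around 0.20
```

Note the tension this makes visible: auditing for discrimination requires access to the protected attribute, even though the decision model itself must not use it.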
Thank you, Behrang! You can learn more about HmbBfDI on their website or follow the German Federal Foundation for Data Protection on Twitter.