Jeanette Knipp is a Junior Associate at idalab, a data science consulting firm with expertise in artificial intelligence, machine learning, and predictive analytics. We spoke with her about AI and data access in healthcare.
Sure! Having studied design with a keen interest in technology, I joined idalab in March 2019 as a Data Innovation Designer. The role is as multifaceted as it sounds: it combines UI and UX design with consulting in the data science sector. In practice, this means I work on client projects that use AI technology and include a user interface component, mainly in the life science and healthcare sectors.
The challenge lies in linking these abstract, hard-to-grasp algorithms with intuitive interfaces that bring clarity to the technology behind them. Solving such tasks requires a solid concept of the underlying ideas and interactions, so strategic thinking is crucial. I therefore soon joined the Strategy Track of idalab's Junior Associate Programme, which lets me deepen my knowledge of AI consulting and sharpen my strategic skills through dedicated training. That is how I arrived at my current role at idalab as a Junior Associate with a focus on strategy and conception.
I am currently working on INALO, a project funded by the BMBF (Bundesministerium für Bildung und Forschung) that focuses on optimizing alarms in hospital intensive care units. Together with AICURA medical, Charité and the Hasso Plattner Institute, we aim to use AI to prevent the desensitization of staff caused by excessive exposure to alarms (so-called “alarm fatigue”). Since up to 99% of all alarms are false and therefore do not actually require action, we use algorithms to create patient-centered alarms and thereby reduce the overall rate of false alarms.
Before INALO, I worked on a project with a global medical device OEM and healthcare provider to create a tool for medical decision support. Our goal was to improve dialysis therapy in the ICU through clear visualization of data (e.g. ongoing processes) and the use of machine learning algorithms to suggest therapy options.
As data provides the basis for AI, lack of access to data is a major issue. Current data privacy policies make it especially hard to access patients' medical data. This often slows down work, since usable data is hard to find and, in many cases, requires extensive preprocessing (such as anonymization).
Another hurdle is the strict regulatory approval required for software used as a medical device: although a growing amount of research and project work is being done in data science and AI, it is still very hard to bring the results to market. Obtaining these approvals takes a lot of time, which makes it costly and frequently leads to promising concepts being drafted but never used.
When working with data, it is important to establish a proper data landscape, one that covers not only the technical aspects but also enables governance. To achieve this, we developed a five-step process for the effective collection of useful data:
1. Identify all relevant data sources as well as the original data owners.
2. Have the data owners grant access for the given use case.
3. Identify and access the technical source that collects the real patient data.
4. Extract a sample of the collected data.
5. Run the sample through a quality check.
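In practice, it helps to track these steps explicitly per data source so that data is only used once every step is completed. A minimal Python sketch of such a checklist (the class and step names are illustrative, not part of an actual idalab tool):

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceIntake:
    """Tracks the five-step intake process for one data source."""
    source: str
    owner: str
    steps_done: set = field(default_factory=set)

    # The five steps, in order; no annotation, so this stays a class constant.
    STEPS = (
        "identify_source_and_owner",
        "owner_granted_access",
        "technical_source_accessed",
        "sample_extracted",
        "sample_passed_quality_check",
    )

    def complete(self, step: str) -> None:
        if step not in self.STEPS:
            raise ValueError(f"unknown step: {step}")
        self.steps_done.add(step)

    def ready_for_use(self) -> bool:
        # Data may only be used once all five steps are completed.
        return set(self.STEPS) <= self.steps_done
```

Keeping the state per source makes it easy to see at a glance which sources are blocked and at which step, e.g. waiting for an owner to grant access.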
Addressing all these points systematically enables us to get the data we need. Yet in some cases, one or more of them can be hard to fulfil (e.g. due to a lack of data access). In such cases it can be beneficial to use synthetic data to speed things up.
Synthetic data can be useful in a variety of cases. On INALO, for example, we use synthetic data to address data privacy concerns: especially in the medical context, it is hard to get hold of realistic patient data, since all real data has to stay within the hospital and be anonymized before anyone can work with it.
The process of data anonymization is not as easy as it sounds, since it must be ensured that no information in the anonymized data set can be traced back to the original person. We therefore use synthetic data that resembles real patient data in the aspects relevant for the project (e.g. vital parameters and their change over time, use of antibiotics, and overall treatment history). Because this synthetic data is generated digitally, it belongs to no real person, which makes it easy to use from a data privacy perspective. The advantage is that it allows us to filter alarms based on patient-specific aspects without working with actual sensitive patient data.
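To make the idea concrete, here is a minimal sketch of how a synthetic vital-parameter series could be generated: a bounded random walk around a baseline heart rate. The function name and all parameter values are illustrative assumptions, not the actual INALO generation method:

```python
import random


def synthetic_heart_rate(n_minutes, baseline=75.0, drift_sd=1.5, seed=None):
    """Generate a synthetic heart-rate series (beats per minute).

    Each minute the value drifts by a small Gaussian step, is pulled
    gently back toward the baseline, and is clamped to a
    physiologically plausible range.
    """
    rng = random.Random(seed)
    hr = baseline
    series = []
    for _ in range(n_minutes):
        hr += rng.gauss(0.0, drift_sd)   # random drift
        hr += 0.05 * (baseline - hr)     # mean reversion toward baseline
        hr = min(max(hr, 40.0), 180.0)   # clamp to plausible bounds
        series.append(round(hr, 1))
    return series
```

Because such a series corresponds to no real patient, it can be shared and processed freely while still exercising the same alarm logic that would later run on real data.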
First of all, the use case: on INALO, it was important for us to get fast access to data when no sufficient data set existed, so we synthetically generated one. Yet this might not be the best solution in every case.
For example, if your goal is to retrieve certain information from a data set without revealing the query, homomorphic encryption could be suitable. In contrast, when linking data sets it might be better to use harmonized pseudonymization, since you want to combine different inputs in one data set while keeping them private. These are only a few examples – so as you can see, choosing the “right” privacy-enhancing technology (PET) definitely depends on the use case.
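The linking case can be sketched in a few lines: if every party derives pseudonyms from patient identifiers with the same keyed hash, identical identifiers map to identical pseudonyms, so records line up across data sets without exposing the identifiers themselves. This is a simplified illustration using HMAC-SHA256, not a description of a specific harmonization scheme:

```python
import hashlib
import hmac


def pseudonymize(patient_id, key):
    """Derive a stable pseudonym from a patient identifier.

    Every party holding the shared key maps the same identifier to
    the same pseudonym, so records can be linked across data sets,
    while parties without the key cannot reverse the mapping.
    """
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"project-specific secret"  # in practice managed by a trusted party
a = pseudonymize("patient-0042", key)
b = pseudonymize("patient-0042", key)
assert a == b                            # same ID -> same pseudonym: records link
assert a != pseudonymize("patient-0043", key)
```

In a real deployment the key management itself is the hard part; whoever holds the key can re-identify patients, which is why it is typically kept with a trusted third party.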
Another aspect to consider is the legal requirements. When dealing with data, legal regulations have to be met, so for every use case a certain level of privacy has to be guaranteed. Since PETs differ in the level of security they offer, this needs to be factored in when deciding on a technology.
Depending on the application, another important point can be data utility. There is often a trade-off between the achieved level of privacy protection and the utility of the data after a PET has been applied. Anonymized data, for example, usually offers a high level of privacy protection but lower utility, since the original data is modified, replaced, or deleted. Pseudonymized data offers higher utility, since the original data is not lost but only reversibly modified, which in turn means a lower level of privacy.
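A tiny illustration of this trade-off, with generalization standing in for anonymization (the functions here are deliberately simplistic):

```python
def anonymize_age(age, band=10):
    """Generalize an exact age into a coarse band (irreversible)."""
    lo = (age // band) * band
    return f"{lo}-{lo + band - 1}"


# Anonymization: the exact value is gone, so some utility is lost.
# From "30-39" alone, the original age 37 can no longer be recovered.
assert anonymize_age(37) == "30-39"

# Pseudonymization: identifiers are replaced, but the mapping is kept,
# so the original data is fully recoverable by whoever holds the table.
mapping = {"patient-0042": "P-001"}
reverse = {v: k for k, v in mapping.items()}
assert reverse["P-001"] == "patient-0042"
```

The banded age still supports coarse analyses (e.g. alarm rates per age group) but not anything requiring exact values, which is precisely the utility cost the trade-off describes.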
And finally, different methods come with different levels of complexity. Encryption methods are very secure but also computationally expensive. When choosing the right PET, it is therefore necessary to consider the usability of a given technology and the effort required to apply it.
Something that won’t change soon is that PETs always have to be chosen with regard to the use case they are applied in; there won’t be one PET that fits all purposes in the near future. But as long as PETs have to be selected for each individual use case, standardizing data formats would be a huge benefit, as it would make it easier to work with the available PETs.
In terms of applications, I am particularly excited about bridging the gap between clinical practice and biopharma R&D. There is so much potential in using clinical and treatment data to develop new therapies (think of Real World Evidence-type studies) that is currently held back by the inability to share data. If we could find a reasonably general solution and bring it efficiently to the scale of our healthcare system, the sky is the limit.
Want to hear more from data experts? Check out the other interviews in our series:
Javier Franco-Contreras on data protection and watermarking technology
Sunaina Swaminathan on data access and acquisition