How healthcare enterprises can benefit from synthetic health data, including 3 practical use-cases

By Joanna Kamińska, December 14, 2021 (9-minute read)

If you work in the healthcare industry, especially with a focus on health data, you might find this article useful. You'll learn about the challenges this sector currently faces and how synthetic health data can add a layer of innovation to the whole industry.

This article answers the following questions:

  • What challenges do healthcare companies that lag in digital maturity face?
  • How can Machine Learning and Artificial Intelligence transform the healthcare sector?
  • How can a healthcare provider move past obstacles that hinder data analysis in this sector?
  • What is synthetic health data?
  • What are the practical applications of synthetic data in healthcare?

⬝⬝⬝


Working in one of the most regulated sectors isn't easy – many healthcare providers struggle to make their organizations more digital. Some companies have just introduced electronic health record (EHR) systems, while others still collect a significant portion of their patient data on paper. 


Market leaders, such as UNC Health and Cleveland Clinic in the US and AstraZeneca or Roche in Europe, are turning to big data analysis, and their use of artificial intelligence is growing. Meanwhile, healthcare providers that face digital challenges but don't commit to digital transformation are falling behind.


What are the challenges connected with low digital maturity?

  • Higher service costs due to the use of legacy systems, information silos, and lack of scalability,
  • Lower patient satisfaction due to lack of personalized treatments,
  • Reputational risks for healthcare providers due to a security breach, malicious attack, or data leakage.

But this is just the tip of the iceberg. Beneath it, you might find more issues, for example, difficulty moving towards a digital healthcare system or a diminished ability to increase interoperability.


The healthcare sector was also hit hard by the COVID-19 pandemic. Because of the global health crisis, the need for digital transformation and a more data-driven approach is more pressing than ever.



Big data plays a key role in understanding the characteristics of abnormal situations and obtaining knowledge that lets healthcare providers make the right decisions.

For example, with Machine Learning (ML) and Artificial Intelligence (AI), healthcare entities can:

  • Develop more personalized treatments for patients,
  • Conduct new drug research and speed up medical trials,
  • Fight existing diseases faster and identify new ones more easily,
  • Model the spread of the pandemic,
  • Predict the state of health on the societal level.

It all looks very promising. But although patient data is a valuable source of information and helps drive innovation, there are limits to how much organizations can benefit from it.


Privacy and security regulations, too many data sources and formats, and high data costs hinder operations on big data in healthcare. 


The next section covers these obstacles in more detail.

⬝⬝⬝

Challenges in health data collection and analysis

As you already know, big data has a huge impact on innovation in the healthcare sector. But unlike in retail or manufacturing, healthcare projects that depend on data analysis are harder to carry out.

Why? Because of the following challenges:


1. Privacy and security regulations

No data is more private than health records. That's why healthcare regulations around the world are strict and impose clear rules on health data collection, storage, and transfers.


Here are the most important laws that cover healthcare data collection and usage: 


  • In Europe: the General Data Protection Regulation (GDPR) sets the gold standard for handling personal data, including patient data. Every healthcare organization that deals with European health records should collect only the minimum amount of health information it needs and use it solely for the purpose it was collected for, such as medical treatment.

Collecting data for research and exploration purposes requires patient consent for secondary use. Because patients also have the right to be forgotten, healthcare entities have limited options for gathering data and analyzing it.


  • In Germany: the Digital Care Act (DVG) requires Digital Health Applications to comply with data protection and data security requirements.

Digital Health Applications can process personal information only if they obtain patient consent. In most cases, that consent doesn't extend beyond the primary purpose of processing. As a result, companies can't process any dataset that contains sensitive information for research and exploration purposes.


  • In the United States: the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule sets national standards for protecting patient health data. Additionally, the HIPAA Security Rule lays out safeguards for how healthcare providers handle protected health information (PHI), including transmission security, access control, data integrity, and data audits.

Data regulations are complex, and it's hard to fully understand them and avoid hefty fines in case of malpractice. This is especially true during a pandemic, when healthcare legislation evolves fast.


Giving up on healthcare data analysis might seem like a solution, but it doesn't help organizations evolve or introduce new, cost-effective procedures and treatments. Not using health data is an option, but it negatively impacts both patients and healthcare organizations.


2. Health data comes from multiple sources and in various formats


Data governance is still a challenge in healthcare. One of the reasons is that health data comes from multiple sources such as:

  • Hospital records,
  • Medical patient records,
  • Various examinations,
  • Wearable devices.

Because the variety of data sources is wide, the differences in formats and accuracy are burdensome. For example, data from an EHR system has a specific structure, while a multimedia file is an unstructured data type. Non-digital data can differ significantly as well.


The process of completing, formatting, and finally cleaning those health records can take a lot of time and effort. And in many cases, even with patient consent, data scientists can't be sure that such data will be of high enough quality to be useful for analysis.
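To make the completing, formatting, and cleaning step more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the date formats, and the sample rows are hypothetical, and real EHR or wearable exports will look different.

```python
from datetime import datetime

# Hypothetical raw rows; field names and formats are invented for illustration.
EHR_ROWS = [
    {"patient_id": "P001", "visit_date": "2021-03-14", "weight_kg": "72.5"},
    {"patient_id": "P002", "visit_date": "2021-03-15", "weight_kg": ""},  # missing value
]
WEARABLE_ROWS = [
    {"user": "P001", "ts": "14/03/2021 08:30", "weight_lb": 160.0},
]

LB_TO_KG = 0.45359237


def from_ehr(row):
    """Map an EHR-style row to the common schema; return None if incomplete."""
    if not row["weight_kg"]:
        return None
    return {
        "patient_id": row["patient_id"],
        "date": datetime.strptime(row["visit_date"], "%Y-%m-%d").date(),
        "weight_kg": float(row["weight_kg"]),
        "source": "ehr",
    }


def from_wearable(row):
    """Map a wearable-style row to the common schema, converting pounds to kilograms."""
    return {
        "patient_id": row["user"],
        "date": datetime.strptime(row["ts"], "%d/%m/%Y %H:%M").date(),
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 1),
        "source": "wearable",
    }


def harmonize(ehr_rows, wearable_rows):
    """Convert both sources to one schema and drop incomplete records."""
    records = [from_ehr(r) for r in ehr_rows] + [from_wearable(r) for r in wearable_rows]
    return [r for r in records if r is not None]


clean = harmonize(EHR_ROWS, WEARABLE_ROWS)
```

Even in this toy case, each source needs its own mapping, date format, and unit conversion, and one of the three rows has to be dropped. Multiply that by dozens of sources and the effort described above becomes clear.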


3. High data costs

As HIPAA Journal writes about the IBM Security report on the cost of data breaches:

"Healthcare data breaches are the costliest, with the average cost increasing by $2 million to $9.42 million per incident. Ransomware attacks cost an average of $4.62 million per incident." 

Additionally, data breaches increased year over year during the pandemic, because many employees switched to remote work without adequate security measures in place. Another problem with remote work is that organizations are slower to respond to security incidents.


Source: HIPAA Journal


Because healthcare companies have to safeguard their patient data, they put additional safety measures in place, which increases the cost of data maintenance. For example, healthcare organizations invest in on-premises hosting to keep the data secure. This means higher costs and more IT specialists who have to take care of the on-premises servers, their security, and their maintenance.


As healthcare organizations face those obstacles on the way to fruitful data analysis, more and more companies are on the lookout for an alternative solution. 


This is where synthetic data comes in, with great potential to impact the healthcare sector.

What is synthetic health data?

As the name hints, synthetic data doesn't come from real-world collections. It's the outcome of artificial data creation.


Synthetic data is produced by a model that learns the statistical properties of actual patient data, along with the relationships between attributes of the real dataset, and replicates them in new, artificial records.


A great advantage of synthetic data is that it doesn't replicate:

  • Personally identifiable information (PII): information that can be used to identify the individual it applies to, for example, a full name, date of birth, or social security number.
  • Protected health information (PHI): any health information or medical record that can be linked to a specific individual, as defined under HIPAA in the United States.

Because synthetic records don't correspond to real patients, there is a low chance that a data point leads to the re-identification of a real patient or their personal data record. This is a significant advantage over data pseudonymization methods, which carry more privacy-related risks.
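The idea of learning statistical properties from real records and then sampling new ones can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how production generators work: it assumes just two numeric attributes and a bivariate normal model, the toy records are invented, and it includes no privacy protection mechanism of its own.

```python
import math
import random
import statistics

# Toy "real" records, invented for illustration: (age, systolic blood pressure).
REAL = [(34, 118), (45, 125), (52, 131), (61, 140), (29, 112),
        (70, 148), (48, 127), (57, 135), (39, 121), (66, 144)]


def fit(records):
    """Learn means, standard deviations, and the correlation of two attributes."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in records) / (len(records) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample(model, n, seed=0):
    """Draw n synthetic records from a bivariate normal with the learned parameters."""
    rng = random.Random(seed)
    rho = model["rho"]
    # Guard against tiny negative values caused by floating-point error.
    residual = math.sqrt(max(0.0, 1 - rho ** 2))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        y = model["my"] + model["sy"] * (rho * z1 + residual * z2)
        out.append((round(x), round(y)))
    return out


model = fit(REAL)
synthetic = sample(model, 1000)
```

The synthetic records are drawn from the learned distribution rather than copied from the input, so the relationship between age and blood pressure carries over while no output row is tied to a specific input row. Real generators handle many more attributes, mixed data types, and explicit privacy safeguards.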




What does synthetic data mean for your organization?

With the proper privacy protection mechanisms, synthetic data is anonymous. As a result, it's not as strictly regulated as personal data, and you don't need secondary consent for further analysis. This means you can use it in different analyses such as medical research, clinical trial exploration, or any other medical investigation.


The quality of synthetic data depends on the quality of the input data and the level of privacy protection. With a high-quality original dataset, the synthetic output should be of similarly high quality.
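One simple way to check how close a synthetic dataset stays to the original is to compare summary statistics column by column. The Python sketch below is an assumption about what such a check could look like, with a hypothetical `fidelity_report` helper and invented sample values. Real evaluations typically also look at correlations, full distributions, and privacy metrics.

```python
import statistics


def fidelity_report(real, synthetic):
    """Return per-column absolute differences in mean and standard deviation."""
    report = {}
    for column in real:
        report[column] = {
            "mean_diff": abs(statistics.mean(real[column])
                             - statistics.mean(synthetic[column])),
            "stdev_diff": abs(statistics.stdev(real[column])
                              - statistics.stdev(synthetic[column])),
        }
    return report


# Invented values for illustration only.
real = {
    "age": [34, 45, 52, 61, 29, 70],
    "systolic_bp": [118, 125, 131, 140, 112, 148],
}
synthetic = {
    "age": [36, 44, 50, 63, 31, 68],
    "systolic_bp": [119, 124, 130, 141, 114, 146],
}

report = fidelity_report(real, synthetic)
```

Small differences suggest that the synthetic data preserves the basic shape of each column. Large ones point either to a poor-quality input or to privacy protection strong enough to distort the output.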

⬝⬝⬝

What are the benefits of synthetic data for the healthcare industry?

Each industry can benefit from synthetic data differently. Let's sum up the benefits of synthetic patient data in healthcare:


  1. Synthetic health data can closely reproduce the original distribution while leaving PII out. This makes it a relatively safe asset for data-driven organizations that need to protect privacy and confidentiality.
  2. This type of data doesn't require patient consent, so it's cost-efficient and easier to obtain.
  3. Synthetic patient data has fewer restrictions than personal patient data when it comes to secondary data processing.
  4. Synthetic data generated from high-quality real patient data can improve the adaptability of AI modeling and pattern identification.
  5. Healthcare organizations can train Machine Learning models with synthetic data to cover specific conditions that didn't occur in real datasets.

Now, let's dive into hands-on examples of how companies use synthetic health data.

⬝⬝⬝

3 practical examples of how to use privacy-preserving synthetic data in healthcare 

You can use synthetic data in healthcare in many different ways. The 3 practical examples below might give you food for thought about your own case or daily challenges.


Use of synthetic data in clinical trials and scientific research


The reality is, in most cases, patients don't want to share their most sensitive information for analysis and exploration purposes. Also, asking for secondary consent is time-consuming and demands additional explanation. 


Because synthetic data doesn't contain PII or PHI and doesn't require additional patient consent, it opens the door to new possibilities. This type of data is flexible to use, so it can drive innovation and let companies understand patients and their diseases in completely new ways.


Analyzing synthetic data can contribute to faster disease discovery and a more personalized approach to patient treatment. 


In the case of clinical trials, data science teams can use synthetic data as a foundation for studies where they can't operate on real data or such data is too scarce. Sometimes, there is not much data because the illness is rare or new. 

Use of synthetic clinical data to improve Machine Learning models 


Training makes Machine Learning models more trustworthy and reliable, but in many cases those models need large samples of high-quality data. Synthetic clinical health data can be of great value when training such models.


As a result, the algorithms can produce new outcomes and help:

  • Discover a new disease,
  • Explore a rare disease in depth,
  • Improve the process of drug or vaccine discovery.

Read about how well-known healthcare brands use synthetic data in their daily operations and learn how:

  • Roche is sharing synthetic clinical data for Machine Learning applications
  • M-Sense is anonymizing health user data for research on migraine



Use of synthetically generated data where there is no or little data


The struggle with the low accessibility of patient data is real for healthcare. In many cases, patient data samples are small or hard to use. 


For example, if only a few people are willing to participate in a clinical trial, it's hard to stay data-driven and innovative. When healthcare organizations face underrepresented patient groups, synthetic data can complete existing datasets and increase data accessibility.


With synthetic health data, healthcare providers can conduct big data analyses that might lead to new discoveries.
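As a rough illustration of completing a small dataset, the Python sketch below creates additional records for an underrepresented group by interpolating between pairs of existing records. Note the swap: this is interpolation-based oversampling (similar in spirit to SMOTE), used here as a simple stand-in for a trained synthetic data generator, and it offers no privacy guarantees by itself. The `augment` helper and the sample values are hypothetical.

```python
import random


def augment(minority, target_size, seed=0):
    """Create new records by interpolating between random pairs of existing ones."""
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        a, b = rng.sample(minority, 2)  # two distinct existing records
        t = rng.random()                # position between a and b
        synthetic.append(tuple(round(x + t * (y - x), 1) for x, y in zip(a, b)))
    return synthetic


# Invented records for a small patient group: (age, lab value).
minority = [(71, 5.9), (68, 6.4), (75, 6.1), (80, 6.8)]

new_records = augment(minority, target_size=20)
```

Each new record lies between two existing ones, so the augmented group stays within the range of the observed values while giving downstream analyses more samples to work with.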

Synthetic data is much more than just fake patient data.

With the innovation that comes from using synthetic health data, modern healthcare organizations can finally revolutionize medical therapies and deliver more cost-efficient, personalized medicine.

If you want to dig deeper into the topic, our team will be happy to help you start exploring privacy-preserving synthetic data today.
