Provinzial successfully conducts predictive analytics on synthetic insurance data

  • Over 80% synthetic data usability achieved while maintaining anonymity
  • 97% predictive analytics model performance effectiveness
  • 3 months saved in evaluating data privacy risks

Summary

The best use of your data can only be achieved with excellent data management. Customer data in the insurance industry is sensitive and cannot be freely shared between departments or external partners, slowing down data analysis efforts.    

By testing Statice's synthetic data solution, the data science team at Provinzial, the second-largest public insurance group in Germany, aimed to revamp the way they put their customer data to work.

Due to the challenges of sensitive data usage and the need to work with data faster in a competitive market, Provinzial sought out advanced data anonymization solutions. Provinzial used synthetic data for 'next best offer', a form of predictive analytics, to identify the needs of over a million customers. 

The Provinzial data science team:

  • Streamlined the data usage approval process with the data privacy team.
  • Achieved over 80% usability of synthetic data while maintaining data anonymity.
  • Trained a machine learning model on synthetic data that reached 97% of the performance of the same model trained on original data.
  • Reduced the time-to-data by 4 weeks without having to adjust the internal data sharing workflow.
  • Saved up to 3 months in evaluating data privacy risks.

Challenge

  • Data scientists are often unable to foresee all possible data applications at the outset of a project, and both internal and external privacy restrictions strictly limit data use.
  • A back-and-forth process of fleshing out all possible uses of the data and evaluating the potential risk of leakage may require several weeks or even months.
  • Provinzial’s data scientists used synthetic data to gain quick access to a large data pool to experiment with and to develop a use case for predictive analytics.

Solution

Utility

Anonymization methods like masking or k-anonymity can increase privacy, but at the expense of utility. Because Provinzial's customer data was highly detailed and extremely sensitive, they needed an anonymization solution which would not adversely impact the usefulness of this data.
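
To make the privacy/utility trade-off concrete, here is a minimal sketch of a k-anonymity check combined with age generalization. The records, the field names (`age`, `zip`), and the bucket width are made up for illustration; they are not Provinzial's data and not part of Statice's method.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(records, width=10):
    """Replace each exact age with a coarse range such as '30-39'."""
    out = []
    for r in records:
        low = (r["age"] // width) * width
        out.append({**r, "age": f"{low}-{low + width - 1}"})
    return out

# Hypothetical customer records (illustrative values only).
records = [
    {"age": 31, "zip": "48143"},
    {"age": 34, "zip": "48143"},
    {"age": 38, "zip": "48143"},
    {"age": 52, "zip": "40217"},
    {"age": 57, "zip": "40217"},
]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)
generalized = generalize_age(records)
generalized_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
```

In this toy example the raw records fail the 2-anonymity check and the generalized ones pass, but an analyst can no longer tell a 31-year-old from a 38-year-old. That lost detail is the utility cost the paragraph above refers to.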

Synthetic data turned out to be a great fit because it maintained the statistical value of the original data and therefore preserved its utility. The Utility Evaluator wraps multiple evaluators and provides a high-level view of the utility of the synthetic dataset without disclosing any of its statistical properties.
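
One generic way to condense utility into a single score is to compare the distribution of a column in the original and synthetic datasets. The sketch below uses one minus the total variation distance between the two marginals. This is an assumed stand-in for illustration, not the Statice Utility Evaluator, whose internals the case study does not describe; the column values are invented.

```python
from collections import Counter

def marginal(values):
    """Relative frequency of each distinct value in a column."""
    total = len(values)
    if total == 0:
        raise ValueError("column is empty")
    return {v: c / total for v, c in Counter(values).items()}

def marginal_similarity(original, synthetic):
    """1 minus the total variation distance between two column marginals.
    1.0 means identical distributions, 0.0 means no overlap at all."""
    p, q = marginal(original), marginal(synthetic)
    categories = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in categories)
    return 1.0 - tvd

# Hypothetical "product" column (illustrative values only).
original = ["car", "car", "home", "life", "home", "car", "life", "home"]
synthetic = ["car", "car", "home", "life", "car", "car", "life", "home"]

score = marginal_similarity(original, synthetic)
```

A score like this reports how close the two datasets are without exposing the underlying frequencies, which is the kind of high-level view described above. A real evaluation would also look at relationships between columns, not just single-column marginals.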

Privacy

Provinzial's data team was seeking a highly privacy-preserving solution that would meet GDPR requirements and the company's internal privacy regulations, so that it could obtain approval for the use of sensitive customer data.

Synthetic data ensured a high level of privacy. The process of generating synthetic data completely breaks one-to-one relationships between original and synthetic records, minimizing the chance of re-identification.

The Statice solution added further layers of privacy to the synthesization mechanisms, such as differential privacy.
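
For readers unfamiliar with differential privacy, the sketch below shows the textbook Laplace mechanism applied to a simple count. It is a generic illustration of the concept only: the case study does not describe how Statice applies differential privacy during synthesis, and the function names, the `epsilon` value, and the ages are assumptions made for this example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential draws, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records matching `predicate`.

    A count changes by at most 1 when one person is added or removed,
    so its sensitivity is 1 and the noise scale is 1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical ages (illustrative values only).
ages = [23, 35, 41, 52, 29, 64, 38]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives answers closer to the true count. The same principle, bounding how much any single person can influence the output, is what makes differential privacy attractive as an extra layer on top of synthesis.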

Internal workflow

For the Provinzial data science team, it was essential to reduce time-to-data without having to change the internal system. The solution also had to fit into the existing data workflow without disruption.

The team established a data architecture based on anonymized synthetic data and could perform specific tests without needing the original data, which accelerated time-to-data by 4 weeks.

Result

Provinzial took their existing “next best offer” model, trained it on synthetic data, and compared the result to the same model trained on real data. Next best offer is a form of personalized marketing based on predictive analytics: the model predicts consumers' needs and shows them offers and products based on their habits.

The Provinzial team performed a three-fold evaluation, focusing on data usability, model usage, and privacy regulations.

  • The synthetic dataset showed a high level of privacy, with no indication of re-identification concerns. Although there were many variables in the dataset, the large volume acted as an additional shield, minimizing the risk of re-identification. The Statice software saved about 3 months by automating these privacy evaluations.
  • By comparing the two datasets (original and synthetic), the Provinzial data team found that over 80% of the synthetic data was usable. Utilizing Statice utility evaluations, the team was able to quickly assess the usefulness of the synthetic data and adjust it as needed, saving about a month of manual work.
  • Their second evaluation phase focused on model usage: training on synthetic data versus training on real data. The model trained on synthetic data reached 97% of the performance of the model trained on original data.
  • Synthetic data has proven useful not just for the use case that they tested it for, but also for other applications, slightly different predictive analytics models, and use cases with minimal adaptations.
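
The model comparison described in the list above can be sketched as a simple ratio: score both models on the same real hold-out set, then divide the synthetic-trained score by the real-trained score. The case study does not say which metric Provinzial used, so accuracy is an assumption here, and the labels and predictions are made-up values chosen only to show the arithmetic.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_performance(score_synthetic, score_real):
    """Score of the synthetic-trained model as a fraction of the
    score of the real-trained model."""
    if score_real <= 0:
        raise ValueError("score_real must be positive")
    return score_synthetic / score_real

# Hypothetical hold-out labels and model predictions (illustrative only).
holdout_labels    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
preds_real_model  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # 8 of 10 correct
preds_synth_model = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # 7 of 10 correct

ratio = relative_performance(
    accuracy(holdout_labels, preds_synth_model),
    accuracy(holdout_labels, preds_real_model),
)
print(f"relative performance: {ratio:.1%}")
```

Evaluating both models on the same real hold-out data matters: it keeps the comparison about what each model learned, rather than about differences between the test sets.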

"Statice's solution helped us conduct predictive analytics and test our hypotheses while keeping customer data secure. We have found it to be a useful solution for our data science team to simplify data access and focus on our data projects, machine learning model optimizations, and testing new ideas." Dr. Sören Erdweg, Artificial Intelligence & Data Development at Provinzial

Synthetic data benefits

  • Synthetic data would have many benefits for team members and departments that work with data on a daily basis. One example is sharing user data with external companies for analysis: such projects would not be possible otherwise, and synthetic data would make sharing customer data with third parties much easier.
  • Furthermore, it gives the company the flexibility to use synthetic datasets across different internal projects. Such use is heavily regulated today, because each case requires evaluating who has access to the dataset and how it is used. With synthetic data, internal data projects can be enhanced in a way that was not possible before.
  • Data scientists and ML specialists would find synthetic data useful in daily operations, simplifying data access and operations, allowing them to focus on model optimization, be creative, and test new ideas and hypotheses.

Predictive analytics help insurers gain actionable insights into every aspect of their business, improve customer experience, increase sales, and look into the future. In order to deploy predictive analytics in a compliant and privacy-preserving manner, organizations will need to utilize data anonymization methods. For Provinzial, synthetic data proved to be the ideal solution. After all, data is only a strategic asset when you can put it to work.

Case study

Company:

Provinzial

Industry:

Insurance

Product:

Statice SDK

Goal: Using synthetic data for a predictive analytics 'next best offer' recommendation engine to identify the needs of over a million customers while preserving the highest standard of privacy.

Challenges
  • To boost growth and increase customer satisfaction, Provinzial wanted to make the most of its data operations.
  • To train ML models, gather insights, and develop predictive analytics use cases, Provinzial's data team needed access to a large data pool. However, access to sensitive customer data can take weeks or even months. 
  • Provinzial's data team responded to data regulation challenges by building a synthetic dataset for testing, evaluating, and quantifying ideas.
  • Statice's solution helped manage internal data privacy measures, since privacy-preserving synthetic data is not subject to GDPR.
