Preserving privacy with synthetic financial data

October 22, 2020

Why organizations should use synthetic financial data to preserve privacy

Strict data regulations and cumbersome data governance processes are causing innovation inertia in banks and financial institutions. Where data should drive product development and fuel analysis, we see slow and tedious processes preventing teams from accessing, sharing, and leveraging data. This post explores these challenges, as well as how to regain the ability to work with data safely and efficiently with synthetic financial data.

Data is critical in the digital transformation


Processing data is central for financial institutions on the path to digital transformation. In today's landscape, data fuels operational efficiency, and helps enterprises to build personalized customer experiences and develop competitive products.


Tomorrow's leading financial institutions are the ones continuously thinking about optimizing their data assets. 


The recent paradigm and technological shifts emphasize the organizational ability to aggregate, analyze, and use data. For example, the ability to access data to power machine learning applications gives financial companies an advantage in their ability to fight fraud, predict and prevent churn, and provide personalized experiences, to name just a few key use cases. Reliable data governance processes also allow businesses to migrate to cost-efficient infrastructures, such as public clouds. And these trends are growing.


"By 2022, public cloud services will be essential for 90% of data and analytics innovation," Gartner, Top Data & Analytics Trends in 2020


Data is companies' most valuable asset. But there are, unfortunately, many blockers on the road to data access for financial organizations.

The roadblocks on the path of data-driven transformation


Fast-evolving regulations and the associated compliance risks are among the blockers that companies are facing. In Europe, personal data protection laws reinforced the legislative frameworks that already regulated data processing in the financial sector.  Because of the financial risks of non-compliance, many enterprises adopt a cautious approach to data strategy.  


Overall, the finance sector has received more EU General Data Protection Regulation (GDPR) fines than any other industry. Recently, the Dutch Credit Registration Bureau (BKR) received an 830,000€ penalty from the Dutch Data Protection Authority (DPA) for non-compliance.


The costs of non-compliance include not only fines and settlements but also business disruption and lost productivity and revenue. The Italian Garante (Data Protection Authority) fined UniCredit bank 600,000€ for non-compliance that predated the GDPR. The resulting security-first approach has left its mark on the sector.


To this, we must add legacy systems with proprietary formats and siloed IT infrastructures. They turn data access into a prolonged and tedious process and keep data teams from getting to the data they need quickly.


Even where data is accessible, its quality might not suffice for cutting-edge use cases. Maintaining both the privacy and the usefulness of data is not an easy task. With redacted data, the remaining quality often no longer allows for some forms of analysis.


In the end, these roadblocks not only prevent companies from fully leveraging their data; they are also costly.

The costs of data inertia for financial enterprises


Not being able to derive insights from data represents a cost for companies. The lack of agility to innovate can cost companies a competitive advantage. 


"For companies that are arming their workers with data today, 32 percent see a 'significant increase' in product or service quality, while 28 percent see an increase in productivity or efficiency," Harvard Business Review report, Meet the New Decision Makers


On the other side of the coin, companies processing sensitive data without proper protection mechanisms expose themselves to financial, legal, and corporate risks. The re-identification and data leak risks of poor privacy mechanisms should not be ignored as they can lead to severe damage for a company. 


For instance, the loss of customer trust can be very costly, although it's hard to quantify. Customers are less inclined to trust businesses with their money and confidential information after a breach.


Companies that want to use their data for business intelligence and data science applications have the option to use privacy-preserving synthetic data.


Why organizations should use synthetic financial data to preserve privacy

 

In a general sense, synthetic data is artificially generated information rather than data collected from the real world. Financial synthetic data mimics the statistical characteristics of the original dataset it's derived from.


One of the advantages of this method is that the data utility can still be well preserved. Indeed, the synthetic financial data can still retain many of the properties and statistical information of the original data. With these underlying statistical patterns still present, it's possible to power almost any application intended for the original data. 
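To make "retaining the statistical properties" concrete, here is a minimal sketch in Python. It assumes a toy dataset with two numeric columns (a hypothetical monthly income and monthly spend), fits a simple bivariate Gaussian to it, and samples new rows from that model. All names and figures are illustrative assumptions. Real synthetic data generators use far richer models, and a plain Gaussian fit like this one carries no privacy guarantee on its own.

```python
import math
import random
import statistics


def fit_bivariate_gaussian(rows):
    """Estimate means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mean_x, mean_y), "std": (std_x, std_y), "corr": cov / (std_x * std_y)}


def sample_synthetic(model, n, rng):
    """Draw n new rows from the fitted Gaussian; no original row is reused."""
    mean_x, mean_y = model["mean"]
    std_x, std_y = model["std"]
    rho = model["corr"]
    # Guard against tiny floating-point overshoot when |rho| is close to 1.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + std_x * z1
        y = mean_y + std_y * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Hypothetical "original" data: income and a spend figure that depends on it.
rng = random.Random(42)
real_rows = []
for _ in range(2000):
    income = rng.gauss(4000, 800)
    spend = 0.5 * income + rng.gauss(0, 200)
    real_rows.append((income, spend))

model = fit_bivariate_gaussian(real_rows)
synthetic_rows = sample_synthetic(model, 5000, rng)
synthetic_model = fit_bivariate_gaussian(synthetic_rows)

print("original  corr:", round(model["corr"], 3))
print("synthetic corr:", round(synthetic_model["corr"], 3))
```

The synthetic rows are new draws, yet the relationship between income and spend should come out close to the original, which is what lets downstream analysis keep working.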


Generating synthetic financial data is also, in effect, an anonymization method. It safeguards the privacy of any personal data in the original dataset. If generated correctly, the synthetic data won't have a one-to-one relationship with the original records, which protects the privacy of customers. It should not be possible to learn information about a particular individual from privacy-preserving synthetic data. This takes the synthetic data out of the scope of personal data processing regulations.
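One simple way to sanity-check the "no one-to-one relationship" property is to measure how close each synthetic row sits to its nearest original row. The sketch below is an illustrative heuristic only, with made-up rows and hypothetical names. A distance of zero flags a synthetic row that is an exact copy of a real one. Passing such a check is not a formal privacy guarantee.

```python
import math


def distance_to_closest_record(synthetic_row, real_rows):
    """Euclidean distance from one synthetic row to its nearest original row."""
    return min(math.dist(synthetic_row, real_row) for real_row in real_rows)


# Made-up (income, spend) rows for illustration.
real_rows = [(4200.0, 1900.0), (3100.0, 1500.0), (5600.0, 2800.0)]
synthetic_rows = [(4150.0, 2010.0), (3100.0, 1500.0), (5200.0, 2500.0)]

dcr = [distance_to_closest_record(row, real_rows) for row in synthetic_rows]

# A distance of zero means the generator reproduced an original record verbatim.
exact_copies = [row for row, d in zip(synthetic_rows, dcr) if d == 0.0]

print("distances:", [round(d, 1) for d in dcr])
print("exact copies:", exact_copies)
```

In this toy example the second synthetic row duplicates an original record, which is exactly the kind of leak a correctly configured generator should avoid.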



A compliant and easy-to-access data asset


Privacy-preserving synthetic financial data is private by design. It helps enterprises remain compliant with personal data processing regulations, making it a crucial asset. For example, to comply with GDPR data retention limits, a bank would need to delete all personal and financial information after a customer contract ends, preventing any long-term data analysis. With privacy-preserving synthetic data, the enterprise could run such long-term analysis on synthetic financial data generated during the contract period, and delete the customer information as required by any relevant regulation.


Compared to traditional privacy protection mechanisms, properly implemented synthetic data offers stronger privacy guarantees. Other methods, such as tokenization or pseudonymization, present re-identification risks that the use of synthetic data doesn’t. Solely removing PII from the data is not a safe data protection mechanism: it exposes businesses and individuals to privacy breaches through linkage attacks and other privacy-compromising exploits.
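A linkage attack is easy to picture with a small sketch. The example below assumes a pseudonymized table in which names were replaced by tokens but quasi-identifiers (postal code, birth year, gender) were kept, plus a hypothetical public dataset that contains the same quasi-identifiers next to real names. All records and field names are invented for illustration.

```python
from collections import defaultdict

# Pseudonymized bank records: names replaced by tokens, quasi-identifiers kept.
pseudonymized = [
    {"token": "a91f", "zip": "1011", "birth_year": 1984, "gender": "F", "balance": 15200},
    {"token": "c07d", "zip": "1011", "birth_year": 1990, "gender": "M", "balance": 310},
    {"token": "e55b", "zip": "3512", "birth_year": 1975, "gender": "F", "balance": 98400},
    {"token": "f218", "zip": "3512", "birth_year": 1975, "gender": "F", "balance": 4100},
]

# Hypothetical auxiliary data an attacker might hold (for example a public register).
public = [
    {"name": "Alice Example", "zip": "1011", "birth_year": 1984, "gender": "F"},
    {"name": "Bob Example", "zip": "1011", "birth_year": 1990, "gender": "M"},
    {"name": "Carol Example", "zip": "3512", "birth_year": 1975, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link(pseudo_rows, aux_rows, keys=QUASI_IDENTIFIERS):
    """Join both tables on the quasi-identifiers; a unique match re-identifies a person."""
    index = defaultdict(list)
    for row in pseudo_rows:
        index[tuple(row[k] for k in keys)].append(row)
    reidentified = {}
    for person in aux_rows:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # only unambiguous matches count as re-identified
            reidentified[person["name"]] = matches[0]["balance"]
    return reidentified


reidentified = link(pseudonymized, public)
print(reidentified)
```

Two of the three people are matched to a single record, and with it to their balance, even though no name was ever stored in the pseudonymized table. The third escapes only because another record happens to share her quasi-identifiers.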


Enterprises using synthetic financial data are also able to grow more agile with their data operations. Without the tedious governance and security processes that often prevent data from flowing within the organization, they can reduce their time-to-data.  You can read our full post on the evaluation of the ROI of synthetic data.


Supporting innovation with synthetic financial data


The changes in enterprises are not only a matter of applications but also infrastructures. Data architecture is evolving, and synthetic data might be one of the keys to building an infrastructure that supports innovation. 


The use of cloud storage, compute, and other tools is, for instance, an important infrastructural shift. But how do you upgrade from on-premise to cloud when heavy governance and security processes regulate the transfer of customer data off-premise? Synthetic financial data offers an alternative to moving sensitive data out of your premises. You can make it available to your teams anywhere in the world without increasing compliance overhead or risking security breaches.


Besides greater flexibility in data architecture, synthetic data opens the door to many risk-free applications, from machine learning training to BI or external data sharing with partners. Successful projects in the financial sector already attest to the fact that synthetic financial data holds great potential for banks and other financial organizations.


The largest companies in the world are starting to work with synthetic data. Amazon is already using this technology to improve customer purchase prediction. American Express is also exploring the topic: its data teams are researching synthetic data to train machine learning models and improve their fraud detection algorithms. We recently saw the Financial Conduct Authority launch a digital sandbox project to foster innovation among financial institutions. In this project, synthetic data offers an opportunity to build alternative payment datasets to improve scam and fraud detection models.


Start evaluating privacy-preserving synthetic data today. Reach out to our team to learn how it can benefit your organization.

