How do Statice's synthetic data techniques differ from federated learning?

In times of home offices and handshake bans, everyone is going online, and at Statice we are joining in. Yesterday, we hosted our very first webinar, titled “How can businesses benefit from privacy-preserving synthetic data?”, and were thrilled by the participation across industries, as well as by the great questions about privacy-preserving technologies and how Statice works. In case you missed it, you can find the slides on Slideshare here, and below we’ve summarized a few of the interesting questions participants asked.

How do the techniques used by Statice differ from federated learning, and why did you choose your approach instead?

Federated learning is an interesting approach that allows machine learning models to be trained without sharing the underlying data. It is best suited to cases where the data is distributed across many devices (such as mobile phones or smart cars). Statice generally works with large organizations that want to leverage centralized, sensitive (big) data. We therefore chose to develop software that enables companies to generate privacy-preserving synthetic data, on-premise, from their sensitive data. This approach means that our technology can serve as a building block for privacy-preserving data science applications and processes, and can be flexibly integrated into existing enterprise systems.
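
To make the contrast concrete, here is a minimal sketch of the federated idea: each client trains on its own rows and only model weights travel to the server, which averages them. It is a toy FedAvg-style loop on a linear model, not anything Statice ships; the function names, the two-feature data, and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=20):
    """Run a few gradient-descent steps on one client's private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(clients, rounds=30, dim=2):
    """Toy FedAvg loop: only weights leave the clients, never the raw rows."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in clients]
        # The server combines client models, weighted by local data size.
        w = np.average(updates, axis=0, weights=sizes)
    return w

def make_clients(n_clients=5, rows=40, seed=0):
    """Hypothetical distributed data: each client holds its own small sample."""
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0])
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(rows, 2))
        y = X @ true_w + rng.normal(scale=0.01, size=rows)
        clients.append((X, y))
    return clients, true_w

clients, true_w = make_clients()
w_fed = federated_average(clients)
print("federated estimate:", w_fed, "true weights:", true_w)
```

The pattern fits when the data is already spread across many parties. When the sensitive data already sits in one place inside an organization, there is nothing to federate, which is the situation described above.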


Can your anonymization models be broken by brute force?

The Statice software generally runs on-premise at our clients' sites, in secure environments (it has to access the sensitive data that needs to be anonymized, after all). Our models learn the statistical and structural characteristics of the sensitive data and are then used to generate privacy-preserving synthetic data, which is usually what the client goes on to use or share. The models learned by our software could, in theory, be used to gain insight into the original data. However, if models are leaking from a secure on-premise environment, you probably have bigger problems in your infrastructure or security model, and are likely leaking the sensitive data as well.
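
As a rough illustration of the "learn the characteristics, then generate" workflow, the sketch below fits a very simple model (a multivariate Gaussian) to a table and samples new rows from it. This is a deliberately simplified stand-in and an assumption on our part for this example: it is not the model used in the Statice software, and on its own it provides no privacy guarantee. The column values and function names are hypothetical.

```python
import numpy as np

def fit_gaussian(data):
    """Learn column means and the covariance between columns."""
    return data.mean(axis=0), np.cov(data, rowvar=False)

def sample_synthetic(mean, cov, n_rows, seed=0):
    """Draw new rows from the fitted model instead of copying real ones."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)

# Hypothetical "sensitive" table with two correlated numeric columns.
rng = np.random.default_rng(42)
original = rng.multivariate_normal(
    [50.0, 30.0], [[25.0, 10.0], [10.0, 16.0]], size=5000
)

mean, cov = fit_gaussian(original)
synthetic = sample_synthetic(mean, cov, n_rows=5000)

print("original correlation: ", np.corrcoef(original, rowvar=False)[0, 1])
print("synthetic correlation:", np.corrcoef(synthetic, rowvar=False)[0, 1])
```

Note that the fitted `mean` and `cov` are themselves a summary of the original data, which is why the model, like the data, belongs inside the secure environment.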

How can outliers in data sets be preserved in synthetic data while protecting privacy?

This is a great question, and one that comes up very often when working with our clients. The truth is that in very small data sets with large outliers, it is often very difficult to preserve privacy while providing good data utility. Generally, though, Statice clients process large amounts of data, where outliers can be reflected in the synthetic data if they occur in the original in large enough quantities. The risk that outliers in the synthetic data pose to privacy is evaluated automatically by the software and can be handled directly with tools built into it.
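
One simple way to think about outlier risk is sketched below: a synthetic row is flagged when it is both unusual (far from the bulk of the data) and almost identical to a single original record. This heuristic, its thresholds, and the function name are assumptions made up for this example; it is not a description of the evaluation built into the Statice software.

```python
import numpy as np

def flag_risky_rows(original, synthetic, rare_z=3.0, close_dist=0.1):
    """Flag synthetic rows that are rare AND nearly copy an original row."""
    mu = original.mean(axis=0)
    sigma = original.std(axis=0)
    orig_std = (original - mu) / sigma
    synth_std = (synthetic - mu) / sigma

    # "Rare": at least one column lies more than rare_z standard deviations out.
    is_rare = np.abs(synth_std).max(axis=1) > rare_z

    # "Close": distance to the nearest original row, in standardized units.
    dists = np.linalg.norm(synth_std[:, None, :] - orig_std[None, :, :], axis=2)
    is_close = dists.min(axis=1) < close_dist

    return is_rare & is_close

rng = np.random.default_rng(1)
bulk = rng.normal(size=(500, 2))
original = np.vstack([bulk, [[10.0, 10.0]]])  # one extreme original record

synthetic = np.array([
    [0.1, -0.2],    # typical row
    [10.0, 10.0],   # reproduces the single original outlier
    [6.0, -6.0],    # unusual, but not close to any original record
])

flags = flag_risky_rows(original, synthetic)
print(flags)
```

Only the second row is flagged here: it is an outlier that points back to exactly one original record. If many original records shared that extreme value, reproducing it would reveal far less about any one of them.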


If you're interested in learning more about Statice, you can get in touch with us here. You can also sign up to access the on-demand recording using the link below.

