9 facts about Statice's data anonymization software

September 9, 2020

Are you wondering whether Statice has the right synthetic data solution for your needs? In this post, we discuss some of the advantages of working with our data anonymization software. From integration to evaluation, we designed it to fit your team’s requirements.

The idea and the technology behind the Statice data anonymization software

Data has become a valuable asset. For data-driven companies, the need to protect and use it responsibly is more significant than ever before.

The Statice data anonymization software enables companies in the healthcare, automotive, insurance, and finance sectors to use their data for applications previously out of reach. Whether the use case is machine learning training, data monetization, or external data sharing, companies that work with synthetic data safeguard the privacy of their original data.

The Statice software builds on differentially private deep learning models to generate privacy-preserving synthetic data. The models learn the statistical properties of the original dataset and generate new synthetic data points with similar statistical utility. Privacy mechanisms guarantee the full anonymity and privacy compliance of the synthetic data.
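To give a feel for what "differentially private" means, here is a minimal sketch of the textbook Laplace mechanism applied to a simple count. This is not how the Statice models are trained; it is only the simplest illustration of the underlying idea, which is to add calibrated noise so that no single record noticeably changes the output. The function names and the `epsilon` value are assumptions chosen for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count that satisfies epsilon-differential privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one record changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 58, 62, 19, 47]  # hypothetical data
    rng = random.Random(7)
    print(private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng))
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means less noise and better accuracy.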

What does your data team need to know about Statice's solution?

Data science and machine learning teams are the primary users of the Statice data anonymization software. We built the tool with data teams in mind, ensuring that deploying and using the solution is straightforward.

Are you wondering whether the Statice software matches your requirements? Here are nine facts about the software that will help you understand it better.

1. You can integrate the Statice software in any modern infrastructure.

You can deploy the SDK on-premises, in a private cloud, or on your local infrastructure. You can also deploy it on any major public cloud provider, such as Google Cloud, AWS, or Azure, and on data analytics platforms like Databricks or JupyterHub.

2. You can get started immediately.

Deploying the Statice software is both simple and fast. It took one of our clients in the financial industry less than two hours to install the software and run their first dataset synthetization. In addition, our team provides you with full support and extensive documentation.

3. Your entire data team can use it.

The software comes with a programmatic interface and a command-line interface (CLI). As a result, it serves not only developers and data scientists but also, through the CLI, users with basic programming skills.

4. You can plug in the most commonly used data sources.

The software supports any data in tabular form, from .csv files to database exports (Postgres, MySQL, MongoDB). Support for custom data formats is available on request.

5. You can generate synthetic data from various types of structured data.

The software can generate synthetic data from most structured data types, including primitive types like categorical, continuous, or discrete. It also accepts non-primitive types such as geolocation, temporal, DateTime, and transactional data.

6. You can work with large-scale datasets.

The Statice software can handle large amounts of data. Users have successfully processed datasets with tens of millions of entries and over 500 dimensions.

7. You can customize the synthetization process.

The software is highly customizable to fit your project's needs. You can manually extend the types of supported data and fine-tune the synthetization process by adjusting parameter values. You can also use table lookup to replace highly personally identifiable information (PII), which is removed from the data, with user-provided "fake" information (e.g., names).
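The table-lookup idea can be sketched in a few lines. This is an illustration only, not the Statice API: the `replace_pii` function, its parameters, and the sample rows are all hypothetical. Each value in the PII column is dropped and replaced with an entry drawn from a user-provided list of fake values.

```python
import random


def replace_pii(rows, column, fake_values, seed=None):
    """Return copies of `rows` with `column` replaced by user-provided fake values."""
    if not fake_values:
        raise ValueError("fake_values must not be empty")
    rng = random.Random(seed)
    # Build new dicts so the original rows are left untouched.
    return [{**row, column: rng.choice(fake_values)} for row in rows]


if __name__ == "__main__":
    # Hypothetical records and lookup table.
    records = [
        {"name": "Alice Smith", "age": 34},
        {"name": "Bob Jones", "age": 51},
    ]
    fake_names = ["Jane Doe", "John Roe", "Sam Poe"]
    for row in replace_pii(records, "name", fake_names, seed=1):
        print(row)
```

Because the replacement values come from the lookup table rather than from the original column, the output keeps a realistic-looking name field without carrying over any real name.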

8. You get evaluations to assess the utility of your synthetic data.

The software compares the marginal distributions, the pairwise dependencies, and the conditional distributions of the original dataset with those of the synthetic dataset to check that utility is preserved.
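As a rough sketch of what such a comparison can look like, the example below measures two things: how far apart the marginal distributions of a categorical column are (total variation distance), and how much a pairwise dependency between two numeric columns has shifted (difference in Pearson correlation). These particular metrics and function names are assumptions chosen for illustration; the evaluations built into the Statice software may use different ones.

```python
from collections import Counter
from math import sqrt


def total_variation_distance(original, synthetic):
    """Distance between two empirical marginals (0 = identical, 1 = disjoint)."""
    p, q = Counter(original), Counter(synthetic)
    categories = set(p) | set(q)
    return 0.5 * sum(
        abs(p[c] / len(original) - q[c] / len(synthetic)) for c in categories
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric columns."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_gap(orig_x, orig_y, syn_x, syn_y):
    """Absolute change in a pairwise dependency between original and synthetic data."""
    return abs(pearson(orig_x, orig_y) - pearson(syn_x, syn_y))


if __name__ == "__main__":
    # Hypothetical columns from an original and a synthetic dataset.
    print(total_variation_distance(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
    print(correlation_gap([1, 2, 3], [2, 4, 6], [1, 2, 3], [2, 4, 7]))
```

Values close to zero on both metrics suggest the synthetic data preserves the structure of the original for those columns.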

9. You get guarantees about the privacy of your synthetic data.

In addition to the Statice models being trained to satisfy differential privacy, several mechanisms ensure that your synthetic data is truly anonymous. For instance, the software simulates privacy attacks to ensure the generated synthetic data's full anonymity. 
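One simple check in this spirit is to measure how close each synthetic record sits to its nearest original record: a synthetic row that exactly or almost exactly copies a real one is a privacy risk. The sketch below is an assumption for illustration, not a description of the attacks the Statice software actually simulates; the function names and the threshold are hypothetical.

```python
from math import dist


def distance_to_closest_record(synthetic, original):
    """For each synthetic row, the Euclidean distance to its nearest original row."""
    return [min(dist(s, o) for o in original) for s in synthetic]


def share_too_close(synthetic, original, threshold):
    """Fraction of synthetic rows lying within `threshold` of some original row."""
    distances = distance_to_closest_record(synthetic, original)
    return sum(d <= threshold for d in distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical numeric records (already scaled to comparable ranges).
    original_rows = [(0.0, 0.0), (10.0, 10.0)]
    synthetic_rows = [(0.0, 0.0), (3.0, 4.0), (10.0, 11.0)]
    print(distance_to_closest_record(synthetic_rows, original_rows))
    print(share_too_close(synthetic_rows, original_rows, threshold=1.0))
```

In practice, the columns would need to be scaled consistently before computing distances, and a sensible threshold depends on the dataset.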

We hope that this list has shown you that the Statice software would be a useful addition to your company's toolbox! We are continuously working to tailor our product to the needs of companies and professionals working with sensitive data. We'd be happy to hear about your projects and which requirements are the most important for your team 👉 contact our team.
