# Synthetic Data Could Be Better Than Real Data

## About
- Author: Neil Savage
- Title: Synthetic Data Could Be Better Than Real Data
- Tags: #articles
- URL: https://www.nature.com/articles/d41586-023-01445-8

## Highlights
There are also fundamental theoretical limits to how much improvement data can undergo, says Isola. Information theory contains a principle called the data-processing inequality, which states that processing data can only reduce the amount of information available, not add to it [4](https://www.nature.com/articles/d41586-023-01445-8#ref-CR4). And all synthetic data must have real data at its root, so all the problems with real data — privacy, bias, expense and more — still exist at the start of the pipeline. “You’re not getting something for free — you’re still ultimately learning from the world, from data. You’re just reformatting that into an easier-to-work-with format that you can control better,” Isola says. With synthetic data, “data comes in and a better version of the data comes out”.
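
Note (not from the article): the data-processing inequality mentioned above can be checked numerically on a toy example. If Z is computed from Y alone, so that X → Y → Z forms a Markov chain, then I(X;Z) ≤ I(X;Y). The sketch below is only an illustration under stated assumptions: the probability tables are made-up numbers, and `mutual_information` and `process` are hypothetical helper names, not anything the article defines.

```python
from math import log2


def mutual_information(joint):
    """Mutual information I(A;B) in bits for a joint table joint[a][b]."""
    pa = [sum(row) for row in joint]        # marginal of A
    pb = [sum(col) for col in zip(*joint)]  # marginal of B
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * log2(p / (pa[a] * pb[b]))
    return mi


def process(joint_xy, channel):
    """Push Y through a channel P(z|y) and return the joint table of (X, Z).

    Z depends on X only through Y, so X -> Y -> Z is a Markov chain.
    """
    n_z = len(channel[0])
    return [
        [sum(row[y] * channel[y][z] for y in range(len(row))) for z in range(n_z)]
        for row in joint_xy
    ]


# Assumed toy data: X is a fair coin, Y is a noisy four-level reading of X.
joint_xy = [
    [0.30, 0.15, 0.04, 0.01],
    [0.01, 0.04, 0.15, 0.30],
]

# Assumed processing step: merge levels {0, 1} and {2, 3} into two bins.
merge = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(process(joint_xy, merge))

print(f"I(X;Y) = {i_xy:.3f} bits")
print(f"I(X;Z) = {i_xz:.3f} bits")
```

In this toy case the merged variable Z carries less information about X than the raw reading Y does. The processed form may be easier to work with, which matches Isola's framing, but the step itself cannot add information about X.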
---
“Data privacy is so important in the age of surveillance capitalism,” he says. Creating good synthetic data that both preserve privacy and reflect diversity, and that are made widely available, has the potential not just to improve the performance of AI and expand its uses, but also to help democratize AI research.
---