Synthetic Data: Safely Accelerating AI with Simulated Patients

As AI models hungry for data meet the realities of patient privacy, synthetic data has emerged as a practical solution. In essence, synthetic data consists of artificially generated records that statistically mirror real patient data without exposing individuals. For pharmacovigilance, this could mean creating realistic but artificial adverse event cases, electronic health records or even entire safety databases that preserve patterns from real data while removing personally identifiable information. The appeal is clear: organisations can train and test PV algorithms without breaching privacy or relying solely on scarce real-world data.

Imagine a pharma organisation developing an AI model to flag drug-event signals. True safety signals, such as a drug causing liver injury, are rare and therefore difficult to train on. Synthetic data allows engineers to embed known signal scenarios into datasets and then test whether the model can detect them. For example, patient profiles in which Drug X is linked to elevated liver enzymes can be simulated and placed within a larger dataset, challenging the model to identify the signal. Synthetic case narratives can also support literature screening by generating variations of adverse event descriptions, helping natural language processing (NLP) models recognise the different ways side effects are reported.
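The idea of planting a known signal can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the function names, the report counts and the event rates are all assumptions chosen for the example, and the proportional reporting ratio (PRR) stands in for whatever detection model an organisation would actually test.

```python
import random


def simulate_reports(n_reports=50_000, p_drug_x=0.10, base_rate=0.01,
                     planted_rate=0.05, seed=0):
    """Return (took_drug_x, liver_event) pairs. All rates are illustrative."""
    rng = random.Random(seed)
    reports = []
    for _ in range(n_reports):
        took_drug_x = rng.random() < p_drug_x
        # Planted signal: liver events are more likely for Drug X patients.
        event_rate = planted_rate if took_drug_x else base_rate
        liver_event = rng.random() < event_rate
        reports.append((took_drug_x, liver_event))
    return reports


def proportional_reporting_ratio(reports):
    """PRR = [a / (a + b)] / [c / (c + d)] from the 2x2 drug-event table."""
    a = sum(1 for drug, event in reports if drug and event)
    b = sum(1 for drug, event in reports if drug and not event)
    c = sum(1 for drug, event in reports if not drug and event)
    d = sum(1 for drug, event in reports if not drug and not event)
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    return (a / (a + b)) / (c / (c + d))


# Dataset with a planted Drug X liver signal.
planted_prr = proportional_reporting_ratio(simulate_reports())

# Control dataset: same settings, but no planted signal.
control_prr = proportional_reporting_ratio(
    simulate_reports(planted_rate=0.01, seed=1)
)

print(f"PRR with planted signal: {planted_prr:.2f}")
print(f"PRR without signal:      {control_prr:.2f}")
```

Because the signal strength is set by the engineer, the dataset comes with a known answer: a detection method that fails to flag Drug X here, or that flags it in the control dataset, has a measurable problem.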

Additionally, regulators and industry are exploring “digital twins” of safety databases: synthetic versions of resources such as FAERS (the FDA Adverse Event Reporting System) or VigiBase (the WHO global database of adverse event reports), where new methods can be tested without risking data exposure.

The concept remains nascent in pharmacovigilance, with limited published evidence. A 2024 review noted that synthetic data must meet high “fidelity” standards, preserving real statistical relationships between drugs and events while maintaining realistic variation in patient populations. Poorly constructed datasets risk misleading models, for example by distorting incidence rates. In response, industry groups are developing best practices and validation criteria. Regulators are also monitoring progress; while fully synthetic trial arms are not yet accepted, exploratory uses such as simulation studies or supporting real-world evidence are gaining traction.
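One simple fidelity check is to compare per-drug incidence rates between a reference dataset and its synthetic counterpart. The sketch below is illustrative only: the function names are hypothetical, the records are toy placeholders rather than real data, and a genuine validation would cover far more than a single summary statistic.

```python
from collections import defaultdict


def incidence_by_drug(records):
    """Map each drug to its share of records that report the event."""
    counts = defaultdict(lambda: [0, 0])  # drug -> [events, total]
    for drug, had_event in records:
        counts[drug][0] += int(had_event)
        counts[drug][1] += 1
    return {drug: events / total for drug, (events, total) in counts.items()}


def max_incidence_gap(reference, synthetic):
    """Largest absolute difference in incidence across all drugs."""
    ref = incidence_by_drug(reference)
    syn = incidence_by_drug(synthetic)
    drugs = set(ref) | set(syn)
    return max(abs(ref.get(d, 0.0) - syn.get(d, 0.0)) for d in drugs)


def make_records(drug, events, total):
    """Toy helper: build `total` records for `drug`, `events` with the event."""
    return [(drug, True)] * events + [(drug, False)] * (total - events)


# Toy placeholder data, not drawn from any real safety database.
reference = make_records("A", 10, 100) + make_records("B", 5, 100)
faithful = make_records("A", 11, 100) + make_records("B", 5, 100)
distorted = make_records("A", 30, 100) + make_records("B", 5, 100)

faithful_gap = max_incidence_gap(reference, faithful)
distorted_gap = max_incidence_gap(reference, distorted)

print(f"Faithful synthetic set, max gap:  {faithful_gap:.2f}")
print(f"Distorted synthetic set, max gap: {distorted_gap:.2f}")
```

A large gap, as in the distorted set, is the kind of incidence-rate distortion that could mislead a model trained on the synthetic data. The acceptable threshold is a judgment call that the validation criteria under development would need to define.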

In the coming years, synthetic data could become a valuable sandbox for pharmacovigilance innovation — supporting AI development, testing safety scenarios and enabling collaboration without compromising patient confidentiality. By combining realism with privacy, it offers a way to accelerate pharmacovigilance analytics where data access often limits progress.

“Synthetic data are increasingly seen as a transformative solution to the data gaps and privacy constraints in pharmaceutical research, including pharmacovigilance and clinical development.”
Adrien Laurent, CEO of IntuitionLabs
