Synthetic Health Data

Backstory: Test and Validate Early and Often, Without Restrictions

In 2021, a large pharma company hired us for a Proof of Concept project because their test data creation was a manual, labor-intensive process. The resulting data often arrived too late and failed to represent common live data scenarios, which hampered the development and quality control of their downstream data collection and transformation tools.

Our Solution

We automated the creation of synthesized test data that reflects the study design, and did so without disrupting the critical path of data collection tool set-up.

The solution created test data for entire patients across data domains, covering common data scenarios in structures representative of live data extracted from these systems.

We were able to generate two types of data:

"Green Data" - compliant with standards and terminologies
"Red Data" - data purposefully created with errors to test the ability of the client's software to detect those errors
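
As a rough illustration of the Green/Red distinction, here is a minimal Python sketch. It is not the implementation used in the project: the `METADATA` dictionary, the field names, and the functions `green_record`, `red_record`, and `violations` are all hypothetical, chosen only to show how compliant records and deliberately broken records might be produced from the same rules.

```python
import random

# Hypothetical field metadata: allowed terms or numeric ranges per variable.
METADATA = {
    "SEX": {"type": "code", "allowed": ["M", "F", "U"]},
    "AGE": {"type": "int", "min": 18, "max": 90},
}

def green_record(rng):
    """Build a record in which every field satisfies its metadata rule."""
    record = {}
    for name, rule in METADATA.items():
        if rule["type"] == "code":
            record[name] = rng.choice(rule["allowed"])
        else:
            record[name] = rng.randint(rule["min"], rule["max"])
    return record

def red_record(rng):
    """Start from a compliant record and break exactly one field on purpose."""
    record = green_record(rng)
    field = rng.choice(sorted(METADATA))
    rule = METADATA[field]
    if rule["type"] == "code":
        record[field] = "INVALID"        # term outside the allowed list
    else:
        record[field] = rule["max"] + 1  # value just outside the range
    return record, field

def violations(record):
    """Return the names of fields that break their metadata rule."""
    bad = []
    for name, rule in METADATA.items():
        value = record[name]
        if rule["type"] == "code":
            ok = value in rule["allowed"]
        else:
            ok = rule["min"] <= value <= rule["max"]
        if not ok:
            bad.append(name)
    return bad
```

Because `red_record` also returns the name of the field it corrupted, a test harness could compare that known error against what the software under test reports.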

In 2021 we presented a paper that describes our approach.

You can download a copy of the paper here

Metadata-driven Synthetic Data

Using our Data Standards Governor, we're able to generate high-quality Synthetic Health Data that is realistic, yet not real.

Think of synthetic data as a software-based version of reality:

Artificially-generated people
Including a full personal health record
Not based on real patient health data
No usage restrictions
No deanonymization risk
No HIPAA, security, or copyright restrictions
Large amounts of data quickly available

In the case of clinical trials:

Synthetic People assigned to

Synthetic Trials to build

Synthetic Submissions
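
The people, trials, submissions chain above could be sketched roughly as follows. This is an assumption-laden illustration, not the Data Standards Governor itself: the functions `make_subjects`, `assign_to_trial`, and `build_submission`, the study identifier, and the arm names are all hypothetical, and the column names only echo CDISC-style naming for readability.

```python
import random

def make_subjects(n, rng):
    """Create n artificial subjects; no real patient data is involved."""
    return [
        {"USUBJID": f"SYN-{i:04d}", "AGE": rng.randint(18, 90)}
        for i in range(1, n + 1)
    ]

def assign_to_trial(subjects, study_id, arms, rng):
    """Assign each synthetic subject to an arm of a synthetic trial."""
    return [
        {**subject, "STUDYID": study_id, "ARM": rng.choice(arms)}
        for subject in subjects
    ]

def build_submission(rows):
    """Group the assigned subjects into a hypothetical demographics table."""
    return {"DM": sorted(rows, key=lambda row: row["USUBJID"])}

rng = random.Random(42)  # fixed seed so repeated runs give identical data
subjects = make_subjects(5, rng)
submission = build_submission(
    assign_to_trial(subjects, "SYN-TRIAL-01", ["PLACEBO", "ACTIVE"], rng)
)
```

Seeding the random generator is what would make such data repeatable and consistent from one test run to the next.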

Using Synthetic Health Data, clients can:

Test Comprehensively

✓ Test Early
✓ Test Often
✓ Test Fully

With Complete Confidence

✓ Safely
✓ Securely
✓ Privately
✓ Quickly
✓ Inexpensively
✓ Repeatedly
✓ Consistently