Skip to main content

Synthetic Health Data

Backstory: Test, Validate Early and Often Without Restrictions

In 2021, a large pharma company hired us for a Proof of Concept project because their test data creation was a manual, labor-intensive process. The resulting data was often too late and failed to represent common live data scenarios, thus challenging the effective development and Quality Control of their downstream data collection and transformation tools.

Our Solution

We automated the creation of synthesized test data reflective of the study design, and created test data without disrupting the critical path of data collection tool set-up.

This solution was able to create test data for entire patients across data domains representing common data scenarios in structures representative of live data extracted from these systems.

We were able to generate 02 types of data:

  • "Green Data" - compliant with standards and terminologies
  • "Red Data" - data purposefully created with errors to test the ability of client's software to detect those errors

In 2021 we presented a paper that describes our approach.

You can download a copy of the paper here

Metadata-driven Synthetic Data

Using our Data Standards Governor we're able to generate high-quality, realistic yet not real Synthetic Health Data.

Think of synthetic data as a software-based version of reality:

  • Artificially-generated people
  • Including full personal health record
  • Not based in real patient health data
  • No usage restrictions
  • No deanonymization risk
  • No HIPAA, security, copyright restrictions
  • Large amounts of data quickly available

In the case of clinical trials:

Synthetic People assigned to

Synthetic Trials to build

Synthetic Submissions

Using Synthetic Health Data clients can:

Test Comprehensively

  • Test Early
  • Test Often
  • Test Fully

With Complete Confidence

  • Safely
  • Securely
  • Privately
  • Quickly
  • Inexpensively
  • Repeatedly
  • Consistently

Synthetic Health Data can be used as:

Robust, high-quality test data created to accurately represents expected study data

Test data created in a dynamic, automated way based on the study design (Protocol) and eDC configurations (Medidata Rave ALS)

Test data to represent all anticipated patient journeys through the study

Test data representing all anticipated data that could be collected (representative of all eDC data collection dynamics as defined in the ALS)

The Synthetic Health Data includes these data domains:

  • Subject ID form
  • Demography
  • Disposition (for each study phase inc. IC, Screening, randomization, end treatment/data collection/study )
  • Adverse Events
  • Concomitant Medications
  • Efficacy data (example)
  • Visit based data collection (vitals)
  • Date of visit

Here are some details of the generated data:

In this diagram you'll notice that the data shows:

  • Realistic yet fake full name
  • User-defined age range, and gender, race %.
  • Real city / town. With Latitude, Longitude for mapping.

Contact us

Please contact us for more details.