Synthetic Test Data: High-value use cases in medical product development

Innovation can be a driving force for quality improvement: synthetic data, robust analytics, and AI-powered testing tools can uncover answers faster than today’s cumbersome, resource-heavy, disjointed test data management processes. If data were better leveraged, streamlined, and made accessible, medical companies could accelerate innovation and improve the customer experience through high-quality medical products and applications.

Synthetic data is derived from original patient information collected from actual patient populations, and it provides information about patients’ health statuses and healthcare delivery. Based on data routinely collected from sources such as electronic health records (EHRs), claims and billing activities, and product and disease registries, a completely anonymized, brand-new parallel set of insights is created. Unlike other forms of data (such as de-identified data), synthetic data is designed so that it cannot be traced back to individuals in the original patient population. Ensuring privacy is crucial to prevent re-identification of individuals or exposure of confidential information. Synthetic data generation addresses privacy at multiple levels, using techniques such as data anonymization, data masking, data encryption, and differential privacy.
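To make one of these techniques concrete, here is a minimal sketch of differential privacy applied to a simple counting query, using the Laplace mechanism. It is an illustration under stated assumptions, not a description of any particular product: the records, the function names (`laplace_noise`, `private_count`), and the `epsilon` value are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    # If X and Y are independent Exponential(rate=1/scale) variables,
    # X - Y follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a differentially private count of records matching predicate."""
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1: adding or removing one patient
    # changes the result by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical, hand-written records used purely for illustration.
records = [
    {"age": 34, "diabetic": False},
    {"age": 61, "diabetic": True},
    {"age": 47, "diabetic": True},
    {"age": 72, "diabetic": False},
]

rng = random.Random(42)
noisy = private_count(records, lambda r: r["diabetic"], epsilon=0.5, rng=rng)
print(f"Noisy diabetic count: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy, while a larger one keeps the answer closer to the true count. Choosing the value is a policy decision that this sketch does not attempt to make.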

Synthetic data in medical imaging offers numerous benefits, including the ability to augment datasets with diverse and realistic images where real data is limited. This reduces the costs and labour associated with annotating real images. Synthetic data also provides an ethical alternative to using sensitive patient data, which helps with education and training without compromising patient privacy.

Synthetic test datasets can be incredibly diverse, encompassing various types of data that reflect different aspects of patient care and medical research. Here are some examples.

- Patient data: simulated patient records, demographics, vital signs, and medical histories
- Clinical data: medical images, lab results, and treatment outcomes
- Administrative data: claims data, billing information, and schedule/operational data
- Device data: generated device signals, sensor readings, and log data
- Usage data: mimicked device usage patterns and user interactions
- Environmental data: simulated environmental conditions (temperature, humidity, etc.)

These synthetic datasets are designed to maintain the statistical properties and patterns of real data, ensuring they are useful for product development, research and training purposes while protecting patient privacy. For example, a synthetic dataset might include a patient’s journey through a hospital stay, complete with admissions, diagnoses, interventions, and discharge summaries, all fabricated but realistic enough to train machine learning models effectively.
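As a rough illustration of what "maintaining statistical properties" can mean, the sketch below fits the means, standard deviations, and correlation of two columns (age and systolic blood pressure) and then samples new, fabricated pairs from a bivariate normal distribution with those parameters. The input values are made up for this example, the helper names (`fit_pair`, `sample_pairs`) are hypothetical, and real synthetic data generators typically use far richer models than a two-variable Gaussian.

```python
import math
import random
import statistics


def fit_pair(xs, ys):
    """Estimate means, standard deviations, and correlation of two columns."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return mean_x, mean_y, sd_x, sd_y, cov / (sd_x * sd_y)


def sample_pairs(params, n, rng):
    """Draw n synthetic (x, y) pairs from a bivariate normal with these parameters."""
    mean_x, mean_y, sd_x, sd_y, rho = params
    # Guard against a tiny negative value caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mean_x + sd_x * z1
        # Mixing z1 into y reproduces the fitted correlation between columns.
        y = mean_y + sd_y * (rho * z1 + residual * z2)
        pairs.append((x, y))
    return pairs


# Made-up source values for illustration only (age in years, systolic BP in mmHg).
ages = [29, 34, 45, 52, 57, 61, 68, 73]
systolic = [112, 121, 119, 135, 128, 142, 139, 151]

real_params = fit_pair(ages, systolic)
synthetic = sample_pairs(real_params, 20000, random.Random(7))
synthetic_params = fit_pair([a for a, _ in synthetic], [s for _, s in synthetic])

print("source    (mean_age, mean_sbp, sd_age, sd_sbp, rho):", real_params)
print("synthetic (mean_age, mean_sbp, sd_age, sd_sbp, rho):", synthetic_params)
```

None of the sampled pairs is a copy of a source row, yet the summary statistics of the synthetic set closely track those of the source. That is the property a downstream model or test harness relies on.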

Based on my 20+ years in the trenches with medical device and healthcare companies, I see a pressing need to speed up the product development process for medical companies. Complex processes, fragmented data sets, complicated protocols, and long timelines hinder organizations from innovating medical products. The versatility of synthetic patient data makes it a potentially valuable resource for furthering medical product development, while effectively addressing privacy concerns.

Synthetic test data can be incredibly useful in medical product development, but it is important to know where it can and cannot be used effectively. Here are some examples:

Where synthetic test data can be used by medical product companies:

- Improve performance and device reliability: Medical devices and software need extensive testing to ensure performance, safety, and effectiveness. Synthetic test data supports this testing by providing realistic scenarios, failure modes, and edge cases, thus enhancing device performance and fault tolerance.
- Testing rare medical conditions: Synthetic data provides realistic patient data for testing where real-world data is limited, and it helps evaluate human factors, user interaction, and device usability.
- Imaging and diagnostics: Synthetic data generation produces realistic medical images, such as MRI scans and X-rays, providing the necessary diversity and volume for algorithm training and development without compromising patient information.
- Testing algorithms: Algorithms can be tested on synthetic data to ensure they work correctly before being applied to real patient data, helping identify and fix potential issues early in the development process.
- Simulating clinical trials: Synthetic data can simulate patient responses to new treatments, allowing researchers to predict outcomes and refine their approaches before conducting actual clinical trials.
- Educational purposes and collaboration: Medical students and professionals can use synthetic data to practice diagnosis and treatment planning without accessing sensitive patient information. Synthetic data also makes it easier to share data while protecting patient privacy and supporting compliance.
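The "testing algorithms" and "edge cases" points above can be sketched in a few lines. The example below generates synthetic heart-rate readings per scenario, each with a known expected label, adds explicit boundary values, and checks a toy classifier against them. Everything here is an assumption made for illustration: `classify_heart_rate` stands in for whatever algorithm is under test, and the thresholds are arbitrary, not clinical guidance.

```python
import random


def classify_heart_rate(bpm):
    """Hypothetical algorithm under test; thresholds are illustrative only."""
    if bpm is None or bpm <= 0:
        return "invalid"
    if bpm < 50:
        return "low"
    if bpm > 120:
        return "high"
    return "normal"


# Each scenario pairs a known expected label with a generator of readings.
SCENARIOS = {
    "normal": lambda rng: rng.uniform(50, 120),
    "low": lambda rng: rng.uniform(20, 49.9),
    "high": lambda rng: rng.uniform(120.1, 220),
    "invalid": lambda rng: rng.choice([None, 0, -1]),  # sensor dropout or bad values
}


def generate_cases(n_per_scenario, rng):
    """Build (reading, expected_label) pairs, starting with boundary values."""
    cases = [(50, "normal"), (120, "normal"), (49.9, "low"), (120.1, "high")]
    for label, draw in SCENARIOS.items():
        cases.extend((draw(rng), label) for _ in range(n_per_scenario))
    return cases


def find_failures(algorithm, cases):
    """Return (reading, expected, actual) for every case the algorithm gets wrong."""
    results = ((value, expected, algorithm(value)) for value, expected in cases)
    return [r for r in results if r[1] != r[2]]


cases = generate_cases(250, random.Random(2024))
failures = find_failures(classify_heart_rate, cases)
print(f"{len(cases)} synthetic cases, {len(failures)} failures")
```

Because every case is fabricated, rare or hazardous situations (such as sensor dropout) can be produced on demand and in volume, which is hard to do with real patient recordings.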

Where synthetic test data should not be used:

- Clinical decision-making: While synthetic data can be useful for training and testing, it should not be used for making actual clinical decisions. Real patient data is necessary to ensure accurate and reliable diagnoses and treatment plans.
- Patient-specific analysis: Synthetic data does not correspond to real individuals, so it cannot be used for personalized medicine or patient-specific analysis. Real patient data is essential for tailoring treatments to individual needs.

A few examples of how medical product companies have used synthetic test data:

- An insulin pump developer generated synthetic data to validate device dosage algorithms.
- An ultrasound manufacturing company improved image analysis accuracy and reduced image processing time through synthetic ultrasound images.
- An electrocardiogram (ECG) device manufacturer used synthetic test data to tackle noisy sensor data, improving device accuracy and reliability.
- A pacemaker manufacturer used synthetic data to test device performance in rare cardiac conditions and to ensure data anonymity.
- A stent manufacturer employs synthetic data for stent design and simulation.
- An orthopaedic device company used synthetic data (such as medical images and patient history) for orthopaedic device testing to enhance performance and safety.
- A medical imaging company uses synthetic data for testing image analysis algorithms and for AI model training.
- A point-of-care device manufacturer improved device accuracy and reduced false positives by integrating synthetic patient data.

I see synthetic data gaining acceptance in medical product development processes, and I expect its use to grow in the coming years. The key demand drivers are:

- Increased adoption: widespread integration of synthetic data for testing and design optimization in medical device and software development
- Advanced algorithms: development of sophisticated algorithms using synthetic data
- Regulatory framework: establishment of clear guidelines for the use of synthetic data

The field of synthetic data and AI-powered testing tools is rapidly evolving, with continuous improvements in model accuracy, scalability, and ease of use. Future advancements may include more sophisticated models capable of generating multimodal data, which combines text, images, and numerical data into cohesive synthetic datasets. By leveraging synthetic test data, medical product companies can benefit from enhanced performance and reliability, increased test coverage, and faster time to market.
