Vol. 1, Issue 1, Part A (2024)
Synthetic data generation for imbalanced clinical datasets via diffusion models
Shimul Islam and Sabrina Akhter
Clinical datasets are often imbalanced for ethical, logistical, or pathological reasons, which hinders the training of robust machine learning models for diagnosis and prognosis. Synthetic data generation with advanced generative models has emerged as a viable way to address class imbalance. This paper explores the application of diffusion models to generating high-quality synthetic clinical data, evaluates their effectiveness on multiple real-world datasets, and compares their performance with that of established generative adversarial networks (GANs) and variational autoencoders (VAEs). Empirical results demonstrate that diffusion models significantly improve downstream classification performance and better preserve critical statistical properties of minority classes in clinical datasets.
Pages: 01-05