Vol. 1, Issue 1, Part A (2024)

Synthetic data generation for imbalanced clinical datasets via diffusion models

Author(s):

Shimul Islam and Sabrina Akhter

Abstract:

Clinical datasets are often imbalanced for ethical, logistical, or pathological reasons, which hinders the training of robust machine learning models for diagnosis and prognosis. Synthetic data generation with advanced generative models has emerged as a viable way to address class imbalance. This paper explores the application of diffusion models to generating high-quality synthetic clinical data, evaluates their effectiveness on multiple real-world datasets, and compares their performance with that of established generative adversarial networks (GANs) and variational autoencoders (VAEs). Empirical results demonstrate that diffusion models significantly improve downstream classification performance and better preserve critical statistical properties of minority classes in clinical datasets.

Pages: 01-05

How to cite this article:
Shimul Islam and Sabrina Akhter. Synthetic data generation for imbalanced clinical datasets via diffusion models. J. Mach. Learn. Data Sci. Artif. Intell. 2024;1(1):01-05.