GAN-Based Novel Approach for Generating Synthetic Medical Tabular Data

Bioengineering (Basel). 2024 Dec 18;11(12):1288. doi: 10.3390/bioengineering11121288.

Abstract

The generation of synthetic medical data has become a focal point for researchers, driven by the increasing demand for privacy-preserving solutions. While existing generative methods heavily rely on real datasets for training, access to such data is often restricted. In contrast, statistical information about these datasets is more readily available, yet current methods struggle to generate tabular data solely from statistical inputs. This study addresses the gaps by introducing a novel approach that converts statistical data into tabular datasets using a modified Generative Adversarial Network (GAN) architecture. A custom loss function was incorporated into the training process to enhance the quality of the generated data. The proposed method is evaluated using fidelity and utility metrics, achieving "Good" similarity and "Excellent" utility scores. While the generated data may not fully replace real databases, it demonstrates satisfactory performance for training machine-learning algorithms. This work provides a promising solution for synthetic data generation when real datasets are inaccessible, with potential applications in medical data privacy and beyond.

Keywords: GAN; custom loss function; statistical data; synthetic medical tabular data.