International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 106
Published: May 2026
Authors: Srinivas Suresh Sikhakolli, Asha Kiran Sikhakolli
DOI: 10.5120/ijca372b529d7c61
Srinivas Suresh Sikhakolli, Asha Kiran Sikhakolli. Synthetic Medical Data Generation using Transformer-based Generative AI: A Performance Comparison with Faker and CTGAN. International Journal of Computer Applications. 187, 106 (May 2026), 22-26. DOI=10.5120/ijca372b529d7c61
@article{ 10.5120/ijca372b529d7c61,
author = { Srinivas Suresh Sikhakolli and Asha Kiran Sikhakolli },
title = { Synthetic Medical Data Generation using Transformer-based Generative AI: A Performance Comparison with Faker and CTGAN },
journal = { International Journal of Computer Applications },
year = { 2026 },
volume = { 187 },
number = { 106 },
pages = { 22--26 },
doi = { 10.5120/ijca372b529d7c61 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2026
%A Srinivas Suresh Sikhakolli
%A Asha Kiran Sikhakolli
%T Synthetic Medical Data Generation using Transformer-based Generative AI: A Performance Comparison with Faker and CTGAN
%J International Journal of Computer Applications
%V 187
%N 106
%P 22-26
%R 10.5120/ijca372b529d7c61
%I Foundation of Computer Science (FCS), NY, USA
Access to medical data is essential for health care research and advanced analytics. However, strict privacy regulations significantly limit data availability and hinder machine learning applications. Because of these limitations, the use of synthetic data is rising across the world. Prior studies focused on building synthetic data using rule-based models such as Faker and deep learning models such as CTGAN. In recent years, ChatGPT, a transformer-based generative AI model, has emerged with advanced capabilities to generate a wide variety of synthetic data on demand. The aim of this research is to show that the transformer-based generative AI model produces quality synthetic data that yields better predictive performance than the Faker and CTGAN models. The synthetic data was generated with reference to the UCI Cleveland Heart data. A Random Forest algorithm was used to evaluate the predictive performance obtained with the data from each model. The results of the experiment show that synthetic data generated by ChatGPT, the transformer-based GenAI model, yields better performance than data generated by the Faker and CTGAN models. They also show that the performance metrics of the ChatGPT-based synthetic data are close to those of the actual Cleveland heart medical data. Our findings suggest that the ChatGPT model effectively captured clinical relationships and offers practical insights for researchers without compromising privacy in the synthetic data. This type of experiment is useful for non-clinical research.
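The comparison described in the abstract can be pictured as scoring a classifier trained on each data source against the same held-out labels, then checking how far each score set falls from the real-data baseline. The sketch below is only an illustration under stated assumptions: the abstract does not list the metrics used, so accuracy, precision, recall, and F1 for a binary target are assumed; the helper names `binary_metrics` and `metric_gap` are hypothetical; and the labels and predictions are toy values, not results from the paper.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive).

    Assumption: these four metrics stand in for the paper's unnamed
    "performance metrics".
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def metric_gap(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Mean absolute difference over the baseline's metrics (smaller = closer)."""
    return sum(abs(baseline[k] - candidate[k]) for k in baseline) / len(baseline)


# Toy held-out labels and predictions (hypothetical, NOT the paper's data).
# Each prediction list stands for a classifier trained on a different source.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "real_trained": [1, 0, 1, 1, 0, 1, 1, 0],
    "synthetic_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "synthetic_b": [0, 1, 1, 0, 0, 1, 1, 0],
}

scores = {name: binary_metrics(y_test, pred) for name, pred in predictions.items()}
gaps = {
    name: metric_gap(scores["real_trained"], scores[name])
    for name in ("synthetic_a", "synthetic_b")
}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: mean gap to real-data baseline = {gap:.3f}")
```

In practice the prediction lists would come from Random Forest models fitted on the real and synthetic training sets; a generator whose gap is smallest is the one whose metrics sit closest to those obtained with the real data.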