Algorithm for Assessing the Quality of Medical Synthetic Data Based on The Multi-Criteria and Pac-Bayesian Model

Juraev Gulomjon Primovich; Jovlieva Dilnoz Mustofa kizi

Vol. 6 No. 03 (2026)

Juraev Gulomjon Primovich
Associate Professor of the Department of "Exact Sciences, Land Cadastre and Municipal Services" of the International Innovation University, Doctor of Technical Sciences (PhD), Uzbekistan

Jovlieva Dilnoz Mustofa kizi
Lecturer of the Department of "Exact Sciences, Land Cadastre and Utilities" of the International Innovation University, Uzbekistan

Published 2026-03-31

Keywords

Synthetic data, medical artificial intelligence, multi-criteria assessment

How to Cite

Juraev Gulomjon Primovich, & Jovlieva Dilnoz Mustofa kizi. (2026). Algorithm for Assessing the Quality of Medical Synthetic Data Based on The Multi-Criteria and Pac-Bayesian Model. Stanford Database Library of American Journal of Applied Science and Technology, 6(03), 118–124. Retrieved from http://oscarpubhouse.com/index.php/sdlajast/article/view/1655

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

The effectiveness of medical artificial intelligence systems depends directly on the quality and representativeness of the training sample. Real clinical data, however, are subject to privacy restrictions, class imbalance, and heterogeneity. Synthetic data are considered a promising way to address these problems, but comprehensively assessing their quality remains an open issue.

This work proposes an algorithm for assessing the quality of synthetic data based on a multi-criteria approach and PAC-Bayesian theory. The proposed model combines statistical proximity (KL divergence), inter-feature relationships (mutual information), predictive performance, and domain-specific constraints within a single integrated functional.
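The abstract does not give the form of the integrated functional, so the following is only a minimal sketch of how the four criteria named above could be computed and combined. The histogram-based estimators, the `exp(-x)` normalization, the equal weights, and all function names (`histogram_kl`, `mutual_information`, `mi_gap`, `domain_validity`, `quality_score`) are assumptions made for illustration, not the authors' definitions.

```python
import numpy as np

def histogram_kl(real, synth, bins=10, eps=1e-9):
    """KL(P_real || P_synth) for one numeric feature, using shared histogram bins."""
    lo = min(real.min(), synth.min())
    hi = max(real.max(), synth.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synth, bins=bins, range=(lo, hi))
    # Smooth empty bins so the logarithm stays finite, then normalize.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def mi_gap(real, synth, bins=10):
    """Mean absolute difference of pairwise MI between real and synthetic tables.
    Each table is binned over its own range, which is a simplification."""
    d = real.shape[1]
    gaps = [
        abs(mutual_information(real[:, i], real[:, j], bins)
            - mutual_information(synth[:, i], synth[:, j], bins))
        for i in range(d) for j in range(i + 1, d)
    ]
    return float(np.mean(gaps))

def domain_validity(synth, lower, upper):
    """Fraction of synthetic rows whose features all lie inside the allowed ranges."""
    ok = np.all((synth >= lower) & (synth <= upper), axis=1)
    return float(ok.mean())

def quality_score(kl, mi_diff, accuracy, validity, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of the four criteria (assumed form).
    Divergences (lower is better) are mapped to (0, 1] with exp(-x)."""
    criteria = np.array([np.exp(-kl), np.exp(-mi_diff), accuracy, validity])
    w = np.asarray(weights, dtype=float)
    return float(w @ criteria / w.sum())

# Toy usage on random data (hypothetical, not the paper's dataset).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 3))
synth = real + rng.normal(scale=0.1, size=(500, 3))
kl = float(np.mean([histogram_kl(real[:, k], synth[:, k]) for k in range(3)]))
score = quality_score(
    kl,
    mi_gap(real, synth),
    accuracy=0.8,  # placeholder for a downstream model's accuracy
    validity=domain_validity(synth, lower=-10.0, upper=10.0),
)
print(round(score, 4))
```

A score close to 1 would indicate synthetic data that are close to the real distribution on every criterion. In practice the weights would have to be chosen for the clinical task at hand.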

Experimental results showed that the synthetic data satisfactorily reflect the main features of the real distribution (KL = 0.6035) and that inter-feature relationships are preserved with high accuracy (MI = 0.0072). A model trained on the synthetic data achieved an accuracy of 80.4%, which supports its practical applicability. The domain-matching criterion reached its maximum value (1.0).
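To illustrate how a PAC-Bayesian guarantee relates to a reported accuracy, the sketch below evaluates a commonly used McAllester-type bound: with probability at least 1 - δ, true risk ≤ empirical risk + sqrt((KL(Q‖P) + ln(2√n/δ)) / (2n)). The abstract does not state which bound the authors use, and the sample size `n` and the posterior-prior divergence `kl_posterior_prior` below are hypothetical placeholders. Only the empirical error of 1 - 0.804 comes from the text.

```python
import math

def pac_bayes_bound(empirical_risk, kl_posterior_prior, n, delta=0.05):
    """Upper bound on the true risk that holds with probability >= 1 - delta.

    McAllester-type bound in the form
        risk <= empirical_risk + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n)),
    where n is the number of i.i.d. evaluation samples.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0 or kl_posterior_prior < 0:
        raise ValueError("n must be positive and the KL term non-negative")
    slack = math.sqrt(
        (kl_posterior_prior + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    )
    # A risk can never exceed 1, so clip the bound.
    return min(1.0, empirical_risk + slack)

empirical_risk = 1.0 - 0.804  # error rate implied by the reported accuracy
bound = pac_bayes_bound(
    empirical_risk,
    kl_posterior_prior=5.0,  # hypothetical value
    n=300,                   # hypothetical sample size
)
print(f"risk bound: {bound:.3f}")
```

The slack term shrinks as `n` grows and widens as the posterior moves further from the prior, which is the trade-off between data fit and model reliability that a PAC-Bayesian criterion is meant to capture.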

The results indicate that the proposed approach balances the quality of the synthetic data against model reliability.
