Research Article

Breaking the Homogeneity Assumption: Specialized Multi-Generator Adversarial Learning for Rare Failure Detection in Predictive Maintenance

by Alexis Lazanas, Georgios Kampouropoulos
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 97
Published: April 2026
DOI: 10.5120/ijca9c7d217bd514

Alexis Lazanas, Georgios Kampouropoulos. Breaking the Homogeneity Assumption: Specialized Multi-Generator Adversarial Learning for Rare Failure Detection in Predictive Maintenance. International Journal of Computer Applications. 187, 97 (April 2026), 25-37. DOI=10.5120/ijca9c7d217bd514

@article{10.5120/ijca9c7d217bd514,
  author    = {Alexis Lazanas and Georgios Kampouropoulos},
  title     = {Breaking the Homogeneity Assumption: Specialized Multi-Generator Adversarial Learning for Rare Failure Detection in Predictive Maintenance},
  journal   = {International Journal of Computer Applications},
  year      = {2026},
  volume    = {187},
  number    = {97},
  pages     = {25-37},
  doi       = {10.5120/ijca9c7d217bd514},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
Abstract

Supervised learning models for predictive maintenance are routinely trained on highly imbalanced industrial datasets: machine failures occur rarely yet have a disproportionate effect on operations. Beyond the obvious class disparity, failure data are typically non-homogeneous, because distinct failure modes arise from different physical processes, giving the minority class a multimodal distribution. Traditional imbalance-handling methods, such as undersampling, SMOTE-based interpolation, or cost-sensitive learning, generally assume the minority class is a single homogeneous group, which severely limits their effectiveness under the multimodal conditions encountered in industrial practice. This paper investigates whether a failure-type-aware generative augmentation scheme can improve the detection of rare failures in predictive maintenance systems. A leakage-safe experimental design is used to compare five imbalance-handling methods: cost-sensitive learning, random undersampling, SMOTE oversampling, single-generator GAN augmentation, and a specialized multi-generator GAN architecture in which independent generators each learn a distinct failure subtype. Model performance is quantified with precision/recall-oriented measures, with PR-AUC as the primary evaluation metric. Experiments on the AI4I 2020 predictive maintenance dataset indicate that the proposed multi-generator GAN framework produces more realistic minority-class samples and therefore achieves better PR-AUC and recall scores than traditional resampling methods and single-generator GAN augmentation. Although the method incurs higher computational cost, the findings provide strong evidence that generator specialization is an effective way to cope with the heterogeneous failure distributions inherent to imbalanced predictive maintenance problems.
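The leakage-safe design and PR-AUC metric described in the abstract can be illustrated with a minimal scikit-learn sketch. The dataset, classifier, and random-oversampling step below are illustrative assumptions, not the paper's actual pipeline; the point is only that any resampling is fitted to each training fold and the untouched validation fold is scored with average precision (PR-AUC):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced dataset (~3% positives), standing in for failure labels.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.97, 0.03], random_state=0)

rng = np.random.default_rng(0)
scores = []
for tr, va in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = X[tr], y[tr]
    # Oversample the minority class *inside* the training fold only, so no
    # duplicated/synthetic minority points leak into the validation fold.
    min_idx = np.flatnonzero(y_tr == 1)
    maj_idx = np.flatnonzero(y_tr == 0)
    boost = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    X_bal = np.vstack([X_tr, X_tr[boost]])
    y_bal = np.concatenate([y_tr, y_tr[boost]])
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
    # Score the untouched validation fold with PR-AUC (average precision).
    scores.append(average_precision_score(y[va], clf.predict_proba(X[va])[:, 1]))

print(round(float(np.mean(scores)), 3))
```

Applying the oversampling before splitting would let copies of the same minority point appear on both sides of a fold, inflating PR-AUC; the per-fold structure above avoids exactly that bias.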
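The paper's central idea, one generator per failure subtype rather than one pooled generator, can be sketched without GANs using simple density models. The failure-mode names, locations, and the use of Gaussian mixtures below are illustrative assumptions, not the paper's architecture; they only show why a single model over a multimodal minority class produces implausible samples that specialized per-mode models avoid:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy heterogeneous minority class: two failure modes in different regions
# of feature space (labels "tool wear" / "overstrain" are hypothetical).
mode_a = rng.normal(loc=[-4.0, 0.0], scale=0.5, size=(60, 2))
mode_b = rng.normal(loc=[4.0, 3.0], scale=0.5, size=(40, 2))

# Pooled "single generator": one unimodal model over all failure samples.
pooled = GaussianMixture(n_components=1, random_state=0).fit(np.vstack([mode_a, mode_b]))
# Specialized "generators": an independent model per failure subtype.
gen_a = GaussianMixture(n_components=1, random_state=0).fit(mode_a)
gen_b = GaussianMixture(n_components=1, random_state=0).fit(mode_b)

pooled_samples, _ = pooled.sample(200)
spec_samples = np.vstack([gen_a.sample(120)[0], gen_b.sample(80)[0]])

# Fraction of synthetic points landing in the empty region between modes,
# where no real failure was ever observed.
gap = lambda S: float(np.mean(np.abs(S[:, 0]) < 2.0))
print(gap(pooled_samples), gap(spec_samples))
```

The pooled model straddles both modes and places a large share of its mass in the empty gap between them, while the specialized models sample almost exclusively on their own mode, the same failure-heterogeneity argument the multi-generator GAN makes at the adversarial-training level.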

References
  • M. Achouch, M. Dimitrova, K. Ziane et al., “On predictive maintenance in Industry 4.0: Overview, models, and challenges,” Applied Sciences, vol. 12, no. 16, p. 8081, 2022. https://doi.org/10.3390/app12168081
  • I. N. M. Adiputra, P. Lin, and P. Wanchai, “The effectiveness of generative adversarial network-based over-sampling methods for imbalanced multi-class credit score classification,” Electronics, vol. 14, no. 4, p. 697, 2025. https://doi.org/10.3390/electronics14040697
  • AI4I, “AI4I 2020 predictive maintenance dataset,” UCI Machine Learning Repository, 2020. https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset
  • R. Akbani, S. Kwek, and N. Japkowicz, “Applying support vector machines to imbalanced datasets,” in Proceedings of the 15th European Conference on Machine Learning (ECML), Lecture Notes in Computer Science, vol. 3201, Springer, 2004, pp. 39–50. https://doi.org/10.1007/978-3-540-30115-8_8
  • T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2019, pp. 2623–2631. https://doi.org/10.1145/3292500.3330701
  • M. Altalhan, A. Algarni, and M. T.-H. Alouane, “Imbalanced data problem in machine learning: A review,” IEEE Access, vol. 13, pp. 13686–13699, 2025. https://doi.org/10.1109/ACCESS.2025.3531662
  • M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017. https://arxiv.org/abs/1701.07875
  • I. Ashrapov, “Tabular GANs for uneven distribution,” arXiv preprint arXiv:2010.00638, 2020. https://doi.org/10.48550/arXiv.2010.00638
  • A. Atere and H. Kivrak, “Addressing data imbalance in predictive maintenance using SMOTE, SMOTE-Tomek, and GANs: A comparative evaluation,” in Proceedings of the International Symposium on Applied Data Engineering and Sciences (ISRDES), Tokat, Türkiye, 2025.
  • G. E. P. Box and D. R. Cox, “An analysis of transformations,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 26, no. 2, pp. 211–243, 1964. https://doi.org/10.1111/j.2517-6161.1964.tb00553.x
  • L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. https://doi.org/10.1023/A:1010933404324
  • M. Carvalho, A. J. Pinho, and S. Brás, “Resampling approaches to handle class imbalance: A review from a data perspective,” Journal of Big Data, vol. 12, no. 1, p. 71, 2025. https://doi.org/10.1186/s40537-025-01119-4
  • N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002. https://doi.org/10.1613/jair.953
  • J. E. Choi, D. H. Seol, C. Y. Kim, and S. J. Hong, “Generative adversarial network-based fault detection in semiconductor equipment with class-imbalanced data,” Sensors, vol. 23, no. 4, p. 1889, 2023. https://doi.org/10.3390/s23041889
  • C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995. https://doi.org/10.1007/BF00994018
  • J. Davis and M. Goadrich, “The relationship between precision–recall and ROC curves,” in Proceedings of the 23rd International Conference on Machine Learning (ICML), ACM, 2006, pp. 233–240. https://doi.org/10.1145/1143844.1143874
  • V. W. de Vargas, J. A. S. Aranda, R. dos Santos Costa, P. R. da Silva Pereira, and J. L. V. Barbosa, “Imbalanced data preprocessing techniques for machine learning: A systematic mapping study,” Knowledge and Information Systems, vol. 65, no. 1, pp. 31–57, 2023. https://doi.org/10.1007/s10115-022-01772-8
  • A. Demircioğlu, “Applying oversampling before cross-validation will lead to high bias in radiomics,” Scientific Reports, vol. 14, p. 11563, 2024. https://doi.org/10.1038/s41598-024-62585-z
  • G. Douzas and F. Bacao, “Effective data generation for imbalanced learning using conditional generative adversarial networks,” Expert Systems with Applications, vol. 91, pp. 464–471, 2018. https://doi.org/10.1016/j.eswa.2017.09.030
  • C. Drummond and R. C. Holte, “C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling,” in Proceedings of the Workshop on Learning from Imbalanced Data Sets II (ICML 2003), Washington, DC, USA, 2003.
  • C. Elkan, “The foundations of cost-sensitive learning,” in Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), 2001, pp. 973–978.
  • T. Emmanuel, T. Maupong, D. Mpoeleng, T. Semong, B. Mphago, and O. Tabona, “A survey on missing data in machine learning,” Journal of Big Data, vol. 8, no. 1, p. 140, 2021. https://doi.org/10.1186/s40537-021-00516-9
  • G. Eom and H. Byeon, “Searching for optimal oversampling to process imbalanced data: Generative adversarial networks and synthetic minority over-sampling technique,” Mathematics, vol. 11, no. 16, p. 3605, 2023. https://doi.org/10.3390/math11163605
  • T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, pp. 861–874, 2006.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” in Advances in Neural Information Processing Systems, vol. 30, 2017.
  • A. Hakami, “Strategies for overcoming data scarcity, imbalance, and feature selection challenges in machine learning models for predictive maintenance,” Scientific Reports, vol. 14, p. 9645, 2024. https://doi.org/10.1038/s41598-024-59958-9
  • H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009. https://doi.org/10.1109/TKDE.2008.239
  • M. Hermans, M. Kozielski, M. Michalak et al., “Sensor-based predictive maintenance with reduction of false alarms—A case study in heavy industry,” Sensors, vol. 22, no. 1, p. 226, 2022. https://doi.org/10.3390/s22010226
  • Q. Hoang, T. D. Nguyen, T. Le, and D. Phung, “MGAN: Training generative adversarial nets with multiple generators,” in Proceedings of the International Conference on Learning Representations (ICLR), 2018.
  • E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” in Proceedings of the International Conference on Learning Representations (ICLR), 2017.
  • S. Kaufman, S. Rosset, and C. Perlich, “Leakage in data mining: Formulation, detection, and avoidance,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2012, pp. 556–563.
  • M. Kubat and S. Matwin, “Addressing the curse of imbalanced training sets: One-sided selection,” in Proceedings of the 14th International Conference on Machine Learning (ICML), 1997, pp. 179–186.
  • Y. Mahale, S. Kolhar, and A. S. More, “Enhancing predictive maintenance in the automotive industry: Addressing class imbalance using advanced machine learning techniques,” Discover Applied Sciences, vol. 7, no. 1, p. 340, 2025. https://doi.org/10.1007/s42452-025-06827-3
  • S. Matzka, “Explainable artificial intelligence for predictive maintenance applications,” in Proceedings of the Third International Conference on Artificial Intelligence for Industries (AI4I), IEEE, 2020, pp. 69–74.
  • L. Meitz, J. Senge, T. Wagenhals, and T. Bauernhansl, “A literature review framework and open research challenges for predictive maintenance in Industry 4.0,” Computers & Industrial Engineering, vol. 206, p. 111193, 2025.
  • M. Moléda, B. B. Małysiak-Mrozek, W. Ding, V. Sunderam, and D. Mrozek, “From corrective to predictive maintenance: A review of maintenance approaches for the power industry,” Sensors, vol. 23, no. 13, p. 5970, 2023.
  • J. M. H. Pinheiro, S. V. B. de Oliveira, T. H. S. Silva et al., “The impact of feature scaling in machine learning: Effects on regression and classification tasks,” arXiv preprint arXiv:2506.08274, 2025.
  • T. Saito and M. Rehmsmeier, “The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets,” PLOS ONE, vol. 10, no. 3, p. e0118432, 2015.
  • M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Information Processing & Management, vol. 45, no. 4, pp. 427–437, 2009.
  • I. Tomek, “Two modifications of CNN,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, no. 11, pp. 769–772, 1976.
  • D. L. Wilson, “Asymptotic properties of nearest neighbor rules using edited data,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, no. 3, pp. 408–421, 1972.
  • L. Xu, M. Skoularidou, A. Cuesta-Infante, and K. Veeramachaneni, “Modeling tabular data using conditional GAN,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
  • Y. Yang and M. Z. Iqbal, “Cost-optimised machine learning model comparison for predictive maintenance,” Electronics, vol. 14, no. 12, p. 2497, 2025.
  • I.-K. Yeo and R. A. Johnson, “A new family of power transformations to improve normality or symmetry,” Biometrika, vol. 87, no. 4, pp. 954–959, 2000. https://doi.org/10.1093/biomet/87.4.954
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems (NeurIPS), vol. 2, MIT Press, 2014, pp. 2672–2680. https://doi.org/10.48550/arXiv.1406.2661
  • J. Lahnakoski, J. Salmi, and S. M. Laaksonen, “Avoiding data leakage in machine learning pipelines: A systematic evaluation of preprocessing strategies,” IEEE Access, vol. 11, pp. 12463–12477, 2023. https://doi.org/10.1109/ACCESS.2023.3240784
  • F. Pargent, T. J. Schoenbrodt, and M. Gollwitzer, “Best practices in machine learning for psychology: A tutorial for building predictive models,” Advances in Methods and Practices in Psychological Science, vol. 5, no. 2, 2022. https://doi.org/10.1177/25152459211036710
  • G. C. Cawley and N. L. C. Talbot, “On over-fitting in model selection and subsequent selection bias in performance evaluation,” Journal of Machine Learning Research, vol. 11, pp. 2079–2107, 2010. https://doi.org/10.5555/1756006.1859921
  • A. Lazanas, S. Christodoulou, and S. Karpouzis, “Context-integrated adversarial learning for predictive modelling of stock price dynamics,” International Journal of Engineering Research & Technology, vol. 15, no. 02, 2026. https://doi.org/10.5281/zenodo.18874420
  • S. Dedotsi, A. Lazanas, I. Siachos, D. Teloni, and A. G. Telonis, “Discrete clusters formulation through the exploitation of optimized k-modes algorithm for hypotheses validation in social work research: The case of Greek social workers working with refugees,” BOHR International Journal of Internet of Things, Artificial Intelligence and Machine Learning, vol. 2, no. 1, pp. 11–18, 2023. https://doi.org/10.54646/bijiam.2023.12
Index Terms

Computer Science, Information Sciences
Keywords

Predictive maintenance, Generative adversarial networks (GAN), Imbalanced learning, Synthetic data generation
