Research Article

A Bio-Inspired Earthworm Optimization Algorithm Combined with PCA for Improved Feature Selection in Machine Learning Models

by Amit Kumar Saxena, Damodar Patel, Umesh Kumar Shriwas, Abhishek Dubey, Gayatri Sahu, Shreya Chinde
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 36
Published: September 2025
DOI: 10.5120/ijca2025925620

Amit Kumar Saxena, Damodar Patel, Umesh Kumar Shriwas, Abhishek Dubey, Gayatri Sahu, Shreya Chinde . A Bio-Inspired Earthworm Optimization Algorithm Combined with PCA for Improved Feature Selection in Machine Learning Models. International Journal of Computer Applications. 187, 36 (September 2025), 43-54. DOI=10.5120/ijca2025925620

@article{10.5120/ijca2025925620,
  author    = {Amit Kumar Saxena and Damodar Patel and Umesh Kumar Shriwas and Abhishek Dubey and Gayatri Sahu and Shreya Chinde},
  title     = {A Bio-Inspired Earthworm Optimization Algorithm Combined with PCA for Improved Feature Selection in Machine Learning Models},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {36},
  pages     = {43-54},
  doi       = {10.5120/ijca2025925620},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
Abstract

High-dimensional data often leads to increased computational complexity and reduced model performance due to the curse of dimensionality. This study introduces an effective feature selection and classification framework that integrates the Earthworm Optimization Algorithm (EWA), Principal Component Analysis (PCA), and the supervised classifiers K-Nearest Neighbors (KNN) and Support Vector Machine (SVM). EWA, a bio-inspired metaheuristic modeled on the reproductive behavior of earthworms, efficiently identifies optimal feature subsets. PCA is then applied to further reduce dimensionality while preserving essential variance. The proposed EWA-PCA method was evaluated on 19 benchmark datasets using stratified 10-fold cross-validation and standard classification metrics. Averaged over the 19 datasets, KNN achieved 77.65% accuracy on the original feature sets versus 86.56% with EWA-PCA; similarly, SVM achieved 84.43% on the original features versus 88.10% with EWA-PCA. Results show that EWA-PCA consistently outperforms conventional and modern feature selection techniques, including Chi2, ReliefF, SIFS, mRMR, ATFS, and EmPo. EWA-PCA achieved higher classification accuracies with both KNN and SVM, demonstrating high stability and substantial feature reduction. The findings validate EWA-PCA as a scalable, accurate, and efficient solution for high-dimensional data classification.
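The evaluation protocol described in the abstract (dimensionality reduction with PCA, classification with KNN, scored by stratified 10-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the EWA feature-subset search is not reproduced, and the dataset, number of components, and number of neighbors are placeholder choices.

```python
# Sketch of the PCA + KNN stage under stratified 10-fold cross-validation.
# Assumes the EWA step has already selected a feature subset; here the full
# feature matrix of a stand-in dataset is used instead.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # placeholder benchmark dataset

# Scale, project onto principal components, then classify with KNN.
# n_components and n_neighbors are illustrative, not the paper's settings.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    KNeighborsClassifier(n_neighbors=5),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy over 10 folds: {scores.mean():.4f}")
```

An EWA-selected subset would simply replace `X` with the chosen columns (e.g. `X[:, mask]`) before the pipeline is fitted; the cross-validation scoring is unchanged.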

References
  • Saxena, A. K., Dubey, V. K., and Wang, J. 2017. Hybrid feature selection methods for high-dimensional multi-class datasets. International Journal of Data Mining, Modelling and Management, 9(4), 315-339.
  • Goodfellow, I., Bengio, Y., and Courville, A. 2016. Deep Learning. MIT Press.
  • Theng, D., and Bhoyar, K. K. 2024. Feature selection techniques for machine learning: a survey of more than two decades of research. Knowledge and Information Systems, 66(3), 1575-1637.
  • Lin, K. L., Lin, C. Y., Huang, C. D., Chang, H. M., Yang, C. Y., Lin, C. T., ... and Hsu, D. F. 2007. Feature selection and combination criteria for improving accuracy in protein structure prediction. IEEE Transactions on Nanobioscience, 6(2), 186-196.
  • Saxena, A., Kothari, M., and Pandey, N. 2009. Evolutionary approach to dimensionality reduction. In Encyclopedia of Data Warehousing and Mining, Second Edition. 810-816.
  • Ruano-Ordás, D. 2024. Machine learning-based feature extraction and selection. Applied Sciences, 14(15), 6567.
  • Rahmat, F., Zulkafli, Z., Ishak, A. J., Abdul Rahman, R. Z., Stercke, S. D., Buytaert, W., ... and Ismail, M. 2024. Supervised feature selection using principal component analysis. Knowledge and Information Systems, 66(3), 1955-1995.
  • Sánchez-Maroño, N., Alonso-Betanzos, A., and Tombilla-Sanromán, M. 2007. Filter methods for feature selection: a comparative study. In International Conference on Intelligent Data Engineering and Automated Learning. 178-187.
  • Patel, D., Saxena, A. K., Laha, S., and Ansari, G. M. 2022. A novel scheme for feature selection using filter approach. In 2022 7th International Conference on Computing, Communication and Security (ICCCS). 1-4.
  • Dubey, V. K., and Saxena, A. K. 2016. Cosine similarity based filter technique for feature selection. In 2016 International Conference on Control, Computing, Communication and Materials (ICCCCM). 1-6.
  • Batina, L., Gierlichs, B., Prouff, E., Rivain, M., Standaert, F. X., and Veyrat-Charvillon, N. 2011. Mutual information analysis: a comprehensive study. Journal of Cryptology, 24(2), 269-291.
  • Patel, D., Saxena, A., and Wang, J. 2024. A machine learning-based wrapper method for feature selection. International Journal of Data Warehousing and Mining (IJDWM), 20(1), 1-33.
  • Liu, H., Zhou, M., and Liu, Q. 2019. An embedded feature selection method for imbalanced data classification. IEEE/CAA Journal of Automatica Sinica, 6(3), 703-715.
  • Saxena, A. K., and Dubey, V. K. 2015. A survey on feature selection algorithms. International Journal on Recent and Innovation Trends in Computing and Communication, 3(4), 1895-1899.
  • Wang, G. G., Deb, S., and Coelho, L. D. S. 2018. Earthworm optimisation algorithm: a bio-inspired metaheuristic algorithm for global optimisation problems. International Journal of Bio-Inspired Computation, 12(1), 1-22.
  • Robnik-Šikonja, M., and Kononenko, I. 2003. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning, 53(1), 23-69.
  • Ding, C., and Peng, H. 2005. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, 3(2), 185-205.
  • Eskandari, S., and Akbas, E. 2017. Supervised infinite feature selection. arXiv preprint arXiv:1704.02665.
  • Abasabadi, S., Nematzadeh, H., Motameni, H., and Akbari, E. 2021. Automatic ensemble feature selection using fast non-dominated sorting. Information Systems, 100, 101760.
  • Yin, Z., Yang, X., Wang, P., Yu, H., and Qian, Y. 2023. Ensemble selector mixed with pareto optimality to feature reduction. Applied Soft Computing, 148, 110877.
  • McHugh, M. L. 2013. The chi-square test of independence. Biochemia Medica, 23(2), 143-149.
  • Liu, J., Li, D., Shan, W., and Liu, S. 2024. A feature selection method based on multiple feature subsets extraction and result fusion for improving classification performance. Applied Soft Computing, 150, 111018.
  • Deng, Z., Zhu, X., Cheng, D., Zong, M., and Zhang, S. 2016. Efficient kNN classification algorithm for big data. Neurocomputing, 195, 143-148.
  • Saxena, A., and Wang, J. 2012. Dimensionality reduction with unsupervised feature selection and applying non-Euclidean norms for classification accuracy. In Exploring Advances in Interdisciplinary Data Mining and Analytics: New Trends. 91-109.
  • Wang, Q. 2022. Support vector machine algorithm in machine learning. In 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). 750-756.
Index Terms
Computer Science
Information Sciences
Keywords

Feature selection, Earthworm Optimization Algorithm, Principal Component Analysis, Dimensionality reduction
