Detecting Algorithmically Generated Domains Using Entropy and Lexical Features

Jinsu Ann Mathew; Ninan Sajeeth Philip; Joe Jacob

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Issue 44

Published: September 2025

10.5120/ijca2025925758

Jinsu Ann Mathew, Ninan Sajeeth Philip, Joe Jacob. Detecting Algorithmically Generated Domains Using Entropy and Lexical Features. International Journal of Computer Applications. 187, 44 (September 2025), 37-44. DOI=10.5120/ijca2025925758

                        @article{ 10.5120/ijca2025925758,
                        author  = { Jinsu Ann Mathew and Ninan Sajeeth Philip and Joe Jacob },
                        title   = { Detecting Algorithmically Generated Domains Using Entropy and Lexical Features },
                        journal = { International Journal of Computer Applications },
                        year    = { 2025 },
                        volume  = { 187 },
                        number  = { 44 },
                        pages   = { 37-44 },
                        doi     = { 10.5120/ijca2025925758 },
                        publisher = { Foundation of Computer Science (FCS), NY, USA }
                        }

                        %0 Journal Article
                        %D 2025
                        %A Jinsu Ann Mathew
                        %A Ninan Sajeeth Philip
                        %A Joe Jacob
                        %T Detecting Algorithmically Generated Domains Using Entropy and Lexical Features
                        %J International Journal of Computer Applications
                        %V 187
                        %N 44
                        %P 37-44
                        %R 10.5120/ijca2025925758
                        %I Foundation of Computer Science (FCS), NY, USA

Abstract

Detecting domain names generated by Domain Generation Algorithms (DGAs) is a key challenge in cybersecurity, as these domains are designed to appear unpredictable and evade standard filtering methods. This work proposes a lightweight and interpretable detection method that relies on lexical properties and entropy-based features derived from domain names. By analyzing character patterns and measuring randomness through Shannon entropy and relative entropy across bigrams, trigrams, and fourgrams, the method captures both structural and statistical differences between legitimate and algorithmic domains. Multiple machine learning classifiers were trained and evaluated, with the best results achieved using XGBoost and Random Forest. Entropy-based features were found to be highly influential in the classification process, highlighting their effectiveness in distinguishing algorithmically generated domains. The findings support the use of entropy as a practical and theoretically grounded feature for DGA detection.
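The two entropy measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `floor` probability assigned to unseen n-grams, and the choice of KL divergence in bits as the form of relative entropy are all assumptions made for illustration. The reference distribution is assumed to be built from a list of known legitimate domain labels.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy, in bits per character, of the characters in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngrams(text, n):
    """All overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_distribution(labels, n):
    """Relative frequency of each n-gram across a list of domain labels."""
    counts = Counter(g for label in labels for g in ngrams(label, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def relative_entropy(label, reference, n, floor=1e-6):
    """KL divergence D(P || Q) in bits.

    P is the n-gram distribution of label; Q is the reference distribution.
    N-grams missing from the reference get the probability `floor`
    (an assumed smoothing choice, not taken from the paper).
    """
    counts = Counter(ngrams(label, n))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # label shorter than n: no n-grams to compare
    return sum(
        (c / total) * math.log2((c / total) / reference.get(g, floor))
        for g, c in counts.items()
    )


def entropy_features(label, references):
    """Feature vector: character entropy plus relative entropy for each n."""
    features = {"shannon": shannon_entropy(label)}
    for n, reference in references.items():
        features[f"rel_entropy_{n}gram"] = relative_entropy(label, reference, n)
    return features
```

In use, `references` would be a dictionary such as `{n: ngram_distribution(benign_labels, n) for n in (2, 3, 4)}`, where `benign_labels` is a hypothetical list of legitimate domain labels. A label whose n-grams rarely occur in legitimate domains yields a high relative entropy, which is the kind of signal a downstream classifier could use.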

Index Terms

Computer Science

Information Sciences

Keywords

Domain Generation Algorithm (DGA), Entropy-based features, Lexical features, N-gram analysis