International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 39
Published: September 2025
Authors: Dipu Dahal, Shirshak Acharya, Yunij Karki, Shashank Ghimire, Dinesh Baniya Kshatri
Dipu Dahal, Shirshak Acharya, Yunij Karki, Shashank Ghimire, Dinesh Baniya Kshatri. Automatic Name-Based Software Bug Detection via AST-Driven Static Analysis and Machine Learning. International Journal of Computer Applications. 187, 39 (September 2025), 13-22. DOI=10.5120/ijca2025925682
@article{10.5120/ijca2025925682,
  author = {Dipu Dahal and Shirshak Acharya and Yunij Karki and Shashank Ghimire and Dinesh Baniya Kshatri},
  title = {Automatic Name-Based Software Bug Detection via AST-Driven Static Analysis and Machine Learning},
  journal = {International Journal of Computer Applications},
  year = {2025},
  volume = {187},
  number = {39},
  pages = {13-22},
  doi = {10.5120/ijca2025925682},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2025
%A Dipu Dahal
%A Shirshak Acharya
%A Yunij Karki
%A Shashank Ghimire
%A Dinesh Baniya Kshatri
%T Automatic Name-Based Software Bug Detection via AST-Driven Static Analysis and Machine Learning
%J International Journal of Computer Applications
%V 187
%N 39
%P 13-22
%R 10.5120/ijca2025925682
%I Foundation of Computer Science (FCS), NY, USA
This paper presents a name analysis technique for statically typed languages that automatically classifies and localizes specific bugs in source code, eliminating the need for manually designed algorithms or heuristics. Name-based bug detection analyzes source code to detect potential bugs based on the names or labels used for variables, functions and other code elements. Because a large set of negative (buggy) samples is not available, the Abstract Syntax Tree (AST) of the source code is used to generate negative samples automatically. Approximately 720,000 C code snippets are collected from a large C code corpus and parsed into their corresponding ASTs using LibClang. Positive samples are extracted from the ASTs, and their contents are altered to generate negative samples. These samples are tokenized using a fine-tuned tokenizer and used to train a classification model that identifies potential bugs. This paper describes techniques for detecting three kinds of bugs: swapped function arguments, wrong binary operators and wrong operator precedence, with F1 scores between 83% and 95%. Moreover, detectors for new types of bugs can be developed by following the same steps used for the current bug detectors. The resulting system automatically detects specific types of bugs in source code and serves as a tool that helps software developers improve code quality.
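The idea of deriving negative samples from positive ones can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the call sites and binary expressions have already been extracted from the AST (for example with LibClang), and the names `CallSample`, `BinOpSample`, `swap_arguments`, `wrong_operator` and the operator list are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical sample types; the paper's actual data format is not specified.
@dataclass(frozen=True)
class CallSample:
    callee: str
    args: Tuple[str, ...]
    label: int = 0  # 0 = positive (code as written), 1 = negative (synthetic bug)

@dataclass(frozen=True)
class BinOpSample:
    left: str
    op: str
    right: str
    label: int = 0

# Assumed operator pool for illustration only.
OPERATORS = ("+", "-", "*", "/", "<", ">", "<=", ">=", "==", "!=")

def swap_arguments(sample: CallSample) -> Optional[CallSample]:
    """Create a 'swapped function arguments' negative by exchanging the first two arguments."""
    if len(sample.args) < 2 or sample.args[0] == sample.args[1]:
        return None  # swapping would not change the code, so no negative is produced
    swapped = (sample.args[1], sample.args[0]) + sample.args[2:]
    return replace(sample, args=swapped, label=1)

def wrong_operator(sample: BinOpSample, rng: random.Random) -> BinOpSample:
    """Create a 'wrong binary operator' negative by substituting a different operator."""
    candidates = [op for op in OPERATORS if op != sample.op]
    return replace(sample, op=rng.choice(candidates), label=1)

if __name__ == "__main__":
    rng = random.Random(0)
    print(swap_arguments(CallSample("memcpy", ("dst", "src", "n"))))
    print(wrong_operator(BinOpSample("i", "<", "length"), rng))
```

In this sketch each positive sample yields at most one negative sample per bug type, which keeps the two classes roughly balanced; whether the paper balances its classes this way is an assumption.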