International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 90
Published: March 2026
Authors: Harun Hadzagic, Zerina Altoka
10.5120/ijca2026926575
Harun Hadzagic, Zerina Altoka. Synthetic Data Generation For Automated JavaScript Vulnerability Detection Using Fine-Tuned CodeBERT. International Journal of Computer Applications. 187, 90 (March 2026), 16-22. DOI=10.5120/ijca2026926575
@article{ 10.5120/ijca2026926575,
author = { Harun Hadzagic and Zerina Altoka },
title = { Synthetic Data Generation For Automated JavaScript Vulnerability Detection Using Fine-Tuned CodeBERT },
journal = { International Journal of Computer Applications },
year = { 2026 },
volume = { 187 },
number = { 90 },
pages = { 16-22 },
doi = { 10.5120/ijca2026926575 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2026
%A Harun Hadzagic
%A Zerina Altoka
%T Synthetic Data Generation For Automated JavaScript Vulnerability Detection Using Fine-Tuned CodeBERT
%J International Journal of Computer Applications
%V 187
%N 90
%P 16-22
%R 10.5120/ijca2026926575
%I Foundation of Computer Science (FCS), NY, USA
JavaScript, the foundational language of modern web development, is dynamic and flexible, which makes it highly susceptible to vulnerabilities such as Cross-Site Scripting (XSS), SQL Injection, and Hardcoded Secrets. Traditional security analysis tools and manual code review struggle to maintain accuracy and scalability in complex codebases, especially as AI is increasingly used to produce code. To address this, this paper presents a CodeBERT transformer model fine-tuned for automated binary sequence classification. A balanced dataset was constructed from 71 vulnerabilities with 60 JavaScript code snippets (30 pairs of secure and insecure versions), generated through advanced LLMs. A Pair-ID splitting methodology ensured that the model was evaluated on unseen vulnerability patterns, guarding against data leakage and overfitting. The fine-tuned CodeBERT model achieved an F1-Score of 0.9413 on the held-out test set. Crucially, it attained a Recall of 0.9468 for the 'Insecure' class, indicating that it misses few vulnerabilities, the most critical error in security screening. Furthermore, a generalization check using an alternating dataset supported the model's robustness, with the F1-Score remaining high. The findings demonstrate the viability of specialized Code LLMs for reliable vulnerability detection, paving the way for low-latency integration into continuous integration pipelines to enforce secure coding practices in real time.
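The abstract does not show how the Pair-ID split was implemented. As a minimal sketch of the general idea, the Python below groups samples by a pair identifier and assigns whole pairs to either the training or the test partition, so the secure and insecure versions of one snippet never end up on opposite sides of the split. The field names `pair_id`, `label`, and `code`, the function name `split_by_pair_id`, the toy snippets, and the 20% test fraction are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def split_by_pair_id(samples, test_fraction=0.2, seed=0):
    """Split samples so that every pair lands entirely in one partition.

    Hypothetical helper: each sample is a dict with an assumed "pair_id" key
    shared by the secure and insecure versions of the same snippet.
    """
    # Group samples by their pair identifier.
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["pair_id"]].append(sample)

    # Shuffle the pair identifiers (not the individual samples).
    pair_ids = sorted(groups)
    random.Random(seed).shuffle(pair_ids)

    # Reserve a fraction of the pairs for testing; at least one pair.
    n_test = max(1, round(len(pair_ids) * test_fraction))
    test_ids, train_ids = pair_ids[:n_test], pair_ids[n_test:]

    train = [s for pid in train_ids for s in groups[pid]]
    test = [s for pid in test_ids for s in groups[pid]]
    return train, test


# Toy data (assumed): 30 pairs, each with one insecure (1) and one secure (0) version.
samples = []
for pair_id in range(30):
    samples.append({"pair_id": pair_id, "label": 1,
                    "code": f"el.innerHTML = input{pair_id};"})
    samples.append({"pair_id": pair_id, "label": 0,
                    "code": f"el.textContent = input{pair_id};"})

train, test = split_by_pair_id(samples)
train_pairs = {s["pair_id"] for s in train}
test_pairs = {s["pair_id"] for s in test}
print(len(train), len(test), train_pairs.isdisjoint(test_pairs))
```

Splitting on the pair identifier rather than on individual snippets means the model cannot score well on the test set merely by recognizing the near-identical counterpart of a snippet it saw during training, which is the kind of leakage the abstract refers to.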