International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 72
Published: January 2026
Authors: Aditya P. Bakshi
DOI: 10.5120/ijca2026926209
Aditya P. Bakshi. A Comprehensive Review of Object Detection: From Handcrafted Features to Deep Convolutional and Transformer-Based Architectures. International Journal of Computer Applications. 187, 72 (January 2026), 41-49. DOI=10.5120/ijca2026926209
@article{10.5120/ijca2026926209,
  author    = {Aditya P. Bakshi},
  title     = {A Comprehensive Review of Object Detection: From Handcrafted Features to Deep Convolutional and Transformer-Based Architectures},
  journal   = {International Journal of Computer Applications},
  year      = {2026},
  volume    = {187},
  number    = {72},
  pages     = {41--49},
  doi       = {10.5120/ijca2026926209},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2026
%A Aditya P. Bakshi
%T A Comprehensive Review of Object Detection: From Handcrafted Features to Deep Convolutional and Transformer-Based Architectures
%J International Journal of Computer Applications
%V 187
%N 72
%P 41-49
%R 10.5120/ijca2026926209
%I Foundation of Computer Science (FCS), NY, USA
Object detection has experienced a substantial evolution over the past two decades, transitioning from handcrafted feature-based pipelines to highly expressive deep learning and transformer-driven architectures. Early detection systems relied on manually designed descriptors such as Histograms of Oriented Gradients (HOG) and detectors such as the Deformable Part Model (DPM), coupled with exhaustive sliding-window or part-based search strategies. While effective in constrained scenarios, these approaches were limited by weak semantic representation, sensitivity to scale and illumination variations, and poor generalization to complex real-world environments. The advent of deep convolutional neural networks (CNNs) fundamentally reshaped object detection by enabling end-to-end hierarchical feature learning from large-scale annotated datasets. This shift led to the development of region-proposal-based two-stage detectors, single-stage dense regression models, and, more recently, transformer-based architectures that reformulate detection as a global set prediction problem. This paper presents a comprehensive and in-depth review of modern object detection frameworks, systematically covering two-stage detectors, one-stage detectors, and transformer-driven models. The review emphasizes the theoretical foundations underlying these paradigms, including multi-scale feature learning, anchor-based and anchor-free localization strategies, attention mechanisms, loss function design, and hierarchical feature aggregation. Key innovations such as Feature Pyramid Networks, focal loss, deformable convolutions, and encoder–decoder transformers are critically analyzed to understand their impact on detection accuracy, convergence behavior, robustness, and computational efficiency. In addition, the survey examines benchmark datasets, evaluation protocols, training strategies, and deployment challenges, highlighting persistent issues such as small-object detection, long-tail class distributions, data efficiency, and inference latency. Finally, emerging research directions are discussed, including lightweight and efficient transformer architectures, multimodal and open-vocabulary object detection, self-supervised and semi-supervised pretraining, and unified perception models that integrate detection with segmentation and tracking. By synthesizing both theoretical insights and empirical trends, this review aims to provide a cohesive foundation for advancing robust, efficient, and scalable object detection systems.
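
To make one of the recurring concepts above concrete, the following is a minimal sketch of the standard binary focal loss introduced by Lin et al. for dense one-stage detectors, which the abstract credits with improving convergence on heavily imbalanced anchor grids. The function name and the default values of alpha and gamma are illustrative choices, not taken from this review.

import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    # p: predicted probability that an object is present; y: ground-truth label (1 or 0).
    # p_t is the probability assigned to the true class; alpha_t balances positives vs. negatives.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples so that
    # training focuses on hard positives and negatives, which dominate dense detectors.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))

With gamma = 0 and alpha = 0.5 this reduces to a scaled cross-entropy, which is one way to see that the focal term mainly changes how easy negatives contribute to the loss.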