Research Article

Real-Time Human Action Recognition in Video Surveillance Using Machine Learning

by Md.Musabbir Hossain, Md. Tachbir Dewan, Md. Sadikuzzaman, Abu Bakar M. Abdullah
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 73
Published: January 2026
DOI: 10.5120/ijca2026926235

Md.Musabbir Hossain, Md. Tachbir Dewan, Md. Sadikuzzaman, Abu Bakar M. Abdullah. Real-Time Human Action Recognition in Video Surveillance Using Machine Learning. International Journal of Computer Applications. 187, 73 (January 2026), 48-53. DOI=10.5120/ijca2026926235

@article{10.5120/ijca2026926235,
  author    = {Md.Musabbir Hossain and Md. Tachbir Dewan and Md. Sadikuzzaman and Abu Bakar M. Abdullah},
  title     = {Real-Time Human Action Recognition in Video Surveillance Using Machine Learning},
  journal   = {International Journal of Computer Applications},
  year      = {2026},
  volume    = {187},
  number    = {73},
  pages     = {48-53},
  doi       = {10.5120/ijca2026926235},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2026
%A Md.Musabbir Hossain
%A Md. Tachbir Dewan
%A Md. Sadikuzzaman
%A Abu Bakar M. Abdullah
%T Real-Time Human Action Recognition in Video Surveillance Using Machine Learning
%J International Journal of Computer Applications
%V 187
%N 73
%P 48-53
%R 10.5120/ijca2026926235
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a framework for real-time human action recognition in video surveillance systems. It aims to detect suspicious behavior and normal movements immediately and to give security operators actionable insights. The proposed method combines computer vision and machine learning techniques to improve recognition accuracy and system reliability. Motion is analyzed using optical flow: Optical Flow Energy Images (OFEI) are generated to extract motion-related features. A Convolutional Neural Network (CNN) is used to obtain high-dimensional feature representations while reducing dimensionality, and a Support Vector Machine (SVM) classifier is trained on these features for robust action classification. The system detects and distinguishes human actions such as walking, looking around, looking up, smashing, and suspicious activities, even under challenging conditions including camera motion, zoom-in, and zoom-out. Experimental evaluations on publicly available human action datasets demonstrate significant improvements in recognition accuracy. The system also overlays detected actions onto video streams, giving surveillance personnel clear visual feedback. Deployed in intelligent video surveillance environments, the proposed framework proves to be scalable, accurate, and effective for identifying abnormal behaviors and generating timely alerts in modern security applications.
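
The abstract does not define how an Optical Flow Energy Image is built, so the following is only a minimal sketch under one plausible assumption: the OFEI is the per-pixel mean of the optical flow magnitude over a clip, scaled to [0, 1]. The function name `optical_flow_energy_image` and the nested-list input layout are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math


def optical_flow_energy_image(flows):
    """Return the mean per-pixel flow magnitude over a clip, scaled to [0, 1].

    `flows` is a list of T flow fields. Each field is an H x W grid of
    (dx, dy) displacement pairs. Treating the OFEI as a time-averaged
    magnitude map is an assumption, not the paper's stated definition.
    """
    if not flows:
        raise ValueError("need at least one flow field")
    height, width = len(flows[0]), len(flows[0][0])

    # Accumulate the flow magnitude at every pixel across all fields.
    energy = [[0.0] * width for _ in range(height)]
    for field in flows:
        for y in range(height):
            for x in range(width):
                dx, dy = field[y][x]
                energy[y][x] += math.hypot(dx, dy)

    # Average over time so clips of different length are comparable.
    count = len(flows)
    energy = [[value / count for value in row] for row in energy]

    # Scale by the peak value; an all-zero (motionless) clip stays all zero.
    peak = max(max(row) for row in energy)
    if peak > 0:
        energy = [[value / peak for value in row] for row in energy]
    return energy


# Two 2x2 flow fields in which only the top-left pixel moves.
still = (0.0, 0.0)
flows = [
    [[(3.0, 4.0), still], [still, still]],
    [[(0.0, 5.0), still], [still, still]],
]
ofei = optical_flow_energy_image(flows)
```

In a pipeline like the one described, the flow fields would presumably come from a dense optical flow estimator, and the resulting image would be passed to the CNN for feature extraction before SVM classification. Those stages are omitted here because the abstract gives no architectural details.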

Index Terms
Computer Science
Information Sciences
Keywords

Optical flow, machine learning, classifier, deep learning, feature extraction, action recognition, video surveillance
