Human Action Recognition Using Modified Bag of Visual Word based on Spectral Perception

Full Text (PDF, 981KB), pp. 34-43

Author(s)

Om Mishra 1,*, Rajiv Kapoor 1, M.M. Tripathi 2

1. Department of Electronics & Communication, Delhi Technological University, Delhi-110042, India

2. Department of Electrical Engineering, Delhi Technological University, Delhi-110042, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2019.09.04

Received: 2 May 2019 / Revised: 3 Jun. 2019 / Accepted: 25 Jun. 2019 / Published: 8 Sep. 2019

Index Terms

Bag-of-visual-word, Contextual distance, Directed graph, Laplacian of directed graph, SVM

Abstract

Human action recognition has a wide range of applications, such as security and patient care. Background clutter, appearance changes due to viewpoint variation, and occlusion are the prominent hurdles that can reduce the recognition rate significantly. Methodologies based on the Bag-of-visual-words model are popular because they do not require accurate background subtraction. Their main disadvantage, however, is that they do not retain the geometrical structure of the clusters they form, which leads to intra-class mismatching. Furthermore, these methods are sensitive to noise: noisy points added to a cluster can cause the action to be misclassified. To overcome these problems, we propose a new approach based on a modified Bag-of-visual-words model. The proposed methodology retains the geometrical structure of each cluster by computing the contextual distance among its points. A contextual distance based on the Euclidean measure normally cannot deal with noise, so in the proposed methodology the contextual distance is instead calculated from the difference between the contributions of individual cluster points to the cluster's geometrical structure. A directed graph is then formed for each cluster, and each directed graph is described by its Laplacian. Finally, the feature vectors representing the Laplacians are fed to a Radial Basis Function based Support Vector Machine (RBF-SVM) classifier.
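To make the abstract's pipeline concrete, the following Python code is a minimal sketch, not the authors' implementation: it substitutes a plain Euclidean k-NN rule for the paper's contribution-based contextual distance, uses the out-degree Laplacian L = D_out - W as a simple directed-graph Laplacian, summarizes each cluster by its leading eigenvalue magnitudes, and classifies with scikit-learn's RBF-SVM. All function names, parameter values (k, n_eig, the number of visual words), and data below are illustrative assumptions.

```python
# Minimal sketch of the abstract's pipeline under stated assumptions:
# BoVW clustering -> per-cluster directed graph -> Laplacian spectrum -> RBF-SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def directed_knn_laplacian(points, k=3):
    """Directed k-NN graph over cluster points; returns L = D_out - W.
    Euclidean distance is a stand-in for the paper's contextual distance."""
    n = len(points)
    W = np.zeros((n, n))
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        nbrs = np.argsort(dists[i])[1:k + 1]   # k nearest, excluding self
        W[i, nbrs] = 1.0
    return np.diag(W.sum(axis=1)) - W          # out-degree Laplacian

def video_feature(descriptors, codebook, k=3, n_eig=5):
    """Concatenate leading Laplacian eigenvalue magnitudes per visual word."""
    labels = codebook.predict(descriptors)
    feats = []
    for c in range(codebook.n_clusters):
        pts = descriptors[labels == c]
        vals = np.zeros(n_eig)                 # zero-padded spectral summary
        if len(pts) > k:                       # enough points for a k-NN graph
            L = directed_knn_laplacian(pts, k)
            eig = np.sort(np.abs(np.linalg.eigvals(L)))[::-1]
            m = min(n_eig, len(eig))
            vals[:m] = eig[:m]
        feats.append(vals)
    return np.concatenate(feats)

# Toy usage with random stand-in descriptors (playing the role of local
# spatio-temporal interest-point features); labels 0/1 are two action classes.
rng = np.random.default_rng(0)
train_desc = [rng.normal(size=(60, 16)) for _ in range(8)]
train_lbls = [0, 0, 0, 0, 1, 1, 1, 1]
codebook = KMeans(n_clusters=4, n_init=10, random_state=0).fit(np.vstack(train_desc))
X = np.array([video_feature(d, codebook) for d in train_desc])
clf = SVC(kernel="rbf", gamma="scale").fit(X, train_lbls)
print(clf.predict(X[:2]))
```

Replacing the Euclidean distances inside directed_knn_laplacian with the paper's contribution-difference contextual distance is where the claimed noise robustness would enter; the rest of the flow (cluster, graph, Laplacian spectrum, RBF-SVM) stays the same.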

Cite This Paper

Om Mishra, Rajiv Kapoor, M.M. Tripathi, "Human Action Recognition Using Modified Bag of Visual Word based on Spectral Perception", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 11, No. 9, pp. 34-43, 2019. DOI: 10.5815/ijigsp.2019.09.04
