Pedestrian Detection in Thermal Images Using Deep Saliency Map and Instance Segmentation

Full Text (PDF, 407KB), PP.40-49


Author(s)

A. K. M. Fahim Rahman (1, *), Mostofa Rakib Raihan (1), S. M. Mohidul Islam (1)

1. Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2021.01.04

Received: 19 Feb. 2020 / Revised: 23 Apr. 2020 / Accepted: 2 Jul. 2020 / Published: 8 Feb. 2021

Index Terms

Thermal image, saliency map, deep saliency network, instance segmentation, Mask R-CNN

Abstract

Pedestrian detection is an established computer vision task. Detection from color images has achieved robust performance, but its accuracy drops at night and in poor lighting. Thermal images are used to detect people at night, in fog, or in other poor lighting conditions where color images offer limited visibility. In the daytime, however, when the surroundings are as warm as or warmer than pedestrians, detection accuracy on thermal images decreases. Paired color and thermal images can address this, but capturing color-thermal pairs is expensive, and misaligned imagery can lower detection accuracy. We propose a network that achieves better accuracy by extending prior work, which introduced saliency maps for pedestrian detection in thermal images, to instance-level segmentation. We worked on a subset of the KAIST Multispectral Pedestrian Detection Dataset [8] that has pixel-level annotations. We trained Mask R-CNN for the pedestrian detection task and report the added effect of saliency maps generated using PiCA-Net. We achieved an accuracy of 88.14% on day images and 91.84% on night images. Our model thus reduces the miss rate by 24.1% on day images and 23% on night images compared with the existing state-of-the-art method.
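The abstract reports results both as accuracy and as miss-rate reductions but does not define the miss-rate metric. Pedestrian benchmarks in the line of Dollár et al. [12], KAIST included, commonly summarize a detector with the log-average miss rate: the miss rate is sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space between 0.01 and 1, and the samples are averaged geometrically. Whether this paper reports exactly that quantity is not stated here, so the sketch below is an assumption, not the authors' evaluation code. The function name `log_average_miss_rate` is hypothetical, and taking the first curve point whose FPPI reaches each reference value is one common sampling choice.

```python
import math
from bisect import bisect_left
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """Hypothetical helper: log-average miss rate over FPPI in [0.01, 1].

    fppi must be sorted in ascending order, and miss_rate[i] is the
    detector's miss rate at fppi[i] (values in [0, 1]).
    """
    if not fppi or len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must be non-empty and of equal length")

    # Nine reference points evenly spaced in log space: 10^-2, 10^-1.75, ..., 10^0.
    refs = [10 ** (-2 + 0.25 * k) for k in range(9)]

    samples = []
    for ref in refs:
        # Index of the first curve point whose FPPI is >= the reference value.
        i = bisect_left(fppi, ref)
        # If the curve stops before this FPPI, reuse its last miss rate.
        mr = miss_rate[i] if i < len(fppi) else miss_rate[-1]
        # Clamp away from zero so the logarithm stays finite.
        samples.append(max(mr, 1e-10))

    # Geometric mean of the sampled miss rates.
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


if __name__ == "__main__":
    # Illustrative curve only, not data from the paper.
    print(log_average_miss_rate([0.001, 0.05, 2.0], [0.6, 0.3, 0.1]))
```

A detector whose miss rate stays at 0.25 over the whole FPPI range gets a log-average miss rate of 0.25. Because the samples are spaced in log space, the low-FPPI end of the curve counts as much as the high-FPPI end, which is why this summary is preferred over a single operating point.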

Cite This Paper

A. K. M. Fahim Rahman, Mostofa Rakib Raihan, S. M. Mohidul Islam, "Pedestrian Detection in Thermal Images Using Deep Saliency Map and Instance Segmentation", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.13, No.1, pp. 40-49, 2021. DOI:10.5815/ijigsp.2021.01.04

Reference

[1]Geiger A, Lenz P and Urtasun R. “Are we ready for autonomous driving? The KITTI vision benchmark suite.” In 2012 IEEE Conference on Computer Vision and Pattern Recognition 2012 Jun 16 (pp. 3354-3361). IEEE. doi: 10.1109/CVPR.2012.6248074
[2]Li C, Song D, Tong R and Tang M. “Illumination-aware faster R-CNN for robust multispectral pedestrian detection.” Pattern Recognition. 2019 Jan 1;85:161-71. doi:10.1016/j.patcog.2018.08.005
[3]Liu J, Zhang S, Wang S and Metaxas DN. “Multispectral deep neural networks for pedestrian detection.” arXiv preprint arXiv:1611.02644. 2016 Nov 8.
[4]Klein DA and Frintrop S. “Center-surround divergence of feature statistics for salient object detection.” In 2011 International Conference on Computer Vision 2011 Nov 6 (pp. 2214-2219). IEEE. doi: 10.1109/ICCV.2011.6126499
[5]Hwang S, Park J, Kim N, Choi Y and So Kweon I. “Multispectral pedestrian detection: Benchmark dataset and baseline.” In Proceedings of the IEEE conference on computer vision and pattern recognition 2015 (pp. 1037-1045).
[6]Liu N, Han J and Yang MH. “PiCANet: Learning pixel-wise contextual attention for saliency detection.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (pp. 3089-3098).
[7]He K, Gkioxari G, Dollár P and Girshick R. “Mask R-CNN.” In Proceedings of the IEEE international conference on computer vision 2017 (pp. 2961-2969). arXiv:1703.06870
[8]Ghose D et al. “Pedestrian Detection in Thermal Images using Saliency Maps.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops 2019.
[9]Koch C and Ullman S. “Shifts in selective visual attention: towards the underlying neural circuitry.” In Matters of intelligence 1987 (pp. 115-141). Springer, Dordrecht.
[10]Y. Jaehoon. “PyTorch implementation of PiCANet: Learning pixel-wise contextual attention for saliency detection.” URL https://github.com/Ugness/PiCANetImplementation, 2018.
[11]Girshick R. “Fast R-CNN.” In Proceedings of the IEEE international conference on computer vision 2015 (pp. 1440-1448).
[12]Dollar P, Wojek C, Schiele B and Perona P. “Pedestrian detection: An evaluation of the state of the art.” IEEE transactions on pattern analysis and machine intelligence. 2011 Aug 4;34(4):743-61.
[13]Wu Y, Kirillov A, Massa F, Lo WY and Girshick R. “Detectron2.” https://github.com/facebookresearch/detectron2, 2019.
[14]Ren S, He K, Girshick R and Sun J. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in neural information processing systems 2015 (pp. 91-99).
[15]“Thermography” [Online]. Available: https://en.wikipedia.org/wiki/Thermography Collected 26 January 2020.
[16]Wang X, Wang M and Li W. “Scene-specific pedestrian detection for static video surveillance.” IEEE transactions on pattern analysis and machine intelligence. 2013 Jun 25;36(2):361-74. doi: 10.1109/TPAMI.2013.124
[17]Girshick R, Donahue J, Darrell T and Malik J. “Rich feature hierarchies for accurate object detection and semantic segmentation.” In Proceedings of the IEEE conference on computer vision and pattern recognition 2014 (pp. 580-587).
[18]Ren S, He K, Girshick R and Sun J. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in neural information processing systems 2015 (pp. 91-99).
[19]Kim J. “Pedestrian Detection and Distance Estimation Using Thermal Camera in Night Time.” In 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC) 2019 Feb 11 (pp. 463-466). IEEE.
[20]Davis JW and Sharma V. “Background-subtraction in thermal imagery using contour saliency.” International Journal of Computer Vision. 2007 Feb 1;71(2):161-81.
[21]Itti L, Koch C and Niebur E. “A model of saliency-based visual attention for rapid scene analysis.” IEEE Transactions on pattern analysis and machine intelligence. 1998 Nov;20(11):1254-9. doi: 10.1109/34.730558
[22]Turkowski K. “Filters for common resampling tasks.” In Graphics gems 1990 Aug 1 (pp. 147-165). Academic Press Professional, Inc.
[23]Shapiro L and Stockman G. Computer Vision. Prentice Hall, Inc., New Jersey. 2001.
[24]Lauren B and Lee LW. “Perceptual information processing system.” Paravue Inc. US Patent Application. 2003 Jul;10(618,543).
[25]Hariharan B, Arbeláez P, Girshick R and Malik J. “Simultaneous detection and segmentation.” In European Conference on Computer Vision 2014 Sep 6 (pp. 297-312). Springer, Cham.
[26]Tai JC and Song KT. “Background segmentation and its application to traffic monitoring using modified histogram.” In IEEE International Conference on Networking, Sensing and Control, 2004 2004 Mar 21 (Vol. 1, pp. 13-18). IEEE. doi: 10.1109/ICNSC.2004.1297401
[27]Göktürk K and Jönsson A. “Developing a Resource-Efficient Sensor Cleaning System for Autonomous Heavy Vehicles.” 2019.
[28]Zhao D, Hanson EJ, Nix MC, Chin S, inventors; Uber Technologies Inc, assignee. Systems and Methods for On-Site Recovery of Autonomous Vehicles. United States patent application US 15/884,852. 2019 Aug 1.
[29]Li C, Song D, Tong R and Tang M. “Illumination-aware faster R-CNN for robust multispectral pedestrian detection.” Pattern Recognition. 2019 Jan 1;85:161-71.
[30]Zhang L et al. “Cross-modality interactive attention network for multispectral pedestrian detection.” Information Fusion. 2019 Oct 1;50:20-9.
[31]Lahmyed R, El Ansari M and Ellahyani A. “A new thermal infrared and visible spectrum images-based pedestrian detection system.” Multimedia Tools and Applications. 2019 Jun 30;78(12):15861-85.
[32]Bilal M and Hanif MS. “High performance real-time pedestrian detection using light weight features and fast cascaded kernel SVM classification.” Journal of Signal Processing Systems. 2019 Feb 1;91(2):117-29.
[33]Bastian BT and Jiji CV. “Pedestrian detection using first-and second-order aggregate channel features.” International Journal of Multimedia Information Retrieval. 2019 Jun 1;8(2):127-33.
[34]Kim S, Kwak S and Ko BC. “Fast pedestrian detection in surveillance video based on soft target training of shallow random forest.” IEEE Access. 2019 Jan 11;7:12415-26. doi: 10.1109/ACCESS.2019.2892425
[35]Zhang L et al. “Weakly aligned cross-modal learning for multispectral pedestrian detection.” In Proceedings of the IEEE International Conference on Computer Vision 2019 (pp. 5127-5137).
[36]Van de Weijer J, Gevers T and Bagdanov AD. “Boosting color saliency in image feature detection.” IEEE transactions on pattern analysis and machine intelligence. 2005 Nov 21;28(1):150-6. doi: 10.1109/TPAMI.2006.3
[37]Li Y, Qi H, Dai J, Ji X and Wei Y. “Fully convolutional instance-aware semantic segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017 (pp. 2359-2367).
[38]Goferman S, Zelnik-Manor L and Tal A. “Context-aware saliency detection.” IEEE transactions on pattern analysis and machine intelligence. 2011 Dec 27;34(10):1915-26. doi: 10.1109/TPAMI.2011.272
[39]Yan Q, Xu L, Shi J and Jia J. “Hierarchical saliency detection.” In Proceedings of the IEEE conference on computer vision and pattern recognition 2013 (pp. 1155-1162).
[40]Hou X and Zhang L. “Saliency detection: A spectral residual approach.” In 2007 IEEE Conference on computer vision and pattern recognition 2007 Jun 17 (pp. 1-8). IEEE. doi: 10.1109/CVPR.2007.383267
[41]Harel J, Koch C and Perona P. “Graph-based visual saliency.” In Advances in neural information processing systems 2007 (pp. 545-552).
[42]Achanta R, Estrada F, Wils P and Süsstrunk S. “Salient region detection and segmentation.” In International conference on computer vision systems 2008 May 12 (pp. 66-75). Springer, Berlin, Heidelberg.
[43]Achanta R, Hemami S, Estrada F and Susstrunk S. “Frequency-tuned salient region detection.” In 2009 IEEE conference on computer vision and pattern recognition 2009 Jun 20 (pp. 1597-1604). IEEE. doi: 10.1109/CVPR.2009.5206596
[44]Cheng MM, Mitra NJ, Huang X, Torr PH and Hu SM. “Global contrast based salient region detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence. 2014 Aug 5;37(3):569-82. doi: 10.1109/TPAMI.2014.2345401
[45]Deng Z. et al. “R3net: Recurrent residual refinement network for saliency detection.” In Proceedings of the 27th International Joint Conference on Artificial Intelligence 2018 Jul 13 (pp. 684-690). AAAI Press.
[46]Szarvas M, Yoshizawa A, Yamamoto M and Ogata J. “Pedestrian detection with convolutional neural networks.” In IEEE Proceedings. Intelligent Vehicles Symposium, 2005 (pp. 224-229). IEEE.
[47]Setjo CH and Achmad B. “Thermal image human detection using Haar-cascade classifier.” In 2017 7th International Annual Engineering Seminar (InAES) 2017 (pp. 1-6). IEEE.
[48]Paul M, Haque SME and Chakraborty S. “Human detection in surveillance videos and its applications - a review.” EURASIP Journal on Advances in Signal Processing. 2013;2013(1):176.