FeatureGAN: Combining GAN and Autoencoder for Pavement Crack Image Data Augmentations

Author(s)

Xinkai Zhang 1,*, Bo Peng 1, Zaid Al-Huda 1, Donghai Zhai 1

1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2022.05.03

Received: 15 May 2022 / Revised: 14 Jun. 2022 / Accepted: 27 Jul. 2022 / Published: 8 Oct. 2022

Index Terms

Pavement crack segmentation, autoencoder, GAN, data augmentation, data annotation.

Abstract

In pavement crack segmentation, the accurate pixel-level labeling required for fully supervised training of deep neural networks (DNNs) is challenging to obtain. Although cracks often exhibit low-level image characteristics such as edges, the high-level background information varies widely with complex pavement conditions, and crack samples covering diverse semantic backgrounds are scarce in practice. To overcome these problems, we propose a novel method for augmenting the training data of DNN-based crack segmentation. It employs a generative adversarial network (GAN) that takes a crack-free image, a crack image, and a corresponding image mask as inputs and generates a new crack image. In combination with an autoencoder, the proposed GAN can be used to train crack segmentation networks. Because the mask is created manually, no additional crack images need to be labeled, so data augmentation and annotation are achieved simultaneously. Our experiments are conducted on two public datasets using five segmentation models of different sizes, and the results demonstrate that the proposed method is effective for crack segmentation.
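To make the augmentation idea concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a generator conditioned on a crack-free background image, a crack image, and a manually drawn binary mask synthesizes a new crack image, and the mask itself serves as the pixel-level label of the synthesized image. The network architecture, channel counts, and all names here are illustrative assumptions and do not reproduce the authors' actual FeatureGAN.

import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Stand-in generator (not the paper's architecture): maps the
    7-channel conditioning stack (background RGB + crack RGB + mask)
    to a 3-channel synthesized crack image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(7, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, background, crack, mask):
        # Condition on all three inputs via channel-wise concatenation.
        x = torch.cat([background, crack, mask], dim=1)
        return self.net(x)

# One augmentation step: the generated image and the manual mask form a
# new (image, label) pair -- augmentation and annotation at the same time.
G = ToyGenerator()
background = torch.rand(1, 3, 256, 256)                   # crack-free pavement image
crack = torch.rand(1, 3, 256, 256)                        # image containing a crack
mask = (torch.rand(1, 1, 256, 256) > 0.9).float()         # manual binary mask
fake_crack = G(background, crack, mask)                   # synthesized crack image
pair = (fake_crack, mask)                                 # ready for segmentation training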

Cite This Paper

Xinkai Zhang, Bo Peng, Zaid Al-Huda, Donghai Zhai, "FeatureGAN: Combining GAN and Autoencoder for Pavement Crack Image Data Augmentations", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.14, No.5, pp. 28-43, 2022. DOI: 10.5815/ijigsp.2022.05.03
