Self-supervised Model Based on Masked Autoencoders Advance CT Scans Classification

Jiashu Xu¹*, Sergii Stirenko¹

1. National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, 03056, Ukraine

* Corresponding author.


Received: 28 May 2022 / Revised: 13 Jun. 2022 / Accepted: 29 Jul. 2022 / Published: 8 Oct. 2022

Index Terms

Self-supervised learning, CT scans, Transfer learning, Classification


The coronavirus pandemic has been ongoing since 2019 and shows no sign of abating, which makes the classification of medical CT scans to assist diagnosis particularly important. Supervised deep learning algorithms have achieved great success in classifying medical CT scans, but medical image datasets often require professional annotation, and many research datasets are not publicly available. To address this problem, this paper draws on the self-supervised learning algorithm MAE (Masked Autoencoders) and performs transfer learning on CT scan datasets using an MAE model pre-trained on ImageNet, with the aim of improving the model's generalization and avoiding overfitting on small datasets. In extensive experiments on the COVID-CT dataset and the SARS-CoV-2 CT-scan dataset, we compare this self-supervised learning (SSL) based method with state-of-the-art pre-training methods based on supervised learning. The results show that our method improves generalization more effectively than these baselines and helps avoid overfitting on small datasets, and that the model achieves nearly the same accuracy as supervised learning on both test datasets. Finally, ablation experiments demonstrate the effectiveness of the method and show how it works.
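
For background, MAE pre-training (He et al., 2021) hides a large random subset of image patches, 75% by default in that work, encodes only the visible patches, and trains a decoder to reconstruct the hidden ones, with the reconstruction loss averaged over the masked patches only. The abstract does not describe these pre-training details, so what follows is a minimal pure-Python sketch of the masking step and loss under stated assumptions, not the authors' code. It assumes 224×224 inputs split into 16×16 patches, and the function names `random_masking` and `masked_mse` are hypothetical. In MAE's standard recipe, masking is used only during pre-training (here, on ImageNet); fine-tuning on a downstream dataset such as CT scans feeds full, unmasked images to the pre-trained encoder.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_masking(num_patches: int, mask_ratio: float = 0.75,
                   seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices into visible and masked sets (MAE-style sketch).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # One uniform noise value per patch; sorting by it gives a random permutation.
    noise = [rng.random() for _ in range(num_patches)]
    shuffled = sorted(range(num_patches), key=lambda i: noise[i])
    # Keep the first (1 - mask_ratio) fraction as visible tokens for the encoder.
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(shuffled[:num_keep])
    masked = sorted(shuffled[num_keep:])
    return visible, masked


def masked_mse(pred: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]],
               masked: Sequence[int]) -> float:
    """Mean squared error per patch, averaged over masked patches only.

    Hypothetical helper; visible patches do not contribute to the loss.
    """
    if not masked:
        raise ValueError("at least one patch must be masked")
    per_patch = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i in masked
    ]
    return sum(per_patch) / len(per_patch)


if __name__ == "__main__":
    # Assumed setup: 224x224 image, 16x16 patches -> 14 * 14 = 196 patches.
    num_patches = (224 // 16) ** 2
    visible, masked = random_masking(num_patches, mask_ratio=0.75, seed=0)
    print(len(visible), len(masked))  # 49 147
```

Sorting patch indices by per-patch random noise yields a uniformly random permutation for each image, which mirrors how the MAE reference implementation selects patches; a tensor library would do the same with an argsort over a noise tensor.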

Cite This Paper

Jiashu Xu, Sergii Stirenko, "Self-Supervised Model Based on Masked Autoencoders Advance CT Scans Classification", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.14, No.5, pp. 1-9, 2022. DOI:10.5815/ijigsp.2022.05.01

