Exploring Feature Selection and Machine Learning Algorithms for Predicting Diabetes Disease




Eman I. Abd El-Latif 1,*, Islam A. Moneim 1

1. Department of Mathematics and Computer Science, Faculty of Science, Benha University, Benha, Egypt

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2024.01.01

Received: 24 Aug. 2023 / Revised: 27 Sep. 2023 / Accepted: 20 Oct. 2023 / Published: 8 Feb. 2024

Index Terms

Diabetes, Mutual Information, Pearson Correlation and Chi-squared


Diabetes is one of the most common chronic diseases in the world and directly affects the lives of millions of people. It can be controlled, and its outcomes improved, with early diagnosis, but the majority of patients continue to live with it. There is a desperate need for a system that anticipates and identifies the people who are most likely to develop diabetes in the future. The main goal of this study is to identify people at risk without any blood or glucose screening tests. This paper proposes a deep-learning model for diabetes prediction. The proposed model consists of three main phases: data pre-processing, feature selection, and classification. In the pre-processing stage, missing values are handled and the data are normalized. Then three techniques (mutual information, chi-squared, and Pearson correlation) are used to select the most important features. After that, multiple machine learning classifiers are applied. Four experiments are conducted to test the models, and the effectiveness of the proposed model is evaluated against that of other well-known machine learning techniques. According to the experimental results, the linear regression classifier achieves higher accuracy, AUC, sensitivity, and F-measure than the other methods. The proposed model outperformed traditional methods and achieved a high accuracy rate in predicting diabetes.
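The pre-processing and feature-selection phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how missing values are handled or which normalization is used, so mean imputation and min-max scaling are assumptions here, only the Pearson correlation criterion is shown, and the feature names and values are hypothetical toy data.

```python
import math

def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max(column):
    """Scale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_top_k(features, labels, k):
    """Rank features by |Pearson r| with the label and keep the k best names."""
    cleaned = {name: min_max(impute_mean(col)) for name, col in features.items()}
    scores = {name: abs(pearson(col, labels)) for name, col in cleaned.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: one missing BMI value and one weakly related feature.
features = {
    "bmi": [22.0, 31.5, None, 35.0, 24.0, 33.0],
    "age": [25, 52, 47, 60, 30, 55],
    "noise": [3, 1, 3, 2, 2, 1],
}
labels = [0, 1, 1, 1, 0, 1]  # 1 = diabetic, 0 = non-diabetic

selected, scores = select_top_k(features, labels, k=2)
print(selected)
```

On this toy data the two features most strongly correlated with the label are retained and the weakly related one is dropped. The selected columns would then be passed to whichever classifiers are being compared.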

Cite This Paper

Eman I. Abd El-Latif, Islam A. Moneim, "Exploring Feature Selection and Machine Learning Algorithms for Predicting Diabetes Disease", International Journal of Intelligent Systems and Applications (IJISA), Vol.16, No.1, pp.1-10, 2024. DOI:10.5815/ijisa.2024.01.01

