AI-powered Predictive Model for Stroke and Diabetes Diagnostic

pp. 24-40

Author(s)

Ngoc-Bich Le 1,*, Thi-Thu-Hien Pham 1, Sy-Hoang Nguyen 1, Nhat-Minh Nguyen 1, Tan-Nhu Nguyen 1

1. School of Biomedical Engineering, International University, Vietnam National University HCM City, HCM City, Vietnam

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2024.01.03

Received: 29 Nov. 2023 / Revised: 31 Dec. 2023 / Accepted: 5 Jan. 2024 / Published: 8 Feb. 2024

Index Terms

Machine Learning, Predictive Model, Stroke Diagnosis, Diabetes Diagnosis, XGBoost Classifier (XGB), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Logistic Regression

Abstract

Research on stroke and diabetes prediction prioritizes early detection in order to improve patient outcomes, and integrates a variety of methodologies to achieve it. However, existing studies suffer from imbalanced datasets, limited dataset diversity, potential bias, and inadequate model comparisons; these shortcomings underscore the need for more comprehensive and inclusive research methodologies. This paper provides a thorough assessment of machine learning algorithms for the early detection and diagnosis of stroke and diabetes. The study applied widely used algorithms, including Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and XGBoost Classifier, to medical data to derive significant findings. The XGBoost Classifier performed best, reaching an accuracy, precision, recall, and F1-score of 87.5%. The comparison showed that the Decision Tree, Random Forest, and XGBoost classifiers performed consistently well across all metrics: the XGBoost Classifier and Random Forest reached accuracies of roughly 87.5% and 86.5%, respectively, and the Decision Tree Classifier reached 83%. The F1-score, the harmonic mean of precision and recall, confirmed this picture: the XGBoost model showed a marginal improvement of 2% over the Random Forest and Decision Tree models and of 4.25% over the remaining two models. These results underscore the effectiveness of the XGBoost Classifier, which this study adopts as its predictive model, alongside the Random Forest and Decision Tree models, for accurately identifying stroke and diabetes. Furthermore, combining the datasets improves model performance by exploiting related features.
The integrated dataset makes the model more efficient and yields a more robust and comprehensive prediction model, which can improve healthcare outcomes. The findings of this research contribute to the advancement of AI-driven diagnostic systems and thereby to better-informed healthcare decision-making.
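The abstract reports accuracy, precision, recall, and F1-score for each classifier. As a minimal sketch of how those four numbers relate for a binary label (1 = condition present), the function below computes them from paired true and predicted labels. The names `binary_metrics` and `BinaryMetrics` are hypothetical; scikit-learn's `sklearn.metrics` module offers equivalent functions such as `accuracy_score` and `f1_score`. The paper does not state here whether its metrics were averaged over both classes, so scoring a single positive class is an assumption.

```python
from collections.abc import Hashable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = 1,  # assumed encoding of "condition present"
) -> BinaryMetrics:
    """Accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed cases
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    m = binary_metrics(y_true, y_pred)
    print(f"accuracy={m.accuracy:.3f} precision={m.precision:.3f} "
          f"recall={m.recall:.3f} f1={m.f1:.3f}")
```

With one missed case and one false alarm among eight toy samples, accuracy is 0.75 while precision, recall, and F1 are each 2/3. This gap is one reason to report more than accuracy when, as the abstract notes for existing studies, datasets are imbalanced.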

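The abstract states that combining the stroke and diabetes datasets improves performance by exploiting related features, but it does not describe how the tables were merged. One plausible reading, sketched below under that assumption, stacks the rows of both tables on the feature columns they share. The function names, the label column names `stroke` and `diabetes`, and the toy rows are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def shared_feature_columns(a: list[Row], b: list[Row], exclude: set[str]) -> list[str]:
    """Return the columns present in every row of both tables, minus `exclude`."""
    if not a or not b:
        raise ValueError("both tables must contain at least one row")
    cols_a = set.intersection(*(set(row) for row in a))
    cols_b = set.intersection(*(set(row) for row in b))
    return sorted((cols_a & cols_b) - exclude)


def combine_on_shared_features(
    stroke_rows: list[Row],
    diabetes_rows: list[Row],
    stroke_label: str = "stroke",      # hypothetical label column name
    diabetes_label: str = "diabetes",  # hypothetical label column name
) -> tuple[list[str], list[Row]]:
    """Stack both tables on their shared feature columns.

    Each output row keeps both label columns; a label that the source
    table does not record is set to None (unknown), not 0 (negative).
    """
    labels = {stroke_label, diabetes_label}
    features = shared_feature_columns(stroke_rows, diabetes_rows, exclude=labels)
    combined: list[Row] = []
    for rows in (stroke_rows, diabetes_rows):
        for row in rows:
            out = {col: row[col] for col in features}
            out[stroke_label] = row.get(stroke_label)
            out[diabetes_label] = row.get(diabetes_label)
            combined.append(out)
    return features, combined


if __name__ == "__main__":
    # Toy rows with made-up values and hypothetical column names.
    stroke = [{"age": 60, "hypertension": 1, "bmi": 30.0,
               "avg_glucose_level": 200.0, "stroke": 1}]
    diabetes = [{"age": 45, "hypertension": 0, "bmi": 25.0,
                 "HighChol": 1, "diabetes": 0}]
    features, rows = combine_on_shared_features(stroke, diabetes)
    print(features)
    for r in rows:
        print(r)
```

Setting an unrecorded label to None rather than 0 keeps diabetes-only rows from being silently counted as stroke-negative (and vice versa); a model trained on one label would then drop or mask the rows where that label is unknown. In pandas, `pd.concat(..., join="inner")` performs a similar column intersection, although it would also drop both label columns because each appears in only one table.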
Cite This Paper

Ngoc-Bich Le, Thi-Thu-Hien Pham, Sy-Hoang Nguyen, Nhat-Minh Nguyen, Tan-Nhu Nguyen, "AI-powered Predictive Model for Stroke and Diabetes Diagnostic", International Journal of Intelligent Systems and Applications (IJISA), Vol. 16, No. 1, pp. 24-40, 2024. DOI: 10.5815/ijisa.2024.01.03