Development of a Prediction Model on Demographic Indicators based on Machine Learning Methods: Azerbaijan Example

Author(s)

Makrufa Sh. Hajirahimova 1, Aybeniz S. Aliyeva 1,*

1. Institute of Information Technology of the Ministry of Science and Education of the Republic of Azerbaijan, Baku, AZ1141, Azerbaijan

* Corresponding author.

DOI: https://doi.org/10.5815/ijeme.2023.02.01

Received: 13 Dec. 2022 / Revised: 2 Jan. 2023 / Accepted: 15 Feb. 2023 / Published: 8 Apr. 2023

Index Terms

Time series forecasting, population prediction, machine learning, linear regression, decision tree, random forest, k-nearest neighbors

Abstract

The accuracy of population forecasts is one of the most important concerns in demographic statistics. However, traditional demographic methods used in population projections tend to produce biased results. The need for accurate prediction of future behavior in a number of areas requires the application of reliable and efficient methods. Recently, machine learning (ML) models have emerged as a serious competitor to classical statistical models in the forecasting community. In this study, the performance and capacity of four ML models, namely random forest (RF), decision tree (DT), linear regression (LR) and k-nearest neighbors (KNN), for population prediction have been examined. The aim of the study is to find the best-performing regression model among these machine learning algorithms for population forecasting. Data collected from the website of the State Statistical Committee of the Republic of Azerbaijan were used for the analysis. We used five metrics, namely mean absolute percentage error (MAPE), mean absolute error (MAE), root mean squared error (RMSE), mean squared error (MSE) and R-squared, to compare the predictive ability of the models. The analysis showed that all ML models performed well, with correlation coefficients of 0.985 to 0.996. In addition, the KNN and RF prediction models showed the lowest RMSE, MSE and MAE values compared to the other models. By effectively using the advantages of ML algorithms, population growth in the near future can be forecast objectively, providing an objective reference for strategic planning in the public and private sectors, particularly in education, health and social areas.
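As an illustration of the five evaluation metrics named in the abstract, the following minimal Python sketch computes them from their standard textbook definitions. It is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data, and R-squared is assumed here to mean the coefficient of determination.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE (in percent) and R-squared.

    Hypothetical helper using the standard definitions; the paper's
    exact formulas may differ in detail.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is zero; population counts are positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Made-up values for illustration only (not the Azerbaijan population series).
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 121.0, 129.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MAE, MSE, RMSE and MAPE indicate a closer fit, while R-squared closer to 1 indicates that the model explains more of the variance in the observed values. Comparing models on these quantities over the same held-out data is one way to rank them.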

Cite This Paper

Makrufa Sh. Hajirahimova, Aybeniz S. Aliyeva, "Development of a Prediction Model on Demographic Indicators based on Machine Learning Methods: Azerbaijan Example", International Journal of Education and Management Engineering (IJEME), Vol.13, No.2, pp. 1-9, 2023. DOI:10.5815/ijeme.2023.02.01
