Optimal Playing Position Prediction in Football Matches: A Machine Learning Approach

Full Text (PDF, 1672KB), pp. 30-47


Author(s)

Kevin Sander Utomo 1, Trianggoro Wiradinata 1,*

1. School of Information Technology, Universitas Ciputra Surabaya, Surabaya 60219, Indonesia

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2023.06.03

Received: 24 Mar. 2023 / Revised: 18 Apr. 2023 / Accepted: 7 May 2023 / Published: 8 Dec. 2023

Index Terms

Talent Identification, Machine Learning, Classification, Web Scraping, Team Formation, Football

Abstract

Deciding on an optimal playing position can be a challenging task for anyone working in the sport management industry, particularly in football. This study presents a machine learning approach that helps football managers determine and predict where to place existing or prospective players within a specific team formation, across positions such as Attacking Midfielder (AM), Defending Midfielder (DMC), All-Around Midfielder (M), Defender (D), Forward Winger (FW), and Goalkeeper (GK), based on their attributes. Understanding how a player’s playstyle affects where that player fits in a team formation supports this identification process. The attributes used to identify a player’s position cover Passing Capabilities (AveragePasses), Offensive Capabilities (Possession, etc.), Defensive Capabilities (Blocks, Through Balls, Tackles, etc.), and Summary statistics (Playtime, Goals, Assists, Passing Percentage, etc.). The data were scraped manually from a popular football site that presents player statistics in a structured and ordered manner, using the scraping tool Octoparse 8.0. The processed data were then used to build a machine learning predictor modelled with several classification algorithms, namely KNN, Naive Bayes, Support Vector Machine, Decision Tree, and Random Forest, implemented in the Python programming language with the help of various machine learning and data science libraries and supplemented with graphs and charts that provide insight into the task at hand. The resulting evaluation metrics show that the Decision Tree algorithm achieves the highest accuracy and f1-score, at 76% and 75% respectively, while Naive Bayes sits lowest at 69% for both accuracy and f1-score. The evaluation prioritized validating and filtering out algorithms that overfit substantially, which was evident for both the KNN and Support Vector Machine algorithms. As a result, the model formed in this study can serve as a prediction tool that helps football managers, team coaches, and individual players recognize a player’s performance relative to their position, which in turn helps teams acquire a specific type of player to fill a systemic weakness in their existing roster.
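
The modelling and evaluation workflow summarized above can be sketched roughly as follows. This is a minimal illustration only, not the authors' code: the file name player_stats.csv, the "Position" label column, the hyperparameters, and the use of macro-averaged F1 are assumptions for the sake of a runnable example.

# Minimal sketch of the classifier comparison described in the abstract.
# Assumptions (not taken from the paper): scraped statistics exported to
# "player_stats.csv" with numeric feature columns and a "Position" label.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("player_stats.csv")      # scraped player statistics (hypothetical file name)
X = df.drop(columns=["Position"])         # passing, offensive, defensive, summary attributes
y = df["Position"]                        # AM, DMC, M, D, FW, GK

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_pred = model.predict(X_test)
    test_acc = accuracy_score(y_test, test_pred)
    macro_f1 = f1_score(y_test, test_pred, average="macro")
    cv_acc = cross_val_score(model, X, y, cv=5).mean()
    # A large gap between train_acc and test_acc (or cv_acc) flags overfitting,
    # the criterion the abstract uses to filter out KNN and SVM.
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f} "
          f"f1={macro_f1:.2f} cv={cv_acc:.2f}")

Wrapping the distance- and margin-based models (KNN, SVM) in a StandardScaler pipeline keeps feature scaling inside the train/cross-validation split, which is one common way to make the overfitting comparison between models fair.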

Cite This Paper

Kevin Sander Utomo, Trianggoro Wiradinata, "Optimal Playing Position Prediction in Football Matches: A Machine Learning Approach", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.15, No.6, pp. 30-47, 2023. DOI:10.5815/ijieeb.2023.06.03

References

[1]N. Razali, A. Mustapha, F. A. Yatim, and R. Ab Aziz, “Predicting Player Position for Talent Identification in Association Football,” in IOP Conference Series: Materials Science and Engineering, Aug. 2017, vol. 226, no. 1. doi: 10.1088/1757-899X/226/1/012087.
[2]C. T. Woods, J. Veale, J. Fransen, S. Robertson, and N. F. Collier, “Classification of playing position in elite junior Australian football using technical skill indicators,” J Sports Sci, vol. 36, no. 1, pp. 97–103, Jan. 2018, doi: 10.1080/02640414.2017.1282621.
[3]D. Abidin, “A case study on player selection and team formation in football with machine learning,” Turkish Journal of Electrical Engineering and Computer Sciences, vol. 29, no. 3, pp. 1672–1691, 2021, doi: 10.3906/elk-2005-27.
[4]J. Pion, V. Segers, J. Stautemas, J. Boone, M. Lenoir, and J. G. Bourgois, “Position-specific performance profiles, using predictive classification models in senior basketball,” Int J Sports Sci Coach, vol. 13, no. 6, pp. 1072–1080, Dec. 2018, doi: 10.1177/1747954118765054.
[5]D. R. Anamisa et al., “A selection system for the position ideal of football players based on the AHP and TOPSIS methods,” IOP Conf Ser Mater Sci Eng, vol. 1125, no. 1, p. 012044, May 2021, doi: 10.1088/1757-899x/1125/1/012044.
[6]M. Bazmara, “A Novel Fuzzy Approach for Determining Best Position of Soccer Players,” International Journal of Intelligent Systems and Applications, vol. 6, no. 9, pp. 62–67, Aug. 2014, doi: 10.5815/ijisa.2014.09.08.
[7]Proceedings of the Second International Conference on Electronics, Communication and Aerospace Technology (ICECA 2018), RVS Technical Campus, IEEE Electron Devices Society, and Institute of Electrical and Electronics Engineers, 29-31 May 2018.
[8]A. Makolo and T. Adeboye, “Credit Card Fraud Detection System Using Machine Learning,” International Journal of Information Technology and Computer Science, vol. 13, no. 4, pp. 24–37, Aug. 2021, doi: 10.5815/IJITCS.2021.04.03.
[9]Y. F. Alfredo and S. M. Isa, “Football Match Prediction with Tree Based Model Classification,” International Journal of Intelligent Systems and Applications, vol. 11, no. 7, pp. 20–28, Jul. 2019, doi: 10.5815/ijisa.2019.07.03.
[10]T. Wiradinata, “Folding Bicycle Prospective Buyer Prediction Model,” International Journal of Information Engineering and Electronic Business, vol. 13, no. 5, pp. 1–8, Oct. 2021, doi: 10.5815/ijieeb.2021.05.01.
[11]N. Sharma, S. Appukutti, U. Garg, J. Mukherjee, and S. Mishra, “Analysis of Student’s Academic Performance based on their Time Spent on Extra-Curricular Activities using Machine Learning Techniques,” International Journal of Modern Education and Computer Science, vol. 15, no. 1, pp. 46–57, Feb. 2023, doi: 10.5815/ijmecs.2023.01.04.
[12]P. Patil, “What is exploratory data analysis?,” Medium, 30-May-2022. [Online]. Available: https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15. [Accessed: 19-Jan-2023].
[13]S. R. Sahoo and B. B. Gupta, “Classification of various attacks and their defence mechanism in online social networks: a survey,” Enterprise Information Systems, vol. 13, no. 6. Taylor and Francis Ltd., pp. 832–864, Jul. 03, 2019. doi: 10.1080/17517575.2019.1605542.
[14]G. N. Nguyen, N. H. le Viet, M. Elhoseny, K. Shankar, B. B. Gupta, and A. A. A. El-Latif, “Secure blockchain enabled Cyber–physical systems in healthcare using deep belief network with ResNet model,” J Parallel Distrib Comput, vol. 153, pp. 150–160, Jul. 2021, doi: 10.1016/j.jpdc.2021.03.011.
[15]A. Mishra, B. B. Gupta, D. Peraković, and F. J. García Peñalvo, “A Survey on Data mining classification approaches,” 2021. [Online]. Available: http://ceur-ws.org
[16]S. Christian, “The importance of data preprocessing for machine learning in the e-commerce industry,” School of Information Systems, 11-Jul-2022. [Online]. Available: https://sis.binus.ac.id/2022/07/11/the-importance-of-data-preprocessing-for-machine-learning-in-the-e-commerce-industry/.
[17]R. Mohammed, J. Rawashdeh, and M. Abdullah, “Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results,” in 2020 11th International Conference on Information and Communication Systems, ICICS 2020, Apr. 2020, pp. 243–248. doi: 10.1109/ICICS49469.2020.239556.
[18]D. U. Ozsahin, M. Taiwo Mustapha, A. S. Mubarak, Z. Said Ameen and B. Uzun, "Impact of feature scaling on machine learning models for the diagnosis of diabetes," 2022 International Conference on Artificial Intelligence in Everything (AIE), Lefkosa, Cyprus, 2022, pp. 87-94. doi:10.1109/AIE57029.2022.00024.
[19]M. Butwall, “Data Normalization and standardization: Impacting classification model accuracy,” International Journal of Computer Applications, vol. 183, no. 35, pp. 6–9, 2021.
[20]A. Martulandi, “K-nearest neighbors in python + hyperparameters tuning,” Medium, 24-Oct-2019. [Online]. Available: https://medium.datadriveninvestor.com/k-nearest-neighbors-in-python-hyperparameters-tuning-716734bc557f. [Accessed: 19-Jan-2023].
[21]A. Sharma, “Naive bayes: Gaussian naive Bayes with Hyperpameter tuning in Python,” Analytics Vidhya, 27-Jan-2021. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/01/gaussian-naive-bayes-with-hyperpameter-tuning/. [Accessed: 19-Jan-2023].
[22]M. B. Fraj, “In depth: Parameter tuning for SVC,” Medium, 05-Jan-2018. [Online]. Available: https://medium.com/all-things-ai/in-depth-parameter-tuning-for-svc-758215394769. [Accessed: 19-Jan-2023].
[23]M. Mithrakumar, “How to tune a decision tree?,” Medium, 12-Nov-2019. [Online]. Available: https://towardsdatascience.com/how-to-tune-a-decision-tree-f03721801680. [Accessed: 19-Jan-2023].
[24]W. Koehrsen, “Hyperparameter tuning the random forest in python,” Medium, 10-Jan-2018. [Online]. Available: https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74. [Accessed: 19-Jan-2023].
[25]M. K. Uçar, M. Nour, H. Sindi, and K. Polat, “The Effect of Training and Testing Process on Machine Learning in Biomedical Datasets,” Math Probl Eng, vol. 2020, 2020, doi: 10.1155/2020/2836236.
[26]Bharathi, “Confusion matrix for multi-class classification,” Analytics Vidhya, 30-Nov-2022. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/06/confusion-matrix-for-multi-class-classification/. [Accessed: 08-Jan-2023].
[27]S. Gupta, K. Saluja, A. Goyal, A. Vajpayee, and V. Tiwari, “Comparing the performance of machine learning algorithms using estimated accuracy,” Measurement: Sensors, vol. 24, Dec. 2022, doi: 10.1016/j.measen.2022.100432.
[28]A. Tasnim, Md. Saiduzzaman, M. A. Rahman, J. Akhter, and A. S. Md. M. Rahaman, “Performance Evaluation of Multiple Classifiers for Predicting Fake News,” Journal of Computer and Communications, vol. 10, no. 09, pp. 1–21, 2022, doi: 10.4236/jcc.2022.109001.
[29]D. Berrar, “Cross-validation,” in Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, vol. 1–3, Elsevier, 2018, pp. 542–545. doi: 10.1016/B978-0-12-809633-8.20349-X.
[30]M. de Rooij and W. Weeda, “Cross-Validation: A Method Every Psychologist Should Know,” Adv Methods Pract Psychol Sci, vol. 3, no. 2, pp. 248–263, Jun. 2020, doi: 10.1177/2515245919898466.