Optimal Machine learning Model for Software Defect Prediction

pp. 36-48


Author(s)

Tripti Lamba (1,*), Kavita (2), A. K. Mishra (2)

1. Jagan Nath University, Jaipur, India

2. Jagan Nath University, Jaipur, India, and Principal Scientist, AKMU, IARI Pusa Campus, New Delhi, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.02.05

Received: 6 Nov. 2017 / Revised: 18 Apr. 2018 / Accepted: 12 Jul. 2018 / Published: 8 Feb. 2019

Index Terms

Linear Regression, Random Forest, Neural Network, Support Vector Machine, Decision Tree, Decision Stump

Abstract

Machine learning is a branch of artificial intelligence that builds systems which learn from data. A machine learning model can take raw data from a repository, perform computations on it, and predict software bugs. Detecting bugs as early as possible is desirable because it reduces both time and cost. Wrapper and filter feature selection techniques are used to find the optimal set of software metrics. The main aim of this paper is to find the best model for software bug prediction. The machine learning techniques Linear Regression, Random Forest, Neural Network, Support Vector Machine, Decision Tree and Decision Stump are applied, and a comparative analysis is carried out using performance parameters such as correlation, R-squared, mean squared error and accuracy on the software modules ant, ivy, tomcat, berek, camel, lucene, poi, synapse and velocity. The Support Vector Machine outperforms the other machine learning models.
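The abstract compares models by correlation, R-squared, mean squared error and accuracy, and mentions a filter method for choosing software metrics. The sketch below shows how these four parameters are conventionally computed, together with one common filter criterion: ranking metrics by their absolute Pearson correlation with the bug count. It is an illustration under stated assumptions, not the authors' implementation. The data, the metric names `wmc` and `loc`, the helper names, and the 0.5 threshold used to turn bug counts into defective/clean labels are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted bug counts."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Share of modules whose defective/clean label is predicted correctly.

    Hypothetical rule: a module counts as defective when its (actual or
    predicted) bug count exceeds `threshold`.
    """
    hits = sum((t > threshold) == (p > threshold) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def filter_rank(metrics, bugs):
    """Filter-style selection: rank metrics by |Pearson r| with the bug count.

    No model is trained here; that is what distinguishes a filter method
    from a wrapper method.
    """
    scores = {name: abs(pearson_correlation(values, bugs))
              for name, values in metrics.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical bug counts for five modules and one model's predictions.
    bugs = [0, 1, 0, 2, 3]
    predicted = [0.2, 0.8, 0.1, 1.5, 2.6]
    print("correlation:", round(pearson_correlation(bugs, predicted), 3))
    print("R-squared:  ", round(r_squared(bugs, predicted), 3))
    print("MSE:        ", round(mean_squared_error(bugs, predicted), 3))
    print("accuracy:   ", accuracy(bugs, predicted))
    # Hypothetical per-module metric values for two candidate features.
    metrics = {"wmc": [3, 7, 2, 12, 15], "loc": [40, 60, 45, 50, 55]}
    print("filter ranking:", filter_rank(metrics, bugs))
```

A wrapper method would instead retrain the chosen model (for example, a Support Vector Machine) on candidate subsets of metrics and keep the subset whose score on these same parameters is best. This is more expensive than a filter ranking but takes the model's behaviour into account.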

Cite This Paper

Tripti Lamba, Kavita, A. K. Mishra, "Optimal Machine learning Model for Software Defect Prediction", International Journal of Intelligent Systems and Applications (IJISA), Vol. 11, No. 2, pp. 36-48, 2019. DOI: 10.5815/ijisa.2019.02.05
