Comparative Analysis of Stemming Algorithms for Web Text Mining

Full Text (PDF, 476KB), PP.20-25

Author(s)

Muhammad Haroon 1,*

1. Department of Computing & Information Technology, University of Gujrat, Lahore Sub Campus, Lahore, Pakistan

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2018.09.03

Received: 18 Jul. 2018 / Revised: 29 Jul. 2018 / Accepted: 7 Aug. 2018 / Published: 8 Sep. 2018

Index Terms

Stemming Algorithms, Stemmers, Information Retrieval, NLP, Morphology, Web Mining

Abstract

As data on the web and in information retrieval systems grows exponentially, retrieving relevant data has become challenging. Stemming reduces words to meaningful terms by stripping characters from them, which leads to more accurate and relevant results. The core purpose of a stemming algorithm is to obtain useful terms by reducing the grammatical forms in the morphological structure of a language. This paper describes the different types of stemming algorithms, which behave differently on different types of corpora, and presents a comparative study of them on the basis of stem production, efficiency, and effectiveness in information retrieval systems.
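
To make the idea of reducing grammatical forms concrete, the following is a minimal sketch of suffix stripping in Python. It is an illustration only: the suffix list, the minimum stem length, and the names `toy_stem` and `conflate` are assumptions made for this example, and the code is not the Porter, Lovins, or Paice algorithm or any method taken from the paper.

```python
# Toy suffix-stripping stemmer. Everything here is a hypothetical illustration:
# the suffix list and length guard are assumptions, not a published rule set.

# Assumed suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ("ations", "ation", "ions", "ings", "ion", "ing", "ed", "es", "s")
MIN_STEM_LENGTH = 3  # assumed guard so that short words are left untouched


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix, if any."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        stem_length = len(lowered) - len(suffix)
        if lowered.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            return lowered[:stem_length]
    return lowered


def conflate(words):
    """Group surface forms under the stem that toy_stem assigns to them."""
    groups = {}
    for word in words:
        groups.setdefault(toy_stem(word), []).append(word)
    return groups


if __name__ == "__main__":
    print(conflate(["connect", "connected", "connecting", "connections"]))
```

With this rule set, the four forms of "connect" are grouped under one stem, while "running" becomes "runn". The second result shows why real stemmers add recoding rules and why comparisons of stem quality matter.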

Cite This Paper

Muhammad Haroon, "Comparative Analysis of Stemming Algorithms for Web Text Mining", International Journal of Modern Education and Computer Science (IJMECS), Vol.10, No.9, pp. 20-25, 2018. DOI:10.5815/ijmecs.2018.09.03
