An Exploratory Approach to Find a Novel Metric Based Optimum Language Model for Automatic Bangla Word Prediction

Author(s)

Md. Tarek Habib 1,*, Abdullah Al-Mamun 1, Md. Sadekur Rahman 1, Shah Md. Tanvir Siddiquee 1, Farruk Ahmed 2

1. Department of CSE, Daffodil International University, Dhaka, Bangladesh

2. Department of CSE, Independent University Bangladesh, Dhaka, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2018.02.05

Received: 16 Apr. 2017 / Revised: 17 Jul. 2017 / Accepted: 15 Sep. 2017 / Published: 8 Feb. 2018

Index Terms

Word prediction, performance metric, natural language processing, N-gram, language model, corpus, machine learning, eager learning

Abstract

Word completion and word prediction are two important typing aids that substantially help people with disabilities and students who use a keyboard or similar input devices. Such auto-completion techniques also help students significantly during the learning process by helping them construct proper keywords for web searches. Much work has been done for English, but work on Bangla remains inadequate, and the metrics used to measure performance are not yet rigorous. Bangla is one of the most widely spoken languages (3.05% of the world population) and ranks seventh among all the languages in the world. In this paper, stochastic language models, i.e. N-gram based models, are proposed for word prediction in Bangla sentences; they auto-complete a sentence by predicting a set of words rather than the single word predicted in previous work. A novel approach is proposed to find the optimum language model based on a performance metric. In addition, to obtain better performance, a large Bangla corpus containing different word types is used.

Cite This Paper

Md. Tarek Habib, Abdullah Al-Mamun, Md. Sadekur Rahman, Shah Md. Tanvir Siddiquee, Farruk Ahmed, "An Exploratory Approach to Find a Novel Metric Based Optimum Language Model for Automatic Bangla Word Prediction", International Journal of Intelligent Systems and Applications (IJISA), Vol.10, No.2, pp.47-54, 2018. DOI: 10.5815/ijisa.2018.02.05
