A Stochastic Prediction Interface for Urdu

Full Text (PDF, 425KB), PP.94-100

Views: 0 Downloads: 0

Author(s)

Qaiser Abbas 1,*

1. Fachbereich Sprachwissenschaft, Universität Konstanz, Konstanz, 78457, Germany

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2015.01.09

Received: 11 Apr. 2014 / Revised: 4 Aug. 2014 / Accepted: 5 Oct. 2014 / Published: 8 Dec. 2014

Index Terms

Urdu Prediction Interface, N-Gram Language Model, QASKU, Word and Sequence Prediction, Corpus Based Application

Abstract

This work lays down a foundation for text prediction of an inflected and under-resourced language Urdu. The interface developed is not limited to a T9 (Text on 9 keys) application used in embedded devices, which can only predict a word after typing initial characters. It is capable of predicting a word like T9 and also a sequence of word after a word in a continuous manner for fast document typing. It is based on N-gram language model. This stochastic interface deals with three N-gram levels from unary to ternary independently. The uni-gram mode is being in use for applications like T9, while the bi-gram and tri-gram modes are being in use for sentence prediction. The measures include a percentage of keystrokes saved, keystrokes until completion and a percentage of time saved during the typing. Two different corpora are merged to build a sufficient amount of data. The test data is divided into a test and a held out data equally for an experimental purpose. This whole exercise enables the QASKU system outperforms the FastType with almost 15% more saved keystrokes.

Cite This Paper

Qaiser Abbas, "A Stochastic Prediction Interface for Urdu", International Journal of Intelligent Systems and Applications(IJISA), vol.7, no.1, pp.94-100, 2015. DOI:10.5815/ijisa.2015.01.09

Reference

[1]Q. Abbas, “Building a Hierarchical Annotated Corpus of Urdu: The URDU.KON-TB Treebank”, Lecture Notes in Computer Science (LNCS), Vol. 7181(1), P 66-79, ISSN 0302-9743, Springer-Verlag Berlin/Heidelberg, 2012.

[2]Q. Abbas and A. N. Khan, “Lexical functional grammar for Urdu modal verbs” In Proceedings of 5th IEEE International Conference on Engineering and Technology (ICET), 2009.

[3]Q. Abbas, N. Karamat and S. Niazi, “Development Of Tree-Bank Based Probabilistic Grammar For Urdu Language” International Journal of Electrical & Computer Science, Vol. 9(09), pp. 231–235, 2009.

[4]C. Aliprandi, N. Carmignani, and P. Mancarella, “An Inflected-Sensitive Letter And Word Prediction System”, International Journal of Computing & Information Sciences, Vol. 5(2), pp. 79-85, 2007.

[5]C. Aliprandi, N. Carmignani, N. Deha, P. Mancarella, and M. Rubino, “Advances In NLP Applied To Word Prediction”, Langtech, 2008.

[6]C. Aliprandi, N. Carmignani, P. Mancarella, and M. Rubino, “A Word Predictor For Inflected Languages: System Design And User-Centric Interface”, In Proceedings of the 2nd IASTED International Conference on Human-Computer Interaction, March 2007.

[7]T. Bögel, M. Butt, and S. Sulger, “Urdu Ezafe And The Morphology-Syntax Interface”, In proceedings of LFG08, 2008.

[8]M. Butt, and T. Ahmed, “The Redevelopment Of Indo-Aryan Case Systems From A Lexical Semantic Perspective”, Morphology, Vol. 21(3-4), pp. 545-572, 2011.

[9]M. Butt, and T. H. King, “Restriction For Morphological Valency Alternations: The Urdu Causative” Intelligent linguistic architectures: Variations on themes by Ronald M. Kaplan, pp. 235-258, 2006.

[10]M. Butt, and G. Ramchand, “Complex Aspectual Structure In Hindi/Urdu” M. Liakata, B. Jensen, & D. Maillat, Eds, pp. 1-30, 2001.

[11]N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth, “How To Use Expert Advice”, Journal of the ACM (JACM), Vol. 44(3), pp. 427-485, 1997.

[12]A. DeSantis, G. Markowsky, and M. N. Wegman, “Learning Probabilistic Prediction Functions”, In 29th Annual Symposium on IEEE, Foundations of Computer Science, pp. 110-119, October 1988.

[13]A. Fazly, and G. Hirst, “Testing The Efficacy Of Part-Of-Speech Information In Word Completion”, In Proceedings of the 2003 EACL Workshop on Language Modeling for Text Entry Methods, Association for Computational Linguistics, pp. 9-16, April 2003.

[14]M. Ijaz, “Urdu 5000 Most Frequent Words”, Technical report, Center for Research and Urdu Language Processing, National University of Computer & Emerging Sciences, Lahore, PK, 2007.

[15]C. D. Manning, and H. Schütze, Foundations Of Statistical Natural Language Processing, Cambridge: MIT press, 1999. 

[16]F. C. Pereira, Y. Singer, and N. Tishby, “Beyond Word N-Grams”, In Natural Language Processing Using Very Large Corpora, pp. 121-136, Springer Netherlands, 1999.

[17]G. Raza, Sub-categorization Acquisition And Classes Of Predication In Urdu, PhD Thesis, 2011.

[18]D. D. Sleator, and R. E. Tarjan, “Self-Adjusting Binary Search Trees”, Journal of the ACM (JACM), Vol. 32(3), pp. 652-686, 1985.

[19]F. M. Willems, “The Context-Tree Weighting Method: Extensions”, IEEE Transactions on Information Theory, Vol. 44(2), pp. 792-798, 1998.

[20]I. J. Good, “The Population Frequencies Of Species And The Estimation Of Population Parameters”, Biometrika, Vol. 40(3-4), pp. 237-264, 1953.

[21]Jaber Karimpour,Ali A. Noroozi,Adeleh Abadi, "The Impact of Feature Selection on Web Spam Detection", IJISA, vol.4, no.9, pp.61-67, 2012.

[22]Souleymane KOUSSOUBE, Roger NOUSSI, Balira O. KONFE, "Using Description Logics to specify a Document Synthesis System", IJISA, vol.5, no.3, pp.13-22, 2013.DOI: 10.5815/ijisa.2013.03.02

[23]Leandro Luiz de Almeida, Maria Stela V. de Paiva, Francisco Assis da Silva, Almir Olivette Artero, "Super-resolution Image Created from a Sequence of Images with Application of Character Recognition", IJISA, vol.6, no.1, pp.11-19, 2014. DOI: 10.5815/ijisa.2014.01.02

[24]Q. Abbas, G. Raza, “A computational classification of Urdu dynamic copula verb”, International Journal of Computer Applications (IJCA), Vol. 85 (10), pp. 1-12, ISSN: 0975 - 8887, Published by Foundation of Computer Science, New York, USA, 2014.

[25]Q. Abbas, “Semi-Semantic Part of Speech Annotation and Evaluation”, In Proceedings of ACL 8th Linguistic Annotation Workshop held in conjunction with COLING, Association of Computational Linguistics, P 75-81, Ireland, 2014.

[26]Q. Abbas, M. S. Ahmed, S. Niazi, “Language Identifier for Languages of Pakistan Including Arabic and Persian”, International Journal of Computational Linguistics (IJCL), Vol. 01(03), P 27-35, ISSN 2180-1266, 2010.