Density Based Script Identification of a Multilingual Document Image

Author(s)

Rumaan Bashir 1,*, S.M.K. Quadri 2

1. Department of Computer Science, Islamic University of Science & Technology, Awantipora, Pulwama, J&K, 192122, India

2. P.G. Department of Computer Science, University of Kashmir, Hazratbal, Srinagar, 190006, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2015.02.02

Received: 17 Sep. 2014 / Revised: 5 Nov. 2014 / Accepted: 10 Dec. 2014 / Published: 8 Jan. 2015

Index Terms

Document Image Analysis, Multilingual Script Identification, Kashmiri, Roman, Devanagari, Urdu, Density, Statistical Approach

Abstract

The field of automatic pattern recognition has witnessed enormous growth in the past few decades. Document Image Analysis, an essential element of pattern recognition, is the process of analyzing a document image to work out its contents so that they can be manipulated as required at various levels. It involves procedures such as document classification, organization, conversion, identification and many more. Since a document chiefly contains text, Script Identification has grown to be a very important area of this field. A script is a scheme of written characters and symbols used to write a particular language; it makes up the text of a document or manuscript. Languages are written using scripts, and a script is itself made up of symbols. Every language has its own set of symbols used for writing it. Sometimes different languages are written using the same script, with marginal modifications. Script Identification has been performed for unilingual, bilingual and multilingual document images, but negligible work has been reported for the Kashmiri script. In this paper, we analyze and experimentally test a statistical approach for identifying Kashmiri script in a document image alongside Roman, Devanagari and Urdu scripts. The identification is performed on offline machine-printed scripts and yields promising results.

Cite This Paper

Rumaan Bashir, S. M. K. Quadri, "Density Based Script Identification of a Multilingual Document Image", IJIGSP, vol.7, no.2, pp.8-14, 2015. DOI: 10.5815/ijigsp.2015.02.02
