Srinivasa Rao A.V

Segmentation of Ancient Telugu Text Documents

pp. 8-14


Author(s)

Srinivasa Rao A.V ^1,*

1. ECE Department, AKRGCET, Nallajerla, Andhra Pradesh, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2012.06.02

Received: 3 Apr. 2012 / Revised: 28 Apr. 2012 / Accepted: 7 Jun. 2012 / Published: 8 Jul. 2012

Index Terms

Segmentation, Profile, Line, Syllable, Gaussian derivative kernel

Abstract

OCR of ancient document images remains a challenging task to date. The scanning process itself deforms the document images, and cleaning these images results in information loss. Segmentation is an unavoidable stage of OCR, and complex scripts, such as those derived from Brahmi, pose many problems during segmentation. Segmenting meaningful units, rather than isolated patterns, has revealed interesting trends. This paper proposes a technique for segmenting ancient Telugu document images into meaningful units. The topological features of the meaningful units within a script line serve as the basis for segmenting the text line, and the horizontal profile pattern of the line is convolved with a Gaussian kernel. The statistical properties of the meaningful units are explored through an extensive analysis of their geometric patterns. The proposed segmentation algorithm achieves an efficiency of 73.5% on uncleaned document images.
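The profile-and-kernel idea in the abstract can be illustrated with a minimal sketch. It is not the author's algorithm: it assumes that "horizontal profile" means the count of ink pixels in each column of an already binarised text line (1 = ink), and it swaps in a generic rule that cuts the line wherever the Gaussian-smoothed profile has a local minimum. Those minima are found as negative-to-positive zero crossings of the profile convolved with a first-derivative-of-Gaussian kernel. The function names and the `sigma` parameter are hypothetical, and the statistical analysis of meaningful units described in the paper is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_derivative_kernel(sigma: float) -> List[float]:
    """Sampled first derivative of a normalised Gaussian: G'(t) = -t / sigma^2 * G(t)."""
    radius = max(1, math.ceil(3 * sigma))  # support of roughly +/- 3 sigma
    ts = range(-radius, radius + 1)
    g = [math.exp(-(t * t) / (2 * sigma * sigma)) for t in ts]
    total = sum(g)
    return [-t / (sigma * sigma) * (v / total) for t, v in zip(ts, g)]


def convolve_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded discrete convolution whose output has the length of `signal`."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + r - k  # flipped index, so this is true convolution
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def column_profile(line_img: Sequence[Sequence[int]]) -> List[int]:
    """Ink pixels (value 1) per column of a binarised text-line image (assumed layout)."""
    return [sum(col) for col in zip(*line_img)]


def cut_columns(profile: Sequence[float], sigma: float = 1.0) -> List[int]:
    """Columns where the Gaussian-smoothed profile has a local minimum.

    The derivative-of-Gaussian response goes from negative to non-negative at a
    minimum; of the two samples straddling the crossing, keep the one nearer zero.
    """
    d = convolve_same(profile, gaussian_derivative_kernel(sigma))
    cuts = []
    for i in range(1, len(d)):
        if d[i - 1] < 0 <= d[i]:
            cuts.append(i - 1 if abs(d[i - 1]) < abs(d[i]) else i)
    return cuts


def segment_line(line_img: Sequence[Sequence[int]], sigma: float = 1.0) -> List[Tuple[int, int]]:
    """Split a binarised line into [start, end) column ranges at the cut columns."""
    width = len(line_img[0])
    bounds = [0] + cut_columns(column_profile(line_img), sigma) + [width]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Two 5-column blobs of ink separated by a 3-column gap.
    row = [1] * 5 + [0] * 3 + [1] * 5
    img = [row] * 5
    print(segment_line(img))  # [(0, 6), (6, 13)]
```

A larger `sigma` suppresses shallow dips inside a glyph but can merge neighbouring units. On real ancient Telugu lines this plain minimum rule would likely over-segment, so a practical version would also need a threshold on the depth of each minimum or on the minimum unit width.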

Cite This Paper

Srinivasa Rao A.V, "Segmentation of Ancient Telugu Text Documents", IJIGSP, vol. 4, no. 6, pp. 8-14, 2012. DOI: 10.5815/ijigsp.2012.06.02


International Journal of Image, Graphics and Signal Processing (IJIGSP)