A New Algorithm for Skew Detection of Telugu Language Document based on Principle-axis Farthest Pairs Quadrilateral (PFPQ)

Full Text (PDF, 1292KB), pp. 47-58


Author(s)

MSLB. Subrahmanyam 1,*, V. Vijaya Kumar 2, B. Eswara Reddy 3

1. JNTU Kakinada, Kakinada, 533001, India

2. Dean, Dept. of CSE & IT, Anurag Group of Institutions (Autonomous), Hyderabad, India

3. CSE Dept. & Principal of JNTU-A College of Engineering, Kalikiri, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2018.03.06

Received: 29 Sep. 2017 / Revised: 8 Oct. 2017 / Accepted: 16 Oct. 2017 / Published: 8 Mar. 2018

Index Terms

Indian languages, compound characters, complex categories, painting and directional smearing

Abstract

Skew detection and correction is one of the major preprocessing steps in document analysis and understanding. In this paper we propose a new method, called “Principle-axis Farthest Pairs Quadrilateral (PFPQ)”, mainly for detecting skew in Telugu language documents and also in documents of other Indian languages. Telugu is one of the popular and classical languages of India and is spoken by more than 80 million people. Telugu consists of simple and complex characters with extra marks attached, known as “maatras” and “vatthulu”. This makes skew detection in Telugu documents more complex than in other languages. PFPQ first performs pre-processing and divides the text into connected components, estimates the principle-axis farthest pairs quadrilateral of the connected components, and then removes the small and large quadrilaterals. Then, using painting and directional smearing algorithms, PFPQ estimates the skew angle and de-skews the document. We tested the proposed algorithm extensively on five different kinds of documents collected from various categories, i.e., newspapers, magazines, textbooks, handwritten documents, social media, and documents of other Indian languages. The images of these documents also contain complex content such as scientific formulas, statistical tables, trigonometric functions, and images, and encouraging results were obtained.
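The abstract does not give the details of PFPQ, so the following is only a minimal sketch of the general idea of measuring orientation from a principal axis, not the authors' algorithm. It assumes each component is a list of (x, y) foreground points that already represents a text-line-like blob (for example, after smearing), computes the principal-axis angle from second-order central moments, picks the farthest pair of points along that axis, and takes the median angle over components whose size lies within bounds. All function names, the size thresholds, and the synthetic test line are hypothetical.

```python
import math
from statistics import median


def principal_axis_angle(points):
    """Orientation (degrees) of the principal axis of a set of 2-D points.

    Uses second-order central moments: theta = 0.5 * atan2(2*sxy, sxx - syy).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


def farthest_pair_along_axis(points, angle_deg):
    """Points with the smallest and largest projection onto the given axis."""
    ux = math.cos(math.radians(angle_deg))
    uy = math.sin(math.radians(angle_deg))

    def projection(p):
        return p[0] * ux + p[1] * uy

    return min(points, key=projection), max(points, key=projection)


def estimate_skew(components, min_size, max_size):
    """Median principal-axis angle over components within the size bounds.

    The size filter is an assumed stand-in for discarding very small and
    very large components; the thresholds are illustrative only.
    """
    kept = [c for c in components if min_size <= len(c) <= max_size]
    if not kept:
        raise ValueError("no component within the size bounds")
    return median(principal_axis_angle(c) for c in kept)


def make_line(angle_deg, length=200, thickness=3):
    """Synthetic thick line through the origin, tilted by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    return [(float(x), x * slope + t)
            for x in range(length) for t in range(thickness)]


# Usage on synthetic data: one tilted "text line" plus a tiny noise blob.
line = make_line(5.0)
noise = [(0.0, 0.0), (0.0, 1.0)]
skew = estimate_skew([line, noise], min_size=10, max_size=10_000)
p, q = farthest_pair_along_axis(line, skew)
```

The angle here is measured with x to the right and y upward; in image coordinates, where row indices grow downward, the sign of the angle flips. The median is used so that a few badly oriented components do not dominate the estimate.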

Cite This Paper

MSLB. Subrahmanyam, V. Vijaya Kumar, B. Eswara Reddy, "A New Algorithm for Skew Detection of Telugu Language Document based on Principle-axis Farthest Pairs Quadrilateral (PFPQ)", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 10, No. 3, pp. 47-58, 2018. DOI: 10.5815/ijigsp.2018.03.06
