Design and Implementation of IR System for Tigrigna Textual Documents

PP. 31-38

Author(s)

Teklay Birhane 1,*, Birhanu Hailu 1

1. Department of Information Science, Mekelle University, Mekelle, Ethiopia

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2019.11.05

Received: 10 Oct. 2019 / Revised: 21 Oct. 2019 / Accepted: 28 Oct. 2019 / Published: 8 Nov. 2019

Index Terms

Corpus, Indexing, Information Retrieval, Searching, Tigrigna Language, Vector Space Model

Abstract

A large and growing amount of information is now available on the internet, and finding the relevant documents within it requires an information retrieval (IR) system or search engine. This paper presents the development of an IR system for Tigrigna textual documents. The system helps Tigrigna language users find relevant documents written in Tigrigna and so satisfy their information needs. It consists of two subsystems: indexing and searching. Indexing is the process of organizing filtered Tigrigna documents by keywords extracted from the entire Tigrigna collection, or corpus. It is an offline process, carried out on the producers' (authors') side, that speeds up searching the collection in response to a user's query. Searching is the process of scanning the documents to find those that match the user's query or information need. It is an online process, carried out mostly on the users' (readers') side. The system was implemented using the vector space model, a core information retrieval technique that computes a similarity measure between the query and each document and then ranks the documents in descending order of their similarity score. The retrieval system was tested on Tigrigna documents, and the experimental results were encouraging: the system registered 70% precision and 84% recall.
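The indexing, searching, and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the term-weighting scheme or the similarity measure, so TF-IDF weights and cosine similarity are assumed here as one common way to realize a vector space model. The function names (`build_index`, `doc_norms`, `search`, `precision_recall`) and the sample documents are hypothetical, and tokenization is plain whitespace splitting with no Tigrigna stop-word removal or stemming.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization only; a real Tigrigna system
    # would also normalize, remove stop words, and stem.
    return text.lower().split()


def build_index(docs):
    """Offline step: inverted index of the form {term: {doc_id: term frequency}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def doc_norms(index, n_docs):
    """Euclidean length of each document's TF-IDF vector (assumed weighting)."""
    squared = defaultdict(float)
    for postings in index.values():
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            squared[doc_id] += (tf * idf) ** 2
    return {doc_id: math.sqrt(value) for doc_id, value in squared.items()}


def search(query, index, norms, n_docs):
    """Online step: rank documents by cosine similarity to the query."""
    dots = defaultdict(float)
    q_squared = 0.0
    for term, q_tf in Counter(tokenize(query)).items():
        if term not in index:
            continue  # query term does not occur in the corpus
        idf = math.log(n_docs / len(index[term]))
        q_weight = q_tf * idf
        q_squared += q_weight ** 2
        for doc_id, d_tf in index[term].items():
            dots[doc_id] += q_weight * (d_tf * idf)
    q_norm = math.sqrt(q_squared)
    ranked = []
    for doc_id, dot in dots.items():
        denom = q_norm * norms.get(doc_id, 0.0)
        if denom > 0 and dot > 0:
            ranked.append((doc_id, dot / denom))
    # Highest similarity first, as in the abstract's descending ranking.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical placeholder corpus; real input would be Tigrigna text.
    docs = {
        "d1": "mekelle university library",
        "d2": "tigrigna language corpus",
        "d3": "tigrigna corpus indexing corpus",
    }
    index = build_index(docs)
    norms = doc_norms(index, len(docs))
    for doc_id, score in search("tigrigna corpus", index, norms, len(docs)):
        print(doc_id, round(score, 3))
```

The index and document norms are built once offline, while `search` touches only the postings of the query terms, which mirrors the offline/online split the abstract describes. Precision and recall would be computed per query against human relevance judgments and then averaged over the test queries.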

Cite This Paper

Teklay Birhane, Birhanu Hailu, "Design and Implementation of IR System for Tigrigna Textual Documents", International Journal of Modern Education and Computer Science (IJMECS), Vol. 11, No. 11, pp. 31-38, 2019. DOI: 10.5815/ijmecs.2019.11.05
