Bi-gram based Query Expansion Technique for Amharic Information Retrieval System

Full Text (PDF, 574KB), PP.1-7

Author(s)

Abey Bruck 1,* Tulu Tilahun 1

1. Department of Computer Science and IT, AMiT, Arba Minch University, Arba Minch, 21, Ethiopia

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2015.06.01

Received: 11 Aug. 2015 / Revised: 8 Sep. 2015 / Accepted: 2 Oct. 2015 / Published: 8 Nov. 2015

Index Terms

Information retrieval, bi-gram, query expansion, relevant document, relevance feedback

Abstract

Information retrieval systems connect users with the information held in document repositories. Although the task of an information retrieval system is to retrieve relevant information, it is very difficult to find a system capable of retrieving all the relevant documents, and only the relevant documents, for a user's query. The aim of this research is to increase the precision of an Amharic information retrieval system while preserving the original recall. To achieve this, a bi-gram technique has been adopted for query expansion. The main reason for performing query expansion is to return documents relevant to users' queries that satisfy their information need. Because users are often not fully knowledgeable about the information domain, they mostly formulate weak queries and end up frustrated with the results an information retrieval system returns. In Amharic, a single word can have many meanings and can appear in different forms. These are some of the challenges that cause information retrieval systems to perform at a very low level. Query expansion methods are effective at differentiating the various meanings of a polysemous term and at finding synonymous terms for reformulating users' queries. The bi-gram technique rests on the underlying idea of expanding a query with terms that frequently appear adjacent to a query term. The proposed technique was integrated into an information retrieval system, and the retrieval system was then tested with and without bi-gram based query expansion. The test results showed that the bi-gram based method outperformed retrieval based on the original query, with an 8% improvement in total F-measure. This is an encouraging result for the design of a practical search engine for the Amharic language.
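
The core idea stated in the abstract, expanding a query with terms that frequently appear adjacent to a query term, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_bigram_counts` and `expand_query`, the parameters `top_k` and `min_count`, and the assumption that documents are already tokenized (and, for Amharic, normalized and stemmed) are all hypothetical choices made for illustration.

```python
from collections import Counter


def build_bigram_counts(documents):
    """Count adjacent token pairs (bi-grams) over tokenized documents."""
    counts = Counter()
    for tokens in documents:
        # zip pairs each token with its immediate right-hand neighbour
        counts.update(zip(tokens, tokens[1:]))
    return counts


def expand_query(query_terms, bigram_counts, top_k=2, min_count=2):
    """Return the query plus, for each original term, its most frequent neighbours.

    top_k and min_count are illustrative thresholds, not values from the paper.
    """
    expanded = list(query_terms)
    for term in query_terms:
        neighbours = Counter()
        for (left, right), n in bigram_counts.items():
            if left == term:
                neighbours[right] += n
            elif right == term:
                neighbours[left] += n
        for candidate, n in neighbours.most_common(top_k):
            # keep only neighbours seen often enough and not already present
            if n >= min_count and candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    # Toy corpus with placeholder tokens; a real system would use Amharic text.
    docs = [
        ["addis", "ababa", "university"],
        ["addis", "ababa", "city"],
        ["ababa", "university", "library"],
    ]
    counts = build_bigram_counts(docs)
    print(expand_query(["ababa"], counts))
```

The expanded term list would then be submitted to the retrieval system in place of the original query. Terms whose adjacency count falls below `min_count` are left out, which is one simple way to limit the noise that expansion can otherwise add.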

Cite This Paper

Abey Bruck, Tulu Tilahun, "Bi-gram based Query Expansion Technique for Amharic Information Retrieval System", International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 7, no. 6, pp. 1-7, 2015. DOI: 10.5815/ijieeb.2015.06.01
