Assessing Query Translation Quality Using Back Translation in Hindi-English CLIR

pp. 51-59

Author(s)

Ganesh Chandra 1,*, Sanjay K. Dwivedi 1

1. Department of Computer Science, BBAU (A Central University), Lucknow, U.P., India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.03.07

Received: 1 Jul. 2016 / Revised: 11 Oct. 2016 / Accepted: 5 Dec. 2016 / Published: 8 Mar. 2017

Index Terms

Back-Translation, BLEU, METEOR, TER, query translation, transliteration

Abstract

Cross-Language Information Retrieval (CLIR) is a demanding research area of Information Retrieval (IR) that deals with retrieving documents written in a language different from that of the query. In CLIR, translation is an important step for retrieving relevant results: its goal is to translate the query or the documents from one language into another. Correct translation of the query is essential in CLIR, because an incorrect translation can reduce the relevance of the retrieved results.
The purpose of this paper is to compute the accuracy of query translation using back translation for a Hindi-English CLIR system. For the experimental analysis, we selected Hindi queries from the FIRE-2011 dataset. Our analysis shows that back translation can be effective in improving the accuracy of query translation for the three translators analyzed (Google, Microsoft and Babylon), and that Google performs best for this purpose.
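The abstract does not say exactly how the original and back-translated queries are compared. As a minimal sketch, assuming the back-translated Hindi query is scored against the original Hindi query with the BLEU and TER measures listed in the index terms, the round trip could look like this in Python. The translator callables `to_english` and `to_hindi` are hypothetical placeholders, not the Google, Microsoft or Babylon interfaces. The BLEU variant adds one to the n-gram counts for n > 1 (a common smoothing choice for short sentences), and `ter_without_shifts` leaves out TER's block-shift operation, so it computes a word error rate that is never lower than TER.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU of `hypothesis` against a single `reference`.

    Add-one smoothing is applied to n-gram orders n > 1 (BLEU+1 style) so that
    a short query with no matching 4-gram does not collapse to a score of zero.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it
        # appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0  # no unigram overlap at all
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


def word_edit_distance(a, b):
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete wa
                            curr[j - 1] + 1,            # insert wb
                            prev[j - 1] + (wa != wb)))  # substitute or match
        prev = curr
    return prev[-1]


def ter_without_shifts(reference, hypothesis):
    """Edits needed to turn hypothesis into reference, per reference word.

    Real TER also allows block shifts at unit cost; omitting them gives a
    word error rate, which is at least as large as TER.
    """
    ref, hyp = reference.split(), hypothesis.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)


def assess_query(query_hi, to_english, to_hindi):
    """Round-trip a Hindi query and score the back translation against it.

    `to_english` and `to_hindi` are hypothetical callables standing in for a
    machine translator; they are not the API of any particular service.
    """
    english = to_english(query_hi)
    back = to_hindi(english)
    return {
        "english": english,
        "back_translation": back,
        "bleu": sentence_bleu(query_hi, back),      # higher is better
        "ter": ter_without_shifts(query_hi, back),  # lower is better
    }


if __name__ == "__main__":
    # Hypothetical query and fixed "translations", for illustration only.
    result = assess_query(
        "भारत में चुनाव के परिणाम",
        to_english=lambda q: "election results in India",
        to_hindi=lambda e: "भारत में चुनाव के नतीजे",
    )
    print(result)
```

Because the reference is the user's own query, no human reference translation is needed: a high BLEU and a low TER on the round trip suggest, but do not guarantee, that the intermediate English query kept the original meaning. The sample query and its translations are invented for illustration.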

Cite This Paper

Ganesh Chandra, Sanjay K. Dwivedi, "Assessing Query Translation Quality Using Back Translation in Hindi-English CLIR", International Journal of Intelligent Systems and Applications (IJISA), Vol.9, No.3, pp.51-59, 2017. DOI:10.5815/ijisa.2017.03.07
