A Corpus Based Approach to Build Arabic Sentiment Lexicon

Full Text (PDF, 766KB), PP.16-23

Author(s)

Afnan Atiah Alsolamy 1,*, Muazzam Ahmed Siddiqui 1, Imtiaz Hussain Khan 2

1. Department of Information Systems, King Abdulaziz University, Saudi Arabia

2. Department of Computer Science, King Abdulaziz University, Saudi Arabia

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2019.06.03

Received: 17 Sep. 2019 / Revised: 1 Oct. 2019 / Accepted: 10 Oct. 2019 / Published: 8 Nov. 2019

Index Terms

Sentiment analysis, Arabic sentiment lexicon, Propagation method, Similarity measure

Abstract

Sentiment analysis is an application of artificial intelligence that determines the sentiment associated with a piece of text. It offers brands and companies an easy way to gather customers' opinions about their products from user-generated content such as social media posts. Training a machine learning model for sentiment analysis requires resources such as labeled corpora and sentiment lexicons. While such resources are readily available for English, they are hard to find for other languages such as Arabic. The aim of this research is to build an Arabic sentiment lexicon using a corpus-based approach. Sentiment scores were propagated from a small, manually labeled seed list to other terms in a term co-occurrence graph. To achieve this, we proposed a graph propagation algorithm and compared different similarity measures. The lexicon was evaluated using a manually annotated list of terms. The use of similarity measures rests on the assumption that words appearing in the same context tend to have similar polarity. The main contribution of the work is the empirical evaluation of different similarity measures to find the one that assigns the best sentiment scores to terms in the co-occurrence graph.
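
The idea of propagating seed scores through a co-occurrence graph can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper: the window size, the choice of cosine similarity, the similarity-weighted averaging rule, the threshold, and the toy English tokens are all assumptions made for illustration, and the function names (`cooccurrence_vectors`, `cosine`, `propagate`) are hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for each term, the terms seen within `window` tokens of it."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[term][tokens[j]] += 1
    # Return plain dicts so later lookups cannot create keys by accident.
    return {term: dict(ctx) for term, ctx in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate(vectors, seeds, iterations=2, threshold=0.1):
    """Assumed update rule: each non-seed term takes the similarity-weighted
    average of the scores of already-scored terms it resembles."""
    scores = dict(seeds)
    for _ in range(iterations):
        updated = dict(scores)
        for term in vectors:
            if term in seeds:
                continue  # seed labels stay fixed
            num = den = 0.0
            for other, score in scores.items():
                if other == term or other not in vectors:
                    continue
                sim = cosine(vectors[term], vectors[other])
                if sim >= threshold:
                    num += sim * score
                    den += sim
            if den:
                updated[term] = num / den
        scores = updated
    return scores


# Toy corpus (hypothetical): "friendly" shares contexts with the positive
# seed, "rude" with the negative seed.
sentences = [
    ["staff", "friendly", "recommend"],
    ["staff", "good", "recommend"],
    ["staff", "rude", "avoid"],
    ["staff", "bad", "avoid"],
]
seeds = {"good": 1.0, "bad": -1.0}
scores = propagate(cooccurrence_vectors(sentences), seeds)
print(scores["friendly"] > 0, scores["rude"] < 0)
```

In this sketch the similarity function is a plain argument-free helper, so swapping in another measure (for example Jaccard over the context sets) only requires replacing `cosine`, which mirrors the kind of comparison the abstract describes.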

Cite This Paper

Afnan Atiah Alsolamy, Muazzam Ahmed Siddiqui, Imtiaz Hussain Khan, "A Corpus Based Approach to Build Arabic Sentiment Lexicon", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.11, No.6, pp. 16-23, 2019. DOI: 10.5815/ijieeb.2019.06.03
