Ensemble Approach for Twitter Sentiment Analysis

pp. 20-26


Author(s)

Dimple Tiwari 1,*, Nanhay Singh 1

1. Ambedkar Institute of Advanced Communication Technologies and Research, Govt. of NCT of Delhi, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2019.08.03

Received: 30 Mar. 2019 / Revised: 11 May 2019 / Accepted: 7 Jun. 2019 / Published: 8 Aug. 2019

Index Terms

Stemming, Lemmatization, Tokenization, Naive Bayes, Decision Tree, KNeighbors, AdaBoost, ExtraTree classifier, Sentiment analysis

Abstract

With the growth of social networks and online marketing websites, users' blogs and reviews can be collected from these sites. This content has become useful for analysis and decision-making about products, marketing, movies, and more, and given how useful social reviews are, the data must be analyzed carefully. Various techniques and methods can analyze social information accurately, but a major issue with social media data is that it is unstructured and noisy, and this problem must be solved. This paper therefore proposes a framework that, beyond basic noise removal, applies recent data preprocessing techniques such as stemming, lemmatization, and tokenization. After preprocessing, ensemble methods are applied to increase the accuracy of classical classification algorithms. First, Decision Tree, KNeighbors, and Naive Bayes classifiers are applied; these do not provide good accuracy. Next, the boosting concept is applied through the AdaBoost method, which improves on the accuracy of these classical classifiers. Finally, the proposed ensemble method, the ExtraTree classifier, which derives from the bagging concept, is applied: multiple samples are drawn from the training set and many random trees are built. This method is also called extremely randomized trees, and it provides a highly refined view of the data. The results show that the ExtraTree classifier, a bagging ensemble method, outperforms all the other techniques applied in this paper. The novel preprocessing techniques produce more refined data, which provides a clean base for the ensemble techniques and also contributes to the improved accuracy of the applied methods.
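The pipeline described in the abstract (preprocess the tweets, then compare Decision Tree, KNeighbors and Naive Bayes against AdaBoost boosting and an ExtraTree ensemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses scikit-learn, TF-IDF features (the abstract does not name a feature representation), a hypothetical lemma table `LEMMAS` and a crude hypothetical suffix stripper `stem` standing in for a real lemmatizer and stemmer, and made-up example tweets. The helper names `tokenize`, `normalize` and `compare_models` are likewise illustrative. Because the abstract describes ExtraTree as drawing samples from the training set, the sketch sets `bootstrap=True`; scikit-learn's `ExtraTreesClassifier` defaults to `bootstrap=False`, training every tree on the full set and relying on randomized split thresholds instead.

```python
import re

from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical lemma table; a real system would use a dictionary-based lemmatizer.
LEMMAS = {"is": "be", "was": "be", "are": "be", "better": "good", "worst": "bad"}
# Hypothetical crude suffix list, standing in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(tweet):
    """Lowercase, drop URLs and @mentions, keep hashtag words, split into tokens."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet.replace("#", ""))


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(tweet):
    """Tokenize, lemmatize via the lookup table, then stem each token."""
    return [stem(LEMMAS.get(tok, tok)) for tok in tokenize(tweet)]


def compare_models(tweets, labels, folds=3):
    """Mean cross-validated accuracy of each classifier on identical features."""
    models = {
        "NaiveBayes": MultinomialNB(),
        "DecisionTree": DecisionTreeClassifier(random_state=0),
        "KNeighbors": KNeighborsClassifier(n_neighbors=3),
        # Boosting: each round reweights tweets that earlier stumps misclassified.
        "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
        # Extremely randomized trees; bootstrap=True resamples the training set
        # (scikit-learn's default, bootstrap=False, uses the whole set per tree).
        "ExtraTrees": ExtraTreesClassifier(
            n_estimators=100, bootstrap=True, random_state=0
        ),
    }
    scores = {}
    for name, model in models.items():
        # The callable analyzer applies our preprocessing before TF-IDF weighting.
        pipeline = make_pipeline(TfidfVectorizer(analyzer=normalize), model)
        scores[name] = cross_val_score(pipeline, tweets, labels, cv=folds).mean()
    return scores


if __name__ == "__main__":
    # Made-up tweets for illustration only (1 = positive, 0 = negative).
    tweets = [
        "Loving this movie! #great", "Best phone I have ever bought",
        "Really enjoyed the show tonight", "This update is awesome @dev",
        "Great service, happy customer", "The new album is better than ever",
        "Worst movie of the year", "My phone keeps crashing, so annoyed",
        "Terrible service and rude staff", "This update broke everything http://t.co/x",
        "Hated the ending, boring", "Awful food, never again",
    ]
    labels = [1] * 6 + [0] * 6
    for name, acc in compare_models(tweets, labels).items():
        print(f"{name}: {acc:.2f}")
```

Running the script prints one mean cross-validated accuracy per model. With a dozen toy tweets these numbers are noisy and say nothing about the results reported in the paper; the design point is that every classifier receives the same preprocessed features, so any difference comes from the learner alone.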

Cite This Paper

Dimple Tiwari, Nanhay Singh, "Ensemble Approach for Twitter Sentiment Analysis", International Journal of Information Technology and Computer Science (IJITCS), Vol.11, No.8, pp.20-26, 2019. DOI:10.5815/ijitcs.2019.08.03
