Twitter-based Opinion Mining for Flight Service utilizing Machine Learning Approaches
Department of Information Engineering
University of Padova, Italy
Abstract— Twitter is one of the largest social networking platforms worldwide. People use Twitter to share their thoughts and views on various topics of interest, and this shared data is now available as big data. When analyzed with machine learning methods, this data can form the basis of recommender systems that guide other people toward the best option for their interests. In this paper, we propose a framework that classifies passengers' tweets about airline services by means of sentiment analysis. Random Forest and Logistic Regression have been used to classify the dataset. Evaluation on test data shows that the proposed framework performs better than a predefined benchmark: when a passenger tweet is classified as positive, neutral, or negative, the classification is correct with a probability of around 80%.
Keywords— sentiment analysis, Random Forest, Logistic Regression
I. Introduction
Nowadays, companies invest more effort than ever in staying close to their customers in order to improve customer satisfaction and to increase revenue, efficiency, and growth, using real-time communication channels such as instant messaging and microblogging sites. Customers use these channels to express their views and opinions. In the era of the connected world, these views and opinions matter, since customers can share their experience on the many available microblogging sites, which directly affects the brand value of companies. Kušen et al. 3 showed a direct relationship between the winner of the 2016 Austrian presidential election and his greater popularity and influence on Twitter compared with his rival.
Hence, companies use sentiment mining to extract useful information and automatically analyze customer feedback. Using the extracted patterns, companies try to determine the polarity of an opinion, i.e., whether it is positive or negative, or the customer's emotion, e.g., happy, excited, or sad. Companies use opinion polarity and sentiment recognition to understand the overall sentiment of users and to improve customer service.
In the past, there have been efforts to apply Correlated Topic Models (CTM) with the Variational Expectation-Maximization (VEM) algorithm, but hierarchical clustering seems promising, as it may allow the search to scale logarithmically. We therefore used the BIRCH clustering approach to cluster the elements in the dataset, and then used association rule mining to find relations among the clustered elements. We used the US Airlines passenger tweets to evaluate the proposed sentiment analysis.
Sentiment is another word for a view or attitude that is held or expressed. A sentiment may express euphoria, happiness, sadness, or sometimes anger, and this is what travelers tweet about on Twitter. Every flight can bring either delight or discomfort to a traveler during the journey. If travelers are not satisfied with the service, their tweets express distress; if they are very happy with the service, their tweets express joy. Figure 1 shows an angry tweet by a traveler about British Airways; British Airways took it seriously and resolved that traveler's issues. In contrast, Figure 2 shows a positive tweet about Indigo Airlines.
One solution is to use these tweets to understand the problems travelers face during their journeys and to improve service over time. However, millions of people travel by air every day and tweet about it. It is hard to filter tweets about a particular flight and time, since tweets are often very general in nature. Therefore, the idea is to analyze all the tweets about an airline and try to understand the behavior of travelers.
Another problem is that the number of tweets is very large, so we need a technique efficient enough to handle large datasets. Machine learning is one such technique, capable of handling large, high-dimensional data. Machine learning can be defined as a set of techniques used to extract hidden and meaningful information from large datasets.
Figure 1: A tweet representing discomfort during travel 1
Figure 2: A tweet representing discomfort during travel 2
Machine learning plays an important role in transportation 4, 5, 6, bioinformatics 7, computer vision 8, social media 9, and healthcare analytics 10, and its power is widely recognized. In this study, we use a Twitter dataset on the airline industry and analyze it with machine learning techniques. The rest of the paper is organized as follows: Section II describes the technical background, Section III describes the dataset and discusses the results obtained, and Section IV concludes the paper.
II. Technical Background
In this section, we discuss the technical background required to understand this work. Random Forest and Logistic Regression have been used in this analysis, and performance evaluation measures (PEM), namely precision, recall, F-measure, and support, have been used to evaluate them.
A. Random Forest
Random forest is an ensemble learning method for classification, regression, and other tasks. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. Within each tree, the splits are performed recursively until every leaf is clean or pure, that is, until the class distribution of the data in each leaf is as pure as possible. The goal is to keep growing a decision tree until it reaches a balance between flexibility and accuracy. This technique uses entropy, a measure of the disorder of information. Here, the entropy of X is estimated by:
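$$\mathrm{Entropy}(X) = -\sum_{j=1}^{c} p_j \log_2 p_j \qquad (1)$$
$$\mathrm{Entropy}(X, A) = \sum_{i \in \mathrm{Values}(A)} \frac{|X_i|}{|X|}\,\mathrm{Entropy}(X_i) \qquad (2)$$
$$\mathrm{Total\ Gain}(X, A) = \mathrm{Entropy}(X) - \mathrm{Entropy}(X, A) \qquad (3)$$
where $c$ is the number of classes, $p_j$ is the proportion of instances in $X$ belonging to class $j$, and $X_i$ is the subset of $X$ for which attribute $A$ takes the value $i$.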
Here, the goal is to maximize the total gain, i.e., the reduction in entropy obtained by splitting the data into subsets according to the values i of an attribute.
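To make Equations (1) to (3) and the voting step concrete, the following is a minimal from-scratch sketch, not the implementation used in this study. It assumes categorical features stored as tuples and, for brevity, grows depth-1 trees (decision stumps), so the random feature subset drawn per tree coincides with the per-split subset of a full random forest. All function names are hypothetical. Library implementations such as scikit-learn's RandomForestClassifier use Gini impurity by default, with entropy available through criterion="entropy".

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Equation (1): Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(rows, labels, feature):
    """Equation (2): weighted entropy of the subsets X_i sharing value i of `feature`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(rows, labels, feature):
    """Equation (3): total gain = Entropy(X) - Entropy(X, A)."""
    return entropy(labels) - split_entropy(rows, labels, feature)


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Depth-1 tree: split on the candidate feature with the highest gain."""
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    branches = {}
    for row, y in zip(rows, labels):
        branches.setdefault(row[best], []).append(y)
    leaves = {value: majority(ys) for value, ys in branches.items()}
    return best, leaves, majority(labels)  # fallback label for unseen values


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Hypothetical helper: bootstrap sampling plus a random feature subset per tree."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample with replacement
        candidates = rng.sample(range(d), n_features)  # random subset of features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], candidates))
    return forest


def predict_forest(forest, row):
    """Classification output: the mode of the individual trees' votes."""
    votes = [leaves.get(row[f], fallback) for f, leaves, fallback in forest]
    return majority(votes)
```

Growing deeper trees would only require applying `fit_stump` recursively to each branch until the leaves are pure, as described above.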
B. Logistic Regression
Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible outcomes).
The goal of logistic regression is to find the best-fitting (yet plausible) model to describe the relationship between the dichotomous characteristic of interest and a set of independent variables. Logistic regression produces the coefficients of an equation that predicts a logit transformation of the probability of presence of the characteristic of interest, that is, $\mathrm{logit}(p) = \ln\frac{p}{1-p} = b_0 + b_1X_1 + \dots + b_kX_k$, where $p$ is the probability of presence of the characteristic.
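$$\mathrm{Odds} = \frac{p}{1-p} = \frac{\text{probability of presence of the characteristic}}{\text{probability of absence of the characteristic}} \qquad (5)$$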
Rather than choosing parameters that minimize the sum of squared errors (as in ordinary regression), estimation in logistic regression chooses parameters that maximize the likelihood of observing the sample values.
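The maximum-likelihood principle above can be sketched for the two-class case. The hypothetical `fit_logistic` below maximizes the log-likelihood by plain batch gradient ascent, an assumption chosen for readability; library solvers such as scikit-learn's use more sophisticated optimizers and apply L2 regularization by default. The three-class problem in this study would need a multinomial or one-vs-rest extension, which is omitted here.

```python
import math


def sigmoid(z):
    """Inverse of the logit: maps b0 + b1*x1 + ... to a probability p."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def predict_proba(w, x):
    """p = P(y = 1 | x); w[0] is the intercept b0, w[1:] are the coefficients."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def odds(p):
    """Equation (5): odds = p / (1 - p)."""
    return p / (1.0 - p)


def log_likelihood(w, X, y):
    """Objective maximized by logistic regression (instead of minimizing squared error)."""
    total = 0.0
    for x, t in zip(X, y):
        p = predict_proba(w, x)
        total += math.log(p) if t == 1 else math.log(1.0 - p)
    return total


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Hypothetical helper: batch gradient ascent on the log-likelihood, labels 0/1."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = t - predict_proba(w, x)  # derivative of the log-likelihood w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w
```

Each update moves the coefficients in the direction that increases `log_likelihood`, which is exactly the estimation criterion stated in the text.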
C. Performance Evaluation Measures (PEM)
To measure classification accuracy 4, we used several measures, namely recall, precision, and F-measure. Recall is a measure of completeness, whereas precision is a measure of exactness. More specifically, precision is the ratio of correctly classified instances of a class to the total number of instances classified into that class, while recall is the ratio of correctly classified instances of a class to the total number of instances that actually belong to that class. Both can be calculated from a contingency table, or confusion matrix, which records the numbers of correctly and incorrectly classified instances of all classes; all performance evaluation measures can be derived from it. For example, in a two-class problem on a Twitter dataset, suppose 600 tweets are classified into one class, 500 of them correctly, and 700 tweets actually belong to that class. Then the precision of the classifier is 500/600 ≈ 83.3%, and its recall is 500/700 ≈ 71.4%. Recall and precision are combined into a single measure known as the F-measure or F-score, calculated as in Equation 7.
$$F\text{-measure} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (7)$$
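The worked example above (600 tweets assigned to a class, 500 of them correct, 700 actually in the class) can be checked with a short sketch. The hypothetical `per_class_report` helper also reports support, i.e. the number of true instances per class, mirroring the measures listed in this section; it is an illustration, not a library function.

```python
def precision_recall_f(tp, predicted, actual):
    """tp: correctly classified instances of a class; predicted: instances assigned
    to the class; actual: instances that truly belong to it (its support)."""
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0  # Equation (7)
    return precision, recall, f_measure


def per_class_report(y_true, y_pred):
    """Hypothetical helper: {class: (precision, recall, F-measure, support)}."""
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        support = sum(1 for t in y_true if t == c)
        report[c] = (*precision_recall_f(tp, predicted, support), support)
    return report


print(precision_recall_f(500, 600, 700))  # about (0.833, 0.714, 0.769)
```

For the example, the F-measure is 2TP / (predicted + actual) = 1000/1300 ≈ 76.9%.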
III. Results and Discussion
Description of Dataset
This dataset consists of passenger tweets about US airlines. Several features were available, but we selected a subset, in particular the text feature containing the passengers' tweets. Other features include the airline sentiment (positive, negative, neutral), the airline, and the negative reason. As can be seen from Figure 3, the class labels are unbalanced.
Figure 3: Distribution of Class Labels
Pre-processing of Text
Some basic computations were carried out to analyze the text variables and gain insight into the text data: the number of words in each tweet (word_counts), hashtags (hashtag_counts), uppercase words (capitalword_counts), mentions of other accounts (mention_counts), question marks (questionmark_counts), URLs (url_counts), and emojis (emoji_counts).
The distribution of these text variables is depicted in Figure 5, and the relationship between the text variables and the class variable is depicted in Figure 4.
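A minimal sketch of how such count variables could be computed for each tweet is shown below. The regular expressions, the rule that a word counts as uppercase when all of its cased letters are capitals, and the emoji code-point ranges are assumptions for illustration; the exact definitions used in this study are not stated.

```python
import re

# Assumed definitions for the count variables (not specified in the paper).
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # assumed emoji ranges


def text_counts(tweet):
    """Return the per-tweet count variables used for the exploratory analysis."""
    words = tweet.split()
    return {
        "word_counts": len(words),
        "hashtag_counts": len(HASHTAG.findall(tweet)),
        "capitalword_counts": sum(1 for w in words if w.isupper()),  # e.g. "THANK", "VERY"
        "mention_counts": len(MENTION.findall(tweet)),
        "questionmark_counts": tweet.count("?"),
        "url_counts": len(URL.findall(tweet)),
        "emoji_counts": len(EMOJI.findall(tweet)),
    }
```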
During text pre-processing, elements such as hashtags, mentions, and URLs were removed to make the text data cleaner for further analysis. We further removed punctuation, stop words, and digits, applied stemming, and converted all words to lowercase.
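The cleaning steps can be sketched as follows. The order of the steps is assumed, and the tiny stop-word list and the crude suffix-stripping `simple_stem` are hypothetical stand-ins; in practice one would use a full stop-word list and a real stemmer such as the Porter stemmer (available, for example, in NLTK).

```python
import re

# Tiny illustrative stop-word list (assumption; a real list is much longer).
STOP_WORDS = {"a", "an", "the", "to", "and", "is", "for", "of", "on", "in",
              "it", "my", "me", "i", "you", "your", "with", "at", "this", "was"}


def simple_stem(word):
    """Crude suffix stripping, a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tweet(tweet):
    text = re.sub(r"https?://\S+", " ", tweet)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags
    text = text.lower()                         # lowercase
    text = re.sub(r"[^a-z\s]", " ", text)       # remove punctuation, digits, emoji
    tokens = [simple_stem(w) for w in text.split() if w not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `clean_tweet("@united Thanks for the GREAT flight!!! 10/10 #happy http://t.co/abc")` returns `"thank great flight"`.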
After text pre-processing, the tweets look as shown in Figure 4.
Figure 4: Top 10 samples of the cleaned data
Figure 5: Distribution of Text variables
The preprocessed text contains several frequent words that are useful for further analysis and give some insight into its content. The 30 most frequent words are shown in Figure 6.
Figure 6: Top 30 frequent words
A. Formation of Test Data
To evaluate the trained model, test data is needed on which several performance measures can be computed. The text count variables were combined with the cleaned text to create a data frame.
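The paper does not state how the test data was split off, so the sketch below assumes a random 80/20 hold-out with a fixed seed. `split_train_test` is a hypothetical helper whose return order follows the familiar (X_train, X_test, y_train, y_test) convention. Because the classes are unbalanced (Figure 3), a stratified split would be a reasonable alternative.

```python
import random


def split_train_test(rows, labels, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle indices once and hold out a test fraction."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(idx) * test_size))
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])
```

Keeping rows and labels indexed together guarantees that each tweet stays paired with its sentiment label after shuffling.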
B. Performing cross-validation
To choose better parameters, the model must be assessed on a separate validation set that was not used during training. However, a single validation set may not produce reliable results: by chance, the model may perform well on that particular set, and a different split of the data could yield different results. To obtain a more precise estimate, cross-validation has been performed.
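In scikit-learn, GridSearchCV runs this loop internally (with 5-fold cross-validation by default in recent versions, stratified for classifiers). The sketch below spells out the same idea in plain Python without stratification. `k_fold_indices`, `cross_val_scores` and `grid_search` are hypothetical names, and `fit`/`score` are caller-supplied functions, an assumption made to keep the sketch model-agnostic.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists; every index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def cross_val_scores(X, y, fit, score, k=5, seed=0):
    """Train on k-1 folds and score on the held-out fold, k times."""
    scores = []
    for train, val in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return scores


def grid_search(X, y, grid, fit, score, k=5, seed=0):
    """Return the candidate parameter with the best mean cross-validated score."""
    def mean_score(param):
        s = cross_val_scores(X, y, lambda Xt, yt: fit(Xt, yt, param), score, k, seed)
        return sum(s) / len(s)
    return max(grid, key=mean_score)
```

Averaging over k folds is what makes the estimate less sensitive to one lucky or unlucky validation split.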
The data were split into k folds; each fold in turn serves as the validation set while the remaining folds form the training set. GridSearchCV uses the default scorer to evaluate each parameter setting, which for both the logistic regression and Random Forest classifiers is accuracy. Since both classifiers require numerical input, the words were converted into token counts using CountVectorizer, and the resulting bag of words was used as input to both classifiers.
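CountVectorizer learns a vocabulary from the training text and maps each document to a vector of token counts. The stdlib sketch below imitates that idea under simplifying assumptions: whitespace tokenization of already-cleaned text (CountVectorizer's default token pattern instead keeps tokens of two or more word characters and lowercases the text), vocabulary indices assigned in alphabetical order as CountVectorizer does, and words unseen during fitting ignored at transform time. `fit_vocabulary` and `transform` are hypothetical names.

```python
def fit_vocabulary(docs):
    """Map each distinct token in the training documents to a column index."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: i for i, w in enumerate(vocab)}


def transform(docs, vocab):
    """Bag of words: one row of token counts per document."""
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for w in d.split():
            if w in vocab:  # words unseen during fitting are ignored
                row[vocab[w]] += 1
        rows.append(row)
    return rows
```

Fitting the vocabulary on the training folds only, and reusing it for validation and test data, keeps the evaluation free of information from held-out tweets.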
C. Word Cloud of Positive and Negative Opinion
A word cloud gives a good visual representation of word frequency for each type of opinion; in Figure 7, the left cloud shows positive tweets and the right one negative tweets. The size of each word corresponds to its frequency across all tweets, which gives an idea of what passengers are talking about. For instance, in negative tweets, passengers seem to complain about flight delays, cancellations, poor service on the flight, hours of waiting, and so on. In positive tweets, by contrast, passengers are thankful and talk about great service or flights.
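The frequency-to-size mapping behind a word cloud can be sketched as follows: count the words in the tweets of one sentiment, keep the most frequent ones, and scale their font sizes with frequency. The linear scaling and the size bounds are assumptions; word-cloud libraries use their own scaling and layout rules. `word_sizes` is a hypothetical helper.

```python
from collections import Counter


def word_sizes(tweets, labels, target, top=30, min_size=10, max_size=60):
    """Font size per frequent word for one sentiment class (assumed linear scaling)."""
    counts = Counter(w for t, l in zip(tweets, labels) if l == target for w in t.split())
    common = counts.most_common(top)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_size + (c - lo) * (max_size - min_size) / span for w, c in common}
```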
Figure 7: Word Cloud of Positive and Negative Opinion
D. Logistic Regression
After applying logistic regression to the dataset, we obtained the measures shown in Table 1.
E. Random Forest
After applying Random Forest to the dataset, we obtained the measures shown in Table 2.
As the tables show, both classifiers performed well, but Random Forest outperformed logistic regression, with higher precision, recall, and F-score. Its test accuracy of 82% is better than what we would achieve by predicting the majority class for all observations. Precision is good for all three classes, whereas recall is low for the neutral class.
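The majority-class benchmark mentioned above can be computed in a few lines: predict the most frequent training label for every test tweet and measure accuracy. Because the class labels are unbalanced, this baseline can be well above chance, which is why the 82% figure is compared against it. `majority_baseline_accuracy` is a hypothetical helper, and the labels in the usage note are toy values, not the study's data.

```python
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(1 for y in y_test if y == majority) / len(y_test)
```

For example, with training labels dominated by "negative" and a test set in which half the tweets are negative, the baseline accuracy is 0.5.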
For testing purposes, we checked the model on some new negative and positive tweets:
“@united It’s a shame choosing #United may be the difference between reuniting with aging friends and never seeing them again #PoorService”
“@united Big thanks to Ms. Winston for assisting me over the phone with a baggage claim issue today. She really went the extra mile!”
“@united flight attendant doesn’t understand not understanding English doesn’t mean they are deaf. Stop yelling English slowly at them”
“@United THANK U! Secured room for the night Thx to VERY helpful customer service rep N. Dorns. I thanked her. Can u 2? #goodenoughmother”
The model correctly predicted both negative tweets as negative and both positive tweets as positive.
IV. Conclusion and Future Work
In this study, we first created test data and then performed k-fold cross-validation, splitting the data into training and validation folds and using GridSearchCV with its default accuracy scorer to tune the logistic regression and Random Forest classifiers. Words were transformed into numerical features with CountVectorizer, and the resulting bag of words was used as input to both classifiers. Word clouds provide a useful visualization of frequent positive and negative words, which helps in understanding passengers' opinions. Both classifiers performed well, but Random Forest outperformed logistic regression in precision, recall, and F-score. Its 82% test accuracy is higher than what would be achieved by predicting the majority class for all observations. Precision is good for all three classes, while recall is low for the neutral class. The model also correctly classified a few new positive and negative tweets. In future work, the dataset could be explored in more depth to retrieve further information useful to airline companies and passengers.
“This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 721321”.
1 http://business.time.com/2013/09/03/man-spends-more-than-1000-to-call-out-british-airways-on-twitter/ accessed on 16.12.2017 at 1115 hours.
2 http://www.thegenxtimes.com/trends/man-tweets-indigo-about-misplaced-luggage-glad-to-hear-that-replies-indigo/ accessed on 16.12.2017 at 1124 hours.
3 Kušen, Ema, and Mark Strembeck. “An Analysis of the Twitter Discussion on the 2016 Austrian Presidential Elections.” arXiv preprint arXiv:1707.09939 (2017).
4 Gräbner, Dietmar, et al. Classification of customer reviews based on sentiment analysis. no, 2012.
5 Zhang, Libo, Yihan Sun, and Tiejian Luo. “A framework for evaluating customer satisfaction.” Software, Knowledge, Information Management & Applications (SKIMA), 2016 10th International Conference on. IEEE, 2016
6 Han, J. and Kamber, M.P., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc. San Francisco, CA, USA, 2005 ISBN:1558609016.
7 Misopoulos, Fotis, et al. “Uncovering customer service experiences with Twitter: the case of the airline industry.” Management Decision 52.4 (2014): 705-723.
8 Bollen, Johan, Huina Mao, and Alberto Pepe. “Determining the Public Mood State by Analysis of Microblogging Posts.” ALIFE. 2010.
9 Tiwari P., Yadav P., Kumar S., Mishra, B.K., Nguyen G.N., Gochhayat S.P., “Sentiment Analysis for Airlines service based on twitter dataset” accepted in Social Network Analytics, Elsevier Inc.
10 Kumar S, Tiwari P, Ovchinkin A, Nezhurina M, “Twitter-based Analysis of Airlines services using Machine Learning” ICACCE 2018.
11 Tiwari, Prayag, et al. “Implementation of n-gram methodology for rotten tomatoes review dataset sentiment analysis.” International Journal of Knowledge Discovery in Bioinformatics (IJKDB) 7.1 (2017): 30-41.
12 Efron, Miles. “Information search and retrieval in microblogs.” Journal of the Association for Information Science and Technology 62.6 (2011): 996-1008.
13 Tiwari, Prayag, and Denis Kalitin. “A Conjoint Analysis of Road Accident Data using K-modes Clustering and Bayesian Networks (Road Accident Analysis using clustering and classification).”
14 Tiwari, Prayag, Sachin Kumar, and Denis Kalitin. “Road-User Specific Analysis of Traffic Accident Using Data Mining Techniques.” International Conference on Computational Intelligence, Communications, and Business Analytics. Springer, Singapore, 2017.
15 Tiwari, Prayag, Huy Dao, and Gia Nhu Nguyen. “Performance Evaluation of Lazy, Decision Tree Classifier and Multilayer Perceptron on Traffic Accident Analysis.” Informatica 41.1 (2017).
16 Kumar, Sachin, Prayag Tiwari, and Kalitin Vladimirovich Denis. “Augmenting Classifiers Performance through Clustering: A Comparative Study on Road Accident Data.” International Journal of Information Retrieval Research (IJIRR) 8.1 (2018): 57-68.
17 Tiwari, Prayag. “Accident Analysis by using Data Mining Techniques.” (2017).
18 Tiwari, Prayag. “Comparative Analysis of Big Data.”
19 Tiwari, P., A. C. Mishra, and A. K. Jha. “Case Study as a Method for Scope Definition.” Arabian J Bus Manag Review S 1.002 (2016).
20 Yee Liau, Bee, and Pei Pei Tan. “Gaining customer knowledge in low cost airlines through text mining.” Industrial Management & Data Systems 114.9 (2014): 1344-1359.
21 Tiwari, Prayag. “Advanced ETL (AETL) by integration of PERL and scripting method.” Inventive Computation Technologies (ICICT), International Conference on. Vol. 3. IEEE, 2016.
22 Tiwari, Prayag. “Improvement of ETL through integration of query cache and scripting method.” Data Science and Engineering (ICDSE), 2016 International Conference on. IEEE, 2016.
23 Tiwari, Prayag, et al. “Improved performance of data warehouse.” Inventive Communication and Computational Technologies (ICICCT), 2017 International Conference on. IEEE, 2017.
24 Tiwari P., Nguyen G.N., Kumar S., Yadav P., Ashour A.S., Dey N., Sentiment Analysis based on Russian and English Review Dataset, Statistical Analysis and Data Mining: The ASA Data Science Journal.
25 Fotis Misopoulos, Miljana Mitic, Alexandros Kapoulas, Christos Karapiperis, (2014) “Uncovering customer service experiences with Twitter: the case of airline industry”, Management Decision, Vol. 52 Issue: 4, pp.705-723, https://doi.org/10.1108/MD-03-2012-0235.