It is not uncommon for us to consider what other people think in our decision-making process. Prior to the advent of the Internet, many of us relied on friends and family for product or service recommendations, or for information when buying a product. The Internet makes it easier to get the opinions of the general population.
In a world where colossal amounts of user-generated content are produced every day, it is practically impossible for a human workforce to collect all the data and determine the opinions expressed in it. Hence there is a need for computer algorithms that automatically classify reviews by polarity: positive, negative or neutral.
What kind of questions are answered by Sentiment Analysis?
- Is a given product review positive or negative?
- Is a customer satisfied or dissatisfied, judging by their email response?
- Based on a sample of tweets, how are people responding to a given ad campaign, product release, or news item?
- How have bloggers' attitudes about the president changed since the election?
What are the steps involved in Sentiment Analysis?
- Data acquisition: The collection of data is an important phase since a proper dataset needs to be defined for analyzing and classifying the text in the dataset.
- Text preprocessing: After the data is collected, preprocessing reduces noise in it. Typical steps are removing unnecessary stop words, repeated words, emoticons and URLs, and stemming the remaining words.
- Feature selection and extraction: Proper selection and extraction of features play a key role in determining the accuracy of the model. Hence, an appropriate feature extraction technique must be chosen.
- Sentiment classification: In this phase, various sentiment classification techniques are applied to classify the text. Some popular sentiment classification techniques are Naïve Bayes (NB) and Support Vector Machines (SVM).
- Polarity detection: After classifying the sentiments, the polarity of the sentiment is determined. The goal of polarity detection is to decide whether a text expresses positive, negative or neutral sentiment.
- Validation and evaluation: Finally, the obtained results are validated and evaluated to determine the overall accuracy of the techniques used for sentiment analysis.
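The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the toy dataset, the stop-word list and the helper names (`preprocess`, `train_nb`, `predict`) are invented for the example, stemming is left out for brevity, and a multinomial Naive Bayes with add-one smoothing over bag-of-words counts stands in for the feature extraction and classification steps.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; real projects use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "was", "and", "i", "to", "of"}

def preprocess(text):
    """Lowercase, drop URLs and anything that is not a letter or apostrophe,
    then remove stop words and immediately repeated words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    cleaned = []
    for tok in re.findall(r"[a-z']+", text):
        if tok in STOP_WORDS:
            continue
        if cleaned and cleaned[-1] == tok:  # collapse "very very" into "very"
            continue
        cleaned.append(tok)
    return cleaned

def train_nb(samples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> bag-of-words counts
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = preprocess(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in preprocess(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labelled data; data acquisition would normally supply this.
train = [
    ("I love this phone, great battery", "positive"),
    ("Great screen and great camera http://example.com", "positive"),
    ("Terrible battery, I hate it", "negative"),
    ("The camera is awful and the screen is terrible", "negative"),
]
test = [("great phone", "positive"), ("awful battery, terrible", "negative")]

model = train_nb(train)
predictions = [predict(model, text) for text, _ in test]
# Evaluation: fraction of held-out samples classified correctly.
accuracy = sum(p == gold for p, (_, gold) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

On a dataset this small the accuracy figure means little; the point is only to show where each step sits relative to the others.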
What are the various approaches for sentiment analysis?
Broadly, there are ML-based and lexicon-based approaches, plus a hybrid approach that's a combination of these two.
Lexicon models make use of sentiment lexicons, which are collections of annotated and preprocessed sentiment terms. Sentiment values are assigned to words that describe the positive, negative and neutral attitude of the speaker. Lexicon-based approaches are further classified as:
- Dictionary-based Method: It uses a small set of seed words and an online dictionary. An initial seed set of words with known orientations is collected, and online dictionaries are then searched for their synonyms and antonyms to grow the set. A sample is classified based on the presence of such sentiment-signalling words.
- Corpus-based Method: It uses corpus data to identify sentiment words. Even though it is not as effective as the dictionary-based method, it helps in finding domain-specific and context-specific sentiment words in the corpus data. The algorithm has access not only to sentiment labels but also to context.
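A minimal sketch of the dictionary-based method follows. It assumes a toy in-memory thesaurus in place of an online dictionary; the `SYNONYMS` and `ANTONYMS` tables, the seed word and the function names are all hypothetical and chosen only to show the seed-expansion idea.

```python
from collections import deque

# Toy stand-in for an online dictionary (assumption for illustration).
SYNONYMS = {
    "good": ["great", "fine"],
    "great": ["excellent"],
    "bad": ["poor", "awful"],
    "awful": ["terrible"],
}
ANTONYMS = {"good": ["bad"], "excellent": ["terrible"]}

def expand_lexicon(seeds):
    """Grow a {word: +1/-1} lexicon from seed words via synonyms and antonyms."""
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        # Synonyms inherit the polarity; antonyms get the opposite one.
        related = [(w, polarity) for w in SYNONYMS.get(word, [])]
        related += [(w, -polarity) for w in ANTONYMS.get(word, [])]
        for other, other_polarity in related:
            if other not in lexicon:  # first assignment wins
                lexicon[other] = other_polarity
                queue.append(other)
    return lexicon

def classify(text, lexicon):
    """Sum the polarities of known words and map the total to a label."""
    score = sum(lexicon.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

lexicon = expand_lexicon({"good": 1})
print(sorted(lexicon.items()))
print(classify("the food was excellent but service was poor and awful", lexicon))
```

Starting from the single seed "good", the expansion reaches negative words such as "terrible" through antonym links. A corpus-based method would instead derive such polarities from how words co-occur in domain text.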
What are the advantages and limitations of the Sentiment Analysis approaches?
Machine Learning based
- Advantage: Unlike lexicon-based models, these models can be built for a specific purpose or context.
- Limitation: Obtaining labeled data for training could be difficult or expensive.
Lexicon based
- Advantage: No training is required.
- Limitation: Accuracy depends on lexical resources. Lexicons contain a finite number of words, and each word is assigned a fixed sentiment orientation and score.
Hybrid
- Advantage: It incorporates the best of the machine learning based and lexicon based approaches.
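The lexicon limitation can be made concrete with a very small sketch. The lexicon and its scores below are hypothetical; the example only shows that a fixed score ignores context and that words missing from the lexicon contribute nothing.

```python
# Hypothetical fixed-score lexicon (illustration only).
LEXICON = {"good": 1, "bad": -1, "unpredictable": -1}

def lexicon_score(text):
    """Sum fixed word scores; unknown words contribute nothing."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

steering = lexicon_score("the steering is unpredictable")  # a complaint
plot = lexicon_score("the plot is unpredictable")          # arguably praise
unknown = lexicon_score("the movie was meh")               # "meh" is not in the lexicon
print(steering, plot, unknown)
```

Both "unpredictable" sentences receive the same negative score even though a reader would judge them differently, and the third sentence scores zero because its only opinion word is out of vocabulary. A model trained on domain data could, in principle, learn these distinctions.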
What are some tools available for Sentiment Analysis at present?
- Python NLTK: A Python-based toolkit for text processing, including classification, tokenization, stemming, tagging, parsing and much more.
- GATE, the General Architecture for Text Engineering: A Java suite of tools used for all sorts of natural language processing tasks, including information extraction in many languages.
- LingPipe: A toolkit for processing text using computational linguistics.
- LIWC (Linguistic Inquiry and Word Count): A computerized text analysis tool that reads a given text and counts the percentage of words that reflect different emotions, thinking styles and social concerns.
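The word-counting idea behind a tool such as LIWC can be illustrated with a short sketch. The category word lists below are invented for the example and are not LIWC's actual dictionary; `category_percentages` is a hypothetical helper, not part of any of the tools listed above.

```python
import re
from collections import Counter

# Hypothetical categories and word lists, for illustration only.
CATEGORIES = {
    "positive_emotion": {"happy", "love", "nice"},
    "negative_emotion": {"sad", "hate", "angry"},
    "social": {"friend", "family", "we"},
}

def category_percentages(text):
    """Return the percentage of words in the text that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, members in CATEGORIES.items():
            if word in members:
                counts[category] += 1
    total = len(words) or 1  # avoid division by zero on empty input
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}

result = category_percentages("We love our family but I really hate being sad")
print(result)
```

For this ten-word sentence, one word is counted as positive emotion, two as negative emotion and two as social, so the percentages are 10, 20 and 20 respectively.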
What are the challenges involved in Sentiment Analysis?
- Named entity recognition: Locating and classifying named entities in text into pre-defined categories such as the names of persons, organizations, locations. Eg: Is 300 Spartans a group of Greeks or a movie?
- Anaphora Resolution: It is the problem of resolving references to earlier or later items in the discourse. Eg: "We watched the movie and went to dinner. It was awful." What does "It" refer to?
- Parsing: This refers to resolving a sentence into its component parts. What are the subject and object of the sentence, and which one does the verb and/or adjective actually refer to?
- Rhetorical modes: The analysed posts often contain sarcasm, irony, implication, etc., which are particularly difficult to detect.
- Social media websites: It is not uncommon to find reviews and opinions containing slang, abbreviations, missing capitalization and poor punctuation, which make sentiment analysis even more challenging.
- Visual sentiment analysis: Posts often contain a mixture of visual and textual information. The sentiment polarities implied by texts may contradict the sentiments of images, which poses a challenge for textual sentiment analysis.
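One of the challenges above, noisy social media text, is often tackled with a normalization pass before analysis. The sketch below is a minimal illustration of that idea; the `SLANG` table and the `normalize` function are hypothetical, and a real system would need a far larger slang resource.

```python
import re

# Hypothetical slang table; a real system would need a much larger resource.
SLANG = {"gr8": "great", "luv": "love", "u": "you", "thx": "thanks"}

def normalize(post):
    """Lowercase, squeeze characters repeated 3+ times down to two,
    tokenize, and expand known slang."""
    post = post.lower()
    post = re.sub(r"(.)\1{2,}", r"\1\1", post)  # "sooooo" -> "soo"
    tokens = re.findall(r"[a-z0-9']+", post)
    return [SLANG.get(tok, tok) for tok in tokens]

tokens = normalize("Luv this phone!!! Sooooo gr8, thx")
print(tokens)
```

Note that "soo" survives as a non-dictionary token; squeezing to two characters is a deliberate compromise, since squeezing to one would damage words such as "good". Sarcasm, anaphora and the other challenges listed need much more than this kind of surface cleanup.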
Where is Sentiment Analysis being used at present?
Broadly, current applications fall into three categories:
- Brand Monitoring - Sentiment Analysis is used to gauge how a brand, product or company has been received by the public. In fact, private companies like Unamo offer this as a service.
- Customer Service - Customer service agents classify incoming mail into 'urgent' and 'non-urgent', so that they can serve the more frustrated customers more quickly. The speech analytics platform Callminer Eureka implements AI and ML techniques to draw insight from consumer interactions, in order to offer quality customer service.
- Market Research and Analysis - Opinion mining plays a crucial role in business intelligence, by helping analysts understand why a particular product was well received or not. Stock markets and hedge funds have been known to shift with the shift in sentiments on social media.
Apart from the above, sentiment analysis is used in political science, sociology and psychology. Other applications include flame detection, identifying the child-suitability of videos, and identifying bias in news sources.
Hatzivassiloglou and McKeown use the term semantic orientation in a paper titled Predicting the Semantic Orientation of Adjectives. Their approach is corpus-based and adapts to new domains. Thus it can tell that 'bull' and 'bear' are opposites in stock market reports. Using only adjectives, their model achieves 90% precision.
Pang and Lee apply machine learning to sentiment analysis. They propose a subjectivity detector, formulated as finding minimum cuts in a graph, to pick out subjective sentences. They then apply text categorization techniques, namely Naive Bayes and SVM, to the subjective sentences. They report an accuracy of 86.4% with the NB polarity classifier.
Gruhl et al. conduct one of the first studies to determine if online comments influence the sales figures of a product. They obtain the sales data of books from Amazon.com. From blog mentions and online chatter, they use automated query generation algorithms to predict the rise and fall of sales of certain books. They find that positive comments lead to increased sales.
The shares of Buffett-owned Berkshire Hathaway rise by as much as 2.94% following the Oscars award ceremony. Likewise, there is a correlation between Anne Hathaway's movie release dates and stock price increases of Berkshire Hathaway during the period 2008-2010. The reasoning is that automated trading programs are picking up online chatter about 'Hathaway' and applying it to the stock markets. This is an example where sentiment analysis fails to understand context.
Kucuktunc et al. pioneer large-scale sentiment analysis of Yahoo! Answers. They find that answers differ according to the attributes of users. They also find that best-rated answers have a neutral tone, and they identify particular feelings evoked on reading a certain question. These findings begin to be used in advertising and recommendations.
Aleksandr Kogan collects a database containing information on about 87 million Facebook users and provides it to Cambridge Analytica. Cambridge Analytica subsequently uses it to make 30 million "psychographic" profiles of voters. In later years, it is alleged that this data was used to influence voter opinion on behalf of politicians who hired the company.
- Alessia, D., Fernando Ferri, Patrizia Grifoni, and Tiziana Guzzo. 2015. "Approaches, tools and applications for sentiment analysis implementation." International Journal of Computer Applications, vol. 125, no. 3, pp. 26-33, September. Accessed 2020-08-18.
- Aroomoogan, Kumesh. 2015. "How Quant Traders Use Sentiment To Get An Edge On The Market." Forbes, August 6. Accessed 2020-08-18.
- Asimuzzaman, Md, Pinku Deb Nath, Farah Hossain, Asif Hossain, and Rashedur M. Rahman. 2017. "Sentiment analysis of bangla microblogs using adaptive neuro fuzzy system." 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), IEEE, pp. 1631-1638, July 29-31. doi: 10.1109/FSKD.2017.8393010. Accessed 2020-08-18.
- Bhat, Aditi. 2018. "How Anne Hathaway Once Tricked Data Analytics to Increase Berkshire Hathaway’s Stock Value!" Blog, Manipal ProLearn, May 22. Accessed 2020-08-18.
- Bing. 2018. "Toward a More Intelligent Search: Bing Multi-Perspective Answers." Blog, Bing, February 6. Accessed 2020-08-18.
- Dai, Shuanglu, and Hong Man. 2018. "Integrating Visual and Textual Affective Descriptors for Sentiment Analysis of Social Media Posts." In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), IEEE, April 10-12. Accessed 2020-08-18.
- Grimes, Seth. 2012. "What are the most powerful open-source sentiment-analysis tools?" Breakthrough Analysis, January 8. Accessed 2020-08-18.
- Gruhl, D., R. Guha, Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2005. "The predictive power of online chatter." KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp. 78-87, August. doi: 10.1145/1081870.1081883. Accessed 2020-08-18.
- Guevara, Juan, Joana Costa, Jorge Arroba, and Catarina Silva. 2018. "Harvesting opinions in Twitter for sentiment analysis." 13th Iberian Conference on Information Systems and Technologies (CISTI), IEEE, June 13-16. doi: 10.23919/CISTI.2018.8399226. Accessed 2020-08-18.
- Hatzivassiloglou, Vasileios, and Kathleen R. McKeown. 1997. "Predicting the Semantic Orientation of Adjectives." 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, July, pp. 174-181. Accessed 2020-08-18.
- Hoang, Mickel, Oskar Alija Bihorac, and Jacobo Rouces. 2019. "Aspect-Based Sentiment Analysis using BERT." Proceedings of the 22nd Nordic Conference on Computational Linguistics, pp. 187-196, September-October. Accessed 2020-08-18.
- Hyken, Shep. 2017. "AI Is Super-Charging The Customer Service World." Forbes, November 26. Accessed 2020-08-18.
- Katarya, Rahul, and Ashima Yadav. 2018. "A comparative study of genetic algorithm in sentiment analysis." 2nd International Conference on Inventive Systems and Control (ICISC), IEEE, January 19-20. Accessed 2020-08-18.
- Kharde, Vishal, and S.S. Sonawane. 2016. "Sentiment analysis of twitter data: a survey of techniques." International Journal of Computer Applications, vol. 139, no. 11, April. Accessed 2020-08-18.
- Kucuktunc, O., B. Barla Cambazoglu, Ingmar Weber, and Hakan Ferhatosmanoglu. 2012. "A large-scale sentiment analysis for Yahoo! Answers." Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM '12), ACM, pp. 633-642, February 8-12. Accessed 2020-08-18.
- Lumen Learning. 2020. "Polling the Public." Chapter 6 in: American Government, Lumen Learning. Accessed 2020-08-18.
- Mercer, Ian. 2011. "What are the most challenging issues in Sentiment Analysis(opinion mining)?" StackOverflow, January 26. Accessed 2020-08-18.
- Mirvish, Dan. 2011. "The Hathaway Effect: How Anne Gives Warren Buffett a Rise." The Huffington Post, March 02. Updated 2017-12-06. Accessed 2020-08-18.
- Pang, Bo, and Lillian Lee. 2004. "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts." Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pp. 271-278, July. Accessed 2020-08-18.
- Paxcom. 2016. "Why is Customer Sentiment Analysis Important for Your Brand?" Blog, Paxcom, September 5. Updated 2016-09-08. Accessed 2020-08-18.
- Poria, Soujanya, Erik Cambria, Devamanyu Hazarika, Navonil Majumder, Amir Zadeh, and Louis-Philippe Morency. 2017. "Context-Dependent Sentiment Analysis in User-Generated Videos." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 873-883, July. Accessed 2020-08-18.
- Roldós, Inés. 2019. "8 of The Best Sentiment Analysis Tools For Businesses." Blog, MonkeyLearn, October 28. Accessed 2020-08-18.
- Sankar, H., and V. Subramaniyaswamy. 2017. "Investigating sentiment analysis using machine learning approach." International Conference on Intelligent Sustainable Systems (ICISS), IEEE, pp. 87-92, December 7-8. doi: 10.1109/ISS1.2017.8389293. Accessed 2020-08-18.
- Shayaa, Shahid, Noor Ismawati Jaafar, Shamshul Bahri, Ainin Sulaiman, Phoong Seuk Wai, Yeong Wai Chung, Arsalan Zahid Piprani, and Mohammed Ali Al-Garadi. 2018. "Sentiment Analysis of Big Data: Methods, Applications, and Open Challenges." IEEE Access, vol. 6, pp. 37807-37827, June 28. Accessed 2020-08-18.
- Sims, Scott. 2015. "Sentiment Analysis 101." KDnuggets, December. Accessed 2020-08-18.
- Sun, Chi, Luyao Huang, and Xipeng Qiu. 2019. "Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence." arXiv, v1, March 22. Accessed 2020-08-18.
- Tom. 2018. "What is Sentiment Analysis and How to Do It Yourself." Blog, Brand24, February 13. Updated 2020-07-06. Accessed 2020-08-18.
- Unamo. 2020. "Social Media Monitoring." Unamo. Accessed 2020-08-18.
- Vohra, S. M., and J. B. Teraiya. 2013. "A comparative study of sentiment analysis techniques." Journal of information, knowledge and research in computer engineering, vol. 2, no. 2, pp. 315-317. Accessed 2020-08-18.
- Wikipedia. 2020a. "Doxa." Wikipedia, August 15. Accessed 2020-08-18.
- Wikipedia. 2020b. "Sentiment analysis." Wikipedia, June 30. Accessed 2020-08-18.
- Wikipedia. 2020c. "Facebook–Cambridge Analytica data scandal." Wikipedia, August 13. Accessed 2020-08-18.
- Bannister, Kristian. 2015. Understanding Sentiment Analysis: What It Is & Why It’s Used.
- Shinde, P.D. and Rathod, S., 2018. A Comparative Study of Sentiment Analysis Techniques.
- Sentiment Analysis: Concept, Analysis and Applications
- Sentiment Analysis: nearly everything you need to know
- NPTEL - Sentiment Analysis