Disclaimer: The views, thoughts, and opinions expressed in the text belong solely to the author, and should not in any way be attributed to the author’s employer, or to the author as a representative, officer or employee of any organisation.
This article is an excerpt from my book “Practical Data Analysis: Using Open Source Tools & Techniques” (available on Amazon worldwide, iBook Store, and Barnes & Noble).
One of the earliest expressions of public opinion was rebellion. Peasants rebelled against oppressive regimes throughout history. When the king saw his subjects in open rebellion, it was a pretty clear sign that public support for his rule was eroding. Unpaid tax was another clue: when rulers saw their tax receipts dwindle and heard reports of tax collectors being killed, they knew that public opinion was turning against them. Over time, however, both the rulers and the ruled have learned better ways to express their views and opinions: from free elections and participation in legislative activity to the use of media and communication, and non-violent rallies, protests and demonstrations.
By the turn of the 21st century, something even more phenomenal happened: the Internet revolution exploded across the globe. From MySpace and LinkedIn to Facebook, Twitter, Instagram, Snapchat and WhatsApp, people found new and innovative ways of communicating and sharing with each other, around the clock and across great distances. The emergence of these social networking platforms and their ever-increasing user base has permanently transformed not only the way society works, but also the nature and content of what people share and discuss. Today, whether you are the President of a nation or the organiser of a popular spring revolution, a celebrity or a common man, you can tweet your story and share your opinion about anything and everything, and have it instantly reach millions of people across the world, uncensored and unprejudiced. The ripple effect of this social-media-led transformation has spread so far and wide that we are perhaps more opinionated, as individuals and as a society, than at any other time in our history.
In the not-so-distant past, due to the shortage of data, door-to-door surveys and opinion polls were the only real means of gauging the sentiments of the general public towards particular brands, goods, political views or ideologies. In this new information age, where thoughts and opinions are shared so prolifically, and where peer advice and recommendations are in plentiful supply, the challenge, paradoxically, is no longer the lack of data on public opinion, but how to make sense of so much of it. The sudden eruption of activity in opinion mining and sentiment analysis has thus been a response to the surge of interest in automated information-gathering systems that can answer the question: what do the general public think?
Sentiment analysis, or opinion mining, is a relatively recent sub-field of Natural Language Processing (NLP). It covers the computational analysis of people’s individual or collective opinions and emotions towards particular brands, goods, political views or ideologies, with the objective of answering some very pertinent questions, such as:
- How did the market react to the increase in interest rates?
- What do people think about the individuals we do business with?
- How did the general public react to the changes in the tax law?
- What do the general public think about the election debate?
- Why have sales of a product declined?
- Which features of a product do people like or dislike most?
- Which food retailers do people prefer?
The ability to ask questions such as these and get an answer almost instantly (with a degree of certainty) is extremely beneficial for many critical decision-making tasks across a variety of domains, such as poll forecasting, marketing, stock market trading and continuous credit risk assessment. However, sentiment analysis and opinion mining is still an area of active research and, as with any emerging technology, it is prone to some degree of error. Take, for example, the following sentence:
“The train is late yet again ….. brilliant …..”
Most people would quickly recognise that the writer is being sarcastic in using the word “brilliant”. A sentiment analysis tool without contextual understanding, on the other hand, may see the word “brilliant” and incorrectly classify the sentence as expressing a positive sentiment.
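To make this failure mode concrete, here is a minimal Python sketch of a naive lexicon-based classifier. It assumes a tiny hand-made word list; the `LEXICON` dictionary and the `naive_sentiment` function are hypothetical, for illustration only, and do not represent any particular tool. The classifier simply sums the polarity of each word it recognises, with no notion of context.

```python
import re

# Hypothetical toy lexicon for illustration only (not taken from any real tool).
# Each known word maps to a polarity: +1 positive, -1 negative.
LEXICON = {
    "brilliant": 1, "great": 1, "good": 1, "love": 1,
    "terrible": -1, "awful": -1, "bad": -1, "hate": -1,
}

def naive_sentiment(text: str) -> str:
    """Classify text by summing word polarities, ignoring context entirely."""
    # Lower-case the text and keep only alphabetic tokens (punctuation is dropped).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon contribute 0 to the score.
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(naive_sentiment("The train is late yet again ….. brilliant ….."))  # positive
```

Because “late” carries no polarity in this toy lexicon, “brilliant” is the only scored word, so the sarcastic sentence comes out as positive. Real tools typically use much larger lexicons, negation handling or trained models, but any approach that ignores context is exposed to the same trap.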
Maybe one day, as machine learning algorithms evolve, sentiment analysis tools will become proficient at understanding linguistic expressions of irony and sarcasm. For now, however, we have to live with both the capabilities and the limitations of existing sentiment analysis tools and techniques. Hence, if your use case requires 100% accuracy in classifying a sentiment or an opinion, the sentiment analysis and opinion mining techniques currently available may not be suitable for you. On the other hand, if you are more than happy with an accuracy rate of 70% to 90% (or even higher for certain domains, such as brief social media posts on Twitter and Facebook), you are in luck!
Unfortunately, there is not enough space in a post like this to write about these amazing technologies in depth. Nevertheless, if you are curious about how you can use open source technology to gauge people’s sentiments and opinions based on what they are tweeting or posting, I have written a chapter on “Sentiment Analysis and Named Entity Recognition” in my book. If you want to go even further and build an enterprise-grade (fully open source) platform that can stream tweets and chat messages in real time and give you a dashboard view of how people’s sentiments and opinions towards specific brands or topics evolve over time, I highly recommend reading chapter nine of my book, on “Continuous Monitoring and Real-Time Analytics”.
Dhiraj Bhuyan, 30 July 2018