
Sentiment Analysis is the process of computationally determining whether a piece of content is positive, negative or neutral. It is also known as Opinion Mining.
Sentiment Analysis is mainly used to gauge the views of the public regarding an action, event, person, policy or product. It has become a potent tool even for politicians to assess public reaction to their statements. These days Opinion Mining has reached an advanced stage where several outcomes can be predicted using large datasets and machine learning.
However, in this post, we will restrict ourselves to extracting 1000 tweets about Narendra Modi, Prime Minister of India, running a sentiment analysis on them, and calculating the percentages of positive, negative and neutral views.
Tweepy is an easy-to-use Python library for accessing the Twitter API.
We will use Tweepy to extract tweets from the Twitter stream.
You can install tweepy using the command:
pip install tweepy
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
You can install textblob using the command:
pip install textblob
You will also need the corpora from the Natural Language Toolkit (NLTK), which TextBlob can download for you:
python -m textblob.download_corpora
To extract tweets using the Twitter API, we first need to register an app with Twitter. Go to TwitterApps and, after signing up, click on New App.
You can leave the Callback URL empty. Agree to the Developer Conditions and select Create App.
We need the secret keys and access tokens for the API to work. Click on the “Keys and Access Tokens” tab. You will find the Consumer Key and Consumer Secret there. Note them down.
Now, we need to create access tokens for our account. Click on “Create my access token”, and then note down the “Access Token” and “Access Token Secret”.
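Rather than pasting the four values straight into the script, you may prefer to read them from environment variables. The snippet below is only a sketch of that idea: the variable names and the load_credentials helper are made up for illustration and are not part of Tweepy or of the original program.

```python
import os

# Hypothetical variable names, chosen for this example only.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing early with the list of missing names tends to be easier to debug than an authentication error from the API later on.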
Now we are ready to retrieve tweets from Twitter. Let us extract a few tweets using the above details.
Let us write a class, TwitterClient, for extracting tweets. Its constructor initializes our connection to the API.
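The original listing for this class is not reproduced here, so what follows is a sketch of what such a class might look like, not the author's exact code. It assumes Tweepy's OAuth 1 flow (tweepy.OAuthHandler, set_access_token, tweepy.API) and the search call, which is named search_tweets in Tweepy 4.x and search in 3.x. The optional api argument and the get_tweets method name are assumptions added for illustration. Depending on your API access level, the search endpoint may not be available to you.

```python
class TwitterClient:
    """Small wrapper around a Tweepy API object (illustrative sketch)."""

    def __init__(self, consumer_key, consumer_secret,
                 access_token, access_token_secret, api=None):
        if api is None:
            # Imported here so the class can also be exercised with a
            # stand-in `api` object when Tweepy is not installed.
            import tweepy
            auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
            auth.set_access_token(access_token, access_token_secret)
            api = tweepy.API(auth)
        self.api = api

    def get_tweets(self, query, count=10):
        """Return a list of {'text': ...} dicts for tweets matching `query`."""
        # Tweepy 4.x calls the search method `search_tweets`; 3.x called it `search`.
        search = getattr(self.api, "search_tweets", None) or self.api.search
        tweets = []
        for tweet in search(q=query, count=count):
            parsed_tweet = {}
            parsed_tweet['text'] = tweet.text
            tweets.append(parsed_tweet)
        return tweets
```

With real credentials, something like TwitterClient(key, secret, token, token_secret).get_tweets("Narendra Modi", count=10) would return the parsed tweets, which you can then print one per line.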
Run the program from the terminal. You should see 10 tweets that contain “Narendra Modi”.
However, if you look closely, the tweets contain external links, a lot of white space, and so on. We do not need any of that for our Opinion Mining. Let us remove all external links and special characters using a regular expression. You can read more about regular expressions here.
Let us add a function, trim_tweet, which handles the cleaning up of the tweet.
def trim_tweet(self, tweet):
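The body of trim_tweet is not shown above. As a minimal sketch of the cleanup described (links removed, special characters removed, white space collapsed), here is one possible version written as a standalone function. The patterns are assumptions chosen for illustration, not the original regular expression. Inside TwitterClient the same body would sit in a method that takes self as its first argument.

```python
import re

# Illustrative patterns; the original article's expression may differ.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NON_ALNUM_RE = re.compile(r"[^0-9A-Za-z \t]")


def trim_tweet(tweet):
    """Strip links and special characters, then collapse runs of white space."""
    text = URL_RE.sub(" ", tweet)        # drop external links first
    text = NON_ALNUM_RE.sub(" ", text)   # drop punctuation, '#', '@', emoji, etc.
    return " ".join(text.split())        # collapse repeated white space


print(trim_tweet("Great  speech by PM! https://t.co/abc123   #India"))
# prints: Great speech by PM India
```

Note that this pattern keeps only ASCII letters and digits, so tweets written in Hindi or other non-Latin scripts would be reduced to almost nothing. That is acceptable for a sketch aimed at English text, but worth keeping in mind for this topic.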
Now, instead of assigning the raw tweet text to parsed_tweet['text'], we will assign the trimmed text:
parsed_tweet['text'] = self.trim_tweet(tweet.text)
Run the program again. You will find that the tweets are trimmed and shortened.
Now let us proceed to the most important part of the exercise: Opinion Mining using TextBlob.
When we create a TextBlob object, the following processing happens:
Let me explain a bit more about how the Sentiment Classifier works:
Now that we understand the modus operandi of Opinion Mining, let us write a function, get_tweet_sentiment:
def get_tweet_sentiment(self, tweet):
    # create TextBlob object of passed tweet text
    analysis = TextBlob(self.trim_tweet(tweet))
    # set sentiment
    if analysis.sentiment.polarity > 0:
        return 'positive'
    elif analysis.sentiment.polarity == 0:
        return 'neutral'
    else:
        return 'negative'
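The introduction promises percentages of positive, negative and neutral views, so the labels returned by get_tweet_sentiment still need to be tallied. The helper below is a sketch of that last step. The name sentiment_percentages is hypothetical, and it works on a plain list of label strings so it can be tried without calling the Twitter API.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_percentages(labels):
    """Return the share of each sentiment label as a percentage of all labels."""
    total = len(labels)
    if total == 0:
        # No tweets fetched: report zeros instead of dividing by zero.
        return {label: 0.0 for label in LABELS}
    counts = Counter(labels)
    return {label: 100.0 * counts[label] / total for label in LABELS}


sample = ["positive", "positive", "negative", "neutral"]
for label, share in sentiment_percentages(sample).items():
    print("{}: {:.1f}%".format(label, share))
# prints:
# positive: 50.0%
# negative: 25.0%
# neutral: 25.0%
```

In the full program, the list of labels would come from calling get_tweet_sentiment on the text of each fetched tweet.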
You should get output similar to this.