Sentiment Analysis on Twitter

How to do a Twitter Sentiment Analysis?

Or: What's the mood on Twitter?

Hello there!

Today I want to show you how to do a so-called sentiment analysis. The idea is to measure the mood on Twitter about a certain keyword: you fetch a number of tweets that contain a keyword of your choice, extract the text of these tweets, and then check whether they contain more positive or more negative words. Of course you can't do this by hand; you need a tool that does the work for you.

Our tool:

Our main tool is called R (yes, just R; it's not a typo).

It is a free “software environment for statistical computing and graphics” and is available for Unix platforms, Windows and MacOS.

It's available here: http://www.r-project.org/

It has a comfortable installer, so this step shouldn't be a problem.

After installing, you can open the R GUI and see the following screen:

[Screenshot 1: the R GUI after startup]

OK, now we can get our other tool: twitteR.

It's a package written for R.

You don't have to download it from a website; you can install it directly from within R.

You can do it with:
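A minimal sketch of the install step, assuming the package is still published on CRAN under the name twitteR:

```r
# Download and install the twitteR package from a CRAN mirror
install.packages("twitteR")
```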

You then have to select a CRAN mirror to download it from and click OK (you can choose whichever mirror you want).

R will now download the package and install it.

Then we have to activate it for our current session with:
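A sketch of the activation step, assuming the installation above succeeded:

```r
# Attach twitteR so that its functions are available in this session
library(twitteR)
```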

 

Your screen should now look like this:

[Screenshot 2: the R console after loading twitteR]

OK, now we come to a tricky part:

The Twitter Authentication

 

Since Twitter released version 1.1 of its API, an OAuth handshake is necessary for every request you make. So we have to verify our app.

First we need to create an app at Twitter.

Go to https://dev.twitter.com/ and log in with your Twitter account.

Now you can see your profile picture in the upper right corner, with a drop-down menu. In this menu you can find “My Applications”.

Click on it and then on “Create new application”.

You can name your application whatever you want and set the description to whatever you want. Twitter requires a valid URL for the website; you can just type in http://test.de/, since you won't need it again.

And just leave the Callback URL blank.

[Screenshot 3: the form for creating a new application on Twitter]

Click on Create and you'll be redirected to a screen with all the OAuth settings of your new app. Just leave this window open in the background; we'll need it later.

Continue to R and type in the following lines (on separate lines):
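A sketch of these lines, pieced together from the fragments readers quote in the comments below. It assumes the ROAuth package is installed and an older twitteR version that still provides registerTwitterOAuth(); the three endpoint URLs are Twitter's OAuth 1.0a endpoints, and the two key strings are placeholders:

```r
require(twitteR)
require(ROAuth)

# Twitter's OAuth endpoints
reqURL = "https://api.twitter.com/oauth/request_token"
accessURL = "https://api.twitter.com/oauth/access_token"
authURL = "https://api.twitter.com/oauth/authorize"

# Placeholders: replace with the values from your app page
consumerKey = "yourconsumerkey"
consumerSecret = "yourconsumersecret"

# Build the credential object
twitCred = OAuthFactory$new(consumerKey = consumerKey,
                            consumerSecret = consumerSecret,
                            requestURL = reqURL,
                            accessURL = accessURL,
                            authURL = authURL)

# Certificate bundle used for the SSL connection
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

# Interactive step: prints a URL and waits for the PIN
twitCred$handshake(cainfo = "cacert.pem")

# Register the credentials with twitteR for this session
registerTwitterOAuth(twitCred)
```

Running the lines one at a time helps, because the handshake waits for your input before the next command can be processed.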

You have to replace yourconsumerkey and yourconsumersecret with the values shown on your app page on Twitter, which is still open in your web browser.

The command twitCred$handshake(cainfo="cacert.pem") will ask you to go to a certain URL and enter the PIN you receive on that page.
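Later versions of twitteR replaced this handshake; the error message quoted in the comments reads "ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth". A sketch of the newer call, assuming a twitteR version that provides setup_twitter_oauth() and using placeholder strings for all four credentials:

```r
library(twitteR)

# Placeholders: copy the real values from your app page on Twitter
consumer_key = "yourconsumerkey"
consumer_secret = "yourconsumersecret"
access_token = "youraccesstoken"
access_secret = "youraccesssecret"

# Authenticate directly, without the PIN step
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
```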

Sentiment Analysis on Twitter:

OK, we passed the authentication and can now go on to fetch the tweets we want from Twitter.

Type in:
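A sketch of the search call in the form readers quote it in the comments. It assumes the session is already authenticated; the cainfo argument belongs to the older ROAuth-based setup and may not be needed with newer twitteR versions:

```r
# Fetch up to 200 tweets containing the keyword #apple
tweets = searchTwitter("#apple", n = 200, cainfo = "cacert.pem")
```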

This makes twitteR fetch 200 tweets containing the keyword #apple (you can change the keyword, of course).

After waiting a few seconds, you can use length(tweets) to see how many tweets were actually saved; for some keywords, the number of existing tweets may be smaller than our sample size n.

[Screenshot 4: the R console after fetching the tweets]

Now we have our Tweets.

The Analysis:

To be able to analyze our tweets, we have to extract their text and save it into the variable tweets.text by typing:
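A sketch of the extraction, assuming tweets is the list of status objects returned by searchTwitter(). It uses base R's sapply() to get a character vector; a reader in the comments uses lapply() for the same step, which returns a list instead:

```r
# Pull the text out of every status object
tweets.text = sapply(tweets, function(t) t$getText())
```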

What we also need are our lists with the positive and the negative words.

We can find them here:

https://github.com/mjhea0/twitter-sentiment-analysis/tree/master/wordbanks

After downloading the ZIP, you can put the files in a folder on your computer; just keep the absolute path in mind.

We now have to load the words in variables to use them by typing:
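A sketch of the loading step using base R's scan(). The temporary file is a made-up stand-in so that the example runs on its own; the file names in the commented lines and the use of ";" for comment lines are assumptions about the downloaded word lists:

```r
# Stand-in for a downloaded word list; lines starting with ";" are comments
pos.file = tempfile(fileext = ".txt")
writeLines(c("; example header", "good", "great", "love"), pos.file)

# Read the words as character strings, ignoring everything after a ";"
pos = scan(pos.file, what = "character", comment.char = ";", quiet = TRUE)
print(pos)

# With the real files, point scan() at your own absolute paths, for example:
# pos = scan("/your/path/positive-words.txt", what = "character", comment.char = ";")
# neg = scan("/your/path/negative-words.txt", what = "character", comment.char = ";")
```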

Of course you have to change the path, but now we have our two lists: pos and neg.

Now we have to insert a small algorithm written by Jeffrey Breen that analyzes our words.

Just copy-paste the following lines and hit enter:
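The scoring idea can be sketched as follows. This is a base-R approximation and not Breen's original code: judging by the fragments quoted in the comments, the original relies on the plyr and stringr packages, which this sketch replaces with vapply() and strsplit(). The example sentences and word lists at the end are made up.

```r
score.sentiment = function(sentences, pos.words, neg.words) {
  # Accept a list or a character vector
  sentences = as.character(unlist(sentences))

  scores = vapply(sentences, function(sentence) {
    # Remove punctuation, control characters and digits
    sentence = gsub("[[:punct:]]", "", sentence)
    sentence = gsub("[[:cntrl:]]", "", sentence)
    sentence = gsub("\\d+", "", sentence)
    sentence = tolower(sentence)

    # Split into words on whitespace
    words = unlist(strsplit(sentence, "\\s+"))

    # Score = number of positive words minus number of negative words
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, integer(1), USE.NAMES = FALSE)

  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Made-up example data
example.pos = c("love", "great")
example.neg = c("terrible", "hate")
example.result = score.sentiment(
  c("I love this great phone!", "Terrible battery, I hate it", "It is a phone"),
  example.pos, example.neg)
print(example.result)
```

Note the doubled backslashes in "\\d+" and "\\s+": R string literals need them, and a single backslash triggers the "unrecognized escape" error several readers report in the comments.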

 

 

 

The final steps:

Type in analysis = score.sentiment(tweets.text, pos, neg) to score the tweets.

Congrats, your first sentiment analysis is now saved in the variable analysis.

You can get a table of the scores by typing table(analysis$score).

Or the mean by typing mean(analysis$score).

Or get a histogram with hist(analysis$score).

 

The positive values stand for positive tweets and the negative values for negative tweets. The mean tells you about the overall mood of your sample.
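To make the interpretation concrete, here is a small sketch with made-up scores standing in for the score column of the analysis (the numbers are purely illustrative, not real Twitter results):

```r
# Hypothetical scores for six tweets
scores = c(2, 0, -1, 1, 0, 3)

# How many tweets received each score
print(table(scores))

# Overall mood: a mean above 0 suggests more positive than negative words
print(mean(scores))

# Distribution of the scores
hist(scores)
```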

  

 

 

Note:

Sometimes it doesn't work because some tweets contain invalid characters. In that case you have to fetch the tweets again or change the keyword. As soon as an update is available, I will update this article.
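One possible workaround, offered as an assumption rather than a tested fix for every case, is to strip characters that cannot be converted to ASCII before scoring, using base R's iconv(). The example strings are made up:

```r
# Made-up examples; the second contains a non-ASCII character
raw.text = c("plain ascii text", "caf\u00e9 latte")

# Drop everything that cannot be represented in ASCII
clean.text = iconv(raw.text, from = "UTF-8", to = "ASCII", sub = "")
print(clean.text)
```

The cleaned vector can then be passed to the scoring function in place of the raw tweet text.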

Julian Hillebrand

During my time at university, while learning the basics of economics, I started exploring the possibilities and changes caused by digital disruption and the process of digital transformation. I focused on the importance of data and data analytics in combination with marketing and management.
My personal interests lie in technology, digital marketing and data analytics. I became acquainted with programming and digital technology early on and have never stopped following the newest innovations.

I am an open, communicative and curious person. I enjoy writing, blogging and speaking about technology.

  • Chrisfs

    Nice article! Succinct yet useful. Good intro to R.

  • Chrisfs

    I seem to have a problem with the authentication.
    I get the following error in R (using RStudio)
    > twitCred$handshake(cainfo=”cacert.pem”)
    To enable the connection, please direct your web browser to:
    http://api.twitter.com/oauth/authorize?oauth_token=g6ivyRUEckJoGcniqV8yo2pR0qcYwwvsJjF7BibzU
    When complete, record the PIN given to you and provide it here: 3204922
    > registerTwitterOAuth(twitCred)
    [1] TRUE
    > tweets = searchTwitter(“#python”, n=200, cainfo=”cacert.pem”)
    [1] “Unauthorized”
    Error in twInterfaceObj$doAPICall(cmd, params, “GET”, …) :
    Error: Unauthorized

    • Hm, have you tried to do it with the “normal” R framework? I'm not a big fan of RStudio because it sometimes produces strange error messages.

  • Chrisfs

    Yup just tried it with the normal R console and get the same error.
    tweets = searchTwitter(“#apple”, n=200, cainfo=”cacert.pem”)
    Error in twInterfaceObj$doAPICall(cmd, params, “GET”, …) :
    OAuth authentication is required with Twitter’s API v1.1

    • Brett

      Hey. I have the exact same problem. Did you manage to get around it okay?

      • Hey! Did you try to execute the steps of the authentication process one at a time? Sometimes R doesn't wait long enough.

  • Brett

    I did and I got past it. My new favourite error that I don't understand is this…! I'm close to shooting myself right now!!!

    Loading required package: stringr
    Error in FUN(X[[1L]], …) : could not find function “str_split”
    In addition: Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
    there is no package called ‘stringr’

    • Please try to install the stringr package with install.packages('stringr')

  • Hi, Its great! Thanks!

    i got a problem with non-English letters, now searching how to remove it.
    Could you help?

    • Hey Elizabeth
      Nice to have you on my blog.
      You can use the lang argument of the searchTwitter() function to get only tweets in a certain language.
      For example:
      tweets = searchTwitter("iPhone", n=200, lang="en")
      would give you only English tweets.
      Hope I could help you. If you have further questions, I will be happy to answer.

  • Uthra

    Hi ,

    Thanks for the blog. Actually I am trying to do sentiment analysis of telecom operators, but I get for every tweet there is some 15 duplicates. So, if I pull 1500 tweets, there are only 100 unique tweets. How to remove the duplicates in such cases.

    Also, for tweets in languages other than English – is there a way to get them translated in English from twitter or should we do it after saving the tweets.

    • Hey Uthra,
      nice to see you on my blog.
      If everything is working correctly, you shouldn't receive duplicates. Or better: if you get all the tweets with just one search, the Twitter API does not return duplicates. Please check that your code is correct.

      And there is no way to get them translated directly from Twitter. You should save the tweets as you receive them and then think about translating them.

      Please let me know if you find the problem with the duplicates.

      Regards
      Julian

  • Hi

    I am facing a Problem here with Rstudio.

    after Completing the Authentication Process I am trying to get the Tweets but its showing some kind of error.

    Athletics.list Athletics.df = twListToDF(Athletics.list)
    Error in lapply(X = X, FUN = FUN, …) :
    object ‘Athletics.list’ not found
    > write.csv(Athletics.df, file=’C:/temp/AthleticsTweets.csv’, row.names=F)
    Error in is.data.frame(x) : object ‘Athletics.df’ not found

    Help me..

    • Hey Sourabh,
      there seems to be a problem with the lists and dataframes you are using in your code and not with the Twitter Authentication.
      Could you please show me your whole code?

      Regards

  • Hi

    Can we extract data from LinkedIn using R in the same way as we are able to get from TwitteR?

    • Hey Sourabh,
      no, you can't, because the LinkedIn API is structured completely differently from the Twitter API. LinkedIn focuses on contacts, and there is no way to search LinkedIn for public posts like you can with Twitter.
      I hope I could help you.

      Regards

  • vishal

    this is really nice article….
    how can you interpret the score which is obtain in analysis?

  • frank
    • Hey Frank,
      thanks for the hint! I will fix it.

      Regards

  • Kevin Desai

    Read all the comments, I still get this error:
    > tweets = searchTwitter(“#apple”, n=200, cainfo=”cacert.pem”)
    [1] “Unauthorized”
    Error in twInterfaceObj$doAPICall(cmd, params, “GET”, …) :
    Error: Unauthorized

    Cant figure out a way. Can you help?

  • I have gone completely bananas over the twitter authentication and PIN generation.

    After executing the following command –
    Cred$handshake(cainfo = system.file(“CurlSSL”,”cacert.pem”,package=”RCurl”))
    OR
    Cred$handshake(cainfo=”cacert.pem”)

    I get this –
    To enable the connection, please direct your web browser to:
    http://api.twitter.com/oauth/authorize?oauth_token=V0W4WSrgKg7s336bMv6o2kCPmunzEToyW2UhnTCCcpM
    When complete, record the PIN given to you and provide it here:

    On redirecting, i get the “Authorize App” page. After that, i get either of the messages –
    “The web page is not available” OR
    “Could not connect to 127.0.0.1:8000/twitter_callback”

    I have tried 4 different Callback URL’s –
    1) 127.0.0.1:8000/twitter_callback
    2) 127.0.0.1:8080/twitter_callback
    3) 127.0.0.1:8000/twitter/oauth
    4) 127.0.0.1:8080/twitter/oauth
    Note – i have even tried the shortened versions of the URL’s mentioned above through Bitly

    I have even tried changing the accessURL and authURL from http to https.

    After struggling with it for days, I still see no signs of moving ahead.
    Please guide me.

    • Hey Nitisch,
      there is no need to provide a callback URL in the app settings. Just leave that field blank; you get redirected to a Twitter page automatically and only have to copy and paste the PIN code.
      But I will update the Twitter Auth post in a few minutes, as the whole login process got much easier in the newest version of the twitteR package.

      Regards

  • Kunal Gupta

    I am getting this error message when I am trying to get the twitCred$handshake(cainfo=”cacert.pem”) ….Please help!!

    “”
    > twitCred$handshake(cainfo=”cacert.pem”)
    To enable the connection, please direct your web browser to:
    http://api.twitter.com/oauth/authorize?oauth_token=L8qPjY55LW2QjzcgSxkxxjqwNpHEIfKL
    When complete, record the PIN given to you and provide it here: 8907806
    Error: Forbidden

    “”

  • DonTheDragon

    Hi Julian. Thanks for the great blog post! I am excited to start analysing Twitter in this way. All was going well, but then I had the following error message when running the Jeffrey Breen algorithm: “Error: ‘\d’ is an unrecognized escape in character string starting “‘\d””. And then the same error for “\s”. Could you please help?

    • Hey
      this is probably an error caused by some kind of character combination in the data you are analyzing. This happens from time to time.
      Where exactly does this error appear?

      Regards

      • Someone

        First off, thanks for the great tutorial! I had the same error. Solution: Escape by adding another backslash: “\\d” and “\\s”.

        Hope this helps!

        • Hey
          thanks for your solution.

          Regards

        • Quantguy

          Thanks a lot!…saved my time

  • Noah

    tweets = searchTwitter(“#apple”, n=200, cainfo=”cacert.pem”)

    Do i get the 200 LAST tweets with the keyword #apple that has been done on twitter or just 200 tweets that has been done at any time in the past on twitter ?

    • Hey
      you get the 200 most recent tweets.

      Regards

  • Gollapinni Karthik

    Hi,

    Hey, the post is really awesome and it was really helpful and it was really nice.
    Thanks a lot. Cheers.

    P.S Stuck up with one problem actually. You are doing real time twitter data analysis, but I want to do, actually doing historical data analysis.
    I already have a dumb of 8 different companies and have to do sentiment analysis on each company individually. While following the above process mentioned ie from sentiment score I am getting error. Can you help me in that.

    And how to apply plyr library for a single .csv file which is already existing.

    Thanks,
    Karthik

    • Hey
      could you please show me some code you are using and where exactly the error appears?

      Regards

  • Hi Julian,

    During the authentication process, I am getting an error after entering the PIN saying “Error: Forbidden”. because of which I am unable to register.
    Kindly let me know who to proceed.

    • Hey
      could you please give me some details about the code you are using? It's easier to use the direct authentication method: if you provide all credentials to the setup function, you don't need to authenticate with a PIN.
      Take a look how I did it here: http://thinktostart.com/twitter-authentification-with-r/
      This should solve your problem.

      Regards

  • harold

    hi!
    After to put:

    twitCred$handshake(cainfo=”cacert.pem”)

    where put the PIN??

  • Anne

    Hey Julian (or someone else),

    I have a question: I am not sure if the extracting text function is working properly. I am trying to make a table with my Tweets in it by writing the results to a CSV format with some code I found online. I used the following below, resulting in an excel file with the following format:
    username, Tweet and some more stuff. I think it’s these things (first row of the file it says: ,”text”,”favorited”,”favoriteCount”,”replyToSN”,”created”,”truncated”,”replyToSID”,”id”,”replyToUID”,”statusSource”,”screenName”,”retweetCount”,”isRetweet”,”retweeted”,”longitude”,”latitude”). For my research I am only interested in username and text. How do I modify the results in such a way that I get an excel file as output with a) username b) tweet (and maybe a number like R assigns it?)

    my code (after authentification and loading of packages):

    tws<-searchTwitter("pvda", n=1000, since="2015-03-04")
    Tweets.text = lapply(tweets,function(t)t$getText())
    df <- do.call("rbind", lapply(tws, as.data.frame))
    write.csv(df,file="test3pvdatabel.csv")

  • Naresh

    Hi,
    Its a great tutorial and finds useful, but I have one major concern. You are processing each tweet and calculating how many positive and negative words in each tweet and you are calculating the sentiment score by obtaining difference of +ves and -ves.

    But a real sentiment analysis is much more indepth analysis rather than calculating just the number of +ve/-ve

    Can u suggest ant indepth analysis for sentiments

  • sanchit shaleen

    Hi Julian

    I am facing with a problem while running the function “setup_twitter_oauth(api_key, api_secret, token, token_secret)”.
    The error that I face is “Error: could not find function “setup_twitter_oauth” eventhough I have already installed and loaded twitteR package and have set the required values for the 4 parameters of the function setup_twitter_oauth()

    Kindly help me in resolving the error.

    Thanks
    Sanchit

    • Hey
      did you also load the package? with
      require(twitteR)

      Regards

  • Pierluigi

    Hi dear,

    although I followed your instructions, running the following command:

    tweets = searchTwitter(“#apple”, n=200, cainfo=”cacert.pem”)

    I got the following error:

    Error in get_oauth_sig() : OAuth has not been registered for this session

    Do you have any suggestion about finding a solution about that?
    Anyway, really a great post!

    • Hey
      this post still includes an older version of the authentication process. Sorry for that.
      You should use the newer version you can find here: http://thinktostart.com/twitter-authentification-with-r/
      The newer version uses the setup_twitter_oauth() function.

      Regards

      • Pierluigi

        Thanks for the answer and the help!

  • ganesh

    Sir I am gettig an error “Error in inherits(.data, “split”) : object ‘sentences’ not found”.
    Thanks

  • Oli Paul

    >require(twitteR)

    > reqURL accessURL authURL consumerKey consumerSecret twitCred download.file(url=”http://curl.haxx.se/ca/cacert.pem”, destfile=”cacert.pem”)

    trying URL ‘http://curl.haxx.se/ca/cacert.pem’

    Content type ‘¸’Fþþ’ length 256338 bytes (250 KB)

    downloaded 250 KB

    > twitCred$handshake(cainfo=”cacert.pem”)

    Error: object ‘twitCred’ not found

    > registerTwitterOAuth(twitCred)

    Error in registerTwitterOAuth(twitCred) :

    ROAuth is no longer used in favor of httr, please see ?setup_twitter_oauth

    ^ I have plyr, twitteR, rjson, RJSONIO, RCurl & bitops packages loaded???

    What have I done wrong

    • Oli Paul

      Also tried

      > require(bitops)

      > require(rjson)

      > require(RCurl)

      > require(plyr)

      > require(RJSONIO)

      > require(twitteR)

      all before the code :S

      • Oli Paul

        setup_twitter_oauth(consumer_key=consumerKey,consumer_secret=consumerSecret,access_token=NULL, access_secret=NULL)

        [1] “Using browser based authentication”

        Error in loadNamespace(name) : there is no package called ‘base64enc’

        tried ?setup_twitter_oauth to help but didnt work!

  • Oli Paul

    If youre reading this in 2015 and cant authenticate:

    install.packages(“twitteR”)
    install.packages(“ROAuth”)
    library(“twitteR”)
    library(“ROAuth”)
    # Download “cacert.pem” file
    download.file(url=”http://curl.haxx.se/ca/cacert.pem”,destfile=”cacert.pem”)

    #create an object “cred” that will save the authenticated object that we can use for later sessions
    cred To enable the connection, please direct your web browser to: . Note: You only need to do this part once
    cred$handshake(cainfo=”cacert.pem”)
    Then click authorize app and right the number into your command line in R
    #save for later use for Windows
    save(cred, file=”twitter authentication.Rdata”)
    Now have fun!

    • montjoile

      thanks a lot!!!

  • Pavan Nayakanti

    Hi, am ending up with the below error while trying to set the Authentication. Please suggest what am missing here. Thanks.,

    > twitCred <- OAuthFactory$new(consumerKey=consumerKey,consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)

    Error: object 'OAuthFactory' not found

    • Pavan Nayakanti

      found the answer… it requires library(ROAuth). Thanks. 🙂

  • Mohsin Mahmood

    Error in inherits(.data, “split”) :object ‘sentences’ not found”.

    Hello,
    I am getting the above error, could you help me as to why?

    thank you

  • Mareli Strauss

    halo I am experiencing the following error that says:

    “Error in FUN(X[[i]], …) : unused argument (.progress = “none”) ”

    and then I am given this as a traceback
    “3 FUN(X[[i]], …)

    2 lapply(sentences, function(sentence, pos.words, neg.words) {

    sentence = gsub(“[[:punct:]]”, “”, sentence)

    sentence = gsub(“[[:cntrl:]]”, “”, sentence)

    sentence = gsub(“\d+”, “”, sentence) …

    1 score.sentiment(Tweets.text, pos, neg)”

    please help.
    I have no idea how to fix it.

  • Gloria Kim

    Hello, I tried to use different keywords and number of tweets, but for all of the tries I get an error message saying “invalid input”. Do you have a way to get around this? If this doesn’t work cuz of one invalid input out of all tweets, how do you do the analysis?

  • raj samala

    Hi,
    When I execute the last code, analysis = score.sentiment(Tweets.text, pos, neg). I am getting this error message…”Error in sort.list(y) : invalid input….utf8toecs”.
    I have tried fixing this using….two codes…

    rmNonAlphabet <- function(str) {
    words <- unlist(strsplit(str, " "))
    in.alphabet <- grep(words, pattern = "[a-z|0-9]", ignore.case = T)
    nice.str <- paste(words[in.alphabet], collapse = " ")
    nice.str
    }
    df$text <- sapply(df$text,function(row) iconv(row, "latin1", "ASCII", sub=""))

    But nothing worked….can anyone help me….please.

    • Julian Hillebrand

      Sorry for the late reply. I will update this tutorial next week; then the code should work again.

  • anitha

    Cred

    • Julian Hillebrand

      I just deleted your actual consumer secret and key from the comment. You shouldn't post them on the Internet 😉
      I will do an update for this post next week, and then it should work again.
      You can follow me on Twitter if you want to be updated

  • Aijaz Sheikh

    how to plot graph for positive and negative scores.

  • Deepti Nimmagadda

    Hii,
    I am getting the following error when i use the word.list command:
    Error: ‘s’ is an unrecognized escape in character string starting “‘s”
    Could you please tell me where I am going wrong?

  • Sneha Chinnu

    i am getting error on the below line:

    twitCred <- OAuthFactory$new(consumerKey=consumerKey,consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)

  • Nopal Hilmy

    Hi,

    I just copy paste the function of sentiment and I get error with this ”
    }, pos.words, neg.words, .progress=.progress )
    Error: unexpected ‘}’ in ” }”

    Help me please. Thanks

  • Brice

    I get the below error at the end. What could be causing this.

    return(score)
    }, pos.words, neg.words, .progress=.progress )
    Error in llply(.data = .data, .fun = .fun, …, .progress = .progress, :
    object ‘.progress’ not found