
Sentiment Analysis on Twitter with Viralheat and R

Hi there!

Some time ago I published a post about sentiment analysis on Twitter. In that post I used two word lists, one with positive and one with negative words. That is a good way to start, but if you want more accurate sentiments you should use an external API, and that's what we do in this tutorial. Before we start, take a look at the authentication tutorial and go through its steps.

The Viralheat API

The Viralheat sentiment API receives more than 300M calls per week, and this large volume helps the API keep improving. Whenever a user, for example a company, notices that a tweet was classified incorrectly (say, a positive tweet that the API labeled as neutral), they can correct it, and the API can use that knowledge the next time.

Viralheat registration

You can access the Viralheat API with a free account, which includes 1,000 calls per day; that should be enough to get started. Just go to the Viralheat Developer Center and register: https://app.viralheat.com/developer

Viralheat Developer Center

Then generate your free API key; we'll need it later.

Functions

The getSentiment() function

First, load the packages needed for our analysis:
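The package list itself is not reproduced in this post. Judging from the full script a reader posted in the comments, the analysis uses twitteR, ROAuth, RCurl, RJSONIO and stringr. A minimal sketch, assuming that list, which attaches whichever of these are installed and prints an install command for the rest (it does not install anything itself):

```r
# Packages used by the script posted in the comments (assumed complete)
pkgs <- c("twitteR", "ROAuth", "RCurl", "RJSONIO", "stringr")

# requireNamespace() returns FALSE instead of failing when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}

# Attach the available packages so their functions can be called unqualified
for (p in pkgs[installed]) library(p, character.only = TRUE)
```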

The getSentiment() function sends our queries to the API, extracts the mood (positive, negative or neutral) and a score from the JSON reply, and returns them in a list.
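The function body is not shown here either. The sketch below follows the getSentiment() code a reader posted in the comments: encode the text, strip encoded characters except spaces, shorten the query, call the Viralheat endpoint and turn the `mood` and `prob` fields of the reply into a signed score. The helper names prepare_text() and mood_to_score(), the `cainfo` argument and the two regex changes noted in the comments are assumptions, not the original code, and the network part has not been checked against the live API.

```r
# Hypothetical helper: make tweet text safe for the query string.
# Mirrors the comment code, with two changes: reserved = TRUE (so '&', '#'
# or '?' cannot break the URL) and [[:xdigit:]] instead of \\d (so escapes
# containing hex letters, such as %E2, are removed as well).
prepare_text <- function(text, max_len = 360) {
  text <- utils::URLencode(text, reserved = TRUE)
  text <- gsub("%20", " ", text, fixed = TRUE)   # protect encoded spaces
  text <- gsub("%[[:xdigit:]]{2}", "", text)     # drop every other escape
  text <- gsub(" ", "%20", text, fixed = TRUE)   # re-encode the spaces
  if (nchar(text) > max_len) {
    text <- substr(text, 1, max_len - 1)
    text <- sub("%[[:xdigit:]]?$", "", text)     # no half-cut "%2" at the end
  }
  text
}

# Signed score: positive -> +prob, negative -> -prob, anything else -> 0
mood_to_score <- function(mood, prob) {
  if (isTRUE(mood == "positive")) {
    prob
  } else if (isTRUE(mood == "negative")) {
    -prob
  } else {
    0  # neutral or unknown label
  }
}

# Query the endpoint used in the tutorial (needs RCurl and RJSONIO).
# cainfo points curl at RCurl's bundled CA certificates, which may help
# with the "SSL certificate problem" errors reported in the comments.
get_sentiment <- function(text, key,
                          cainfo = system.file("CurlSSL", "cacert.pem",
                                               package = "RCurl")) {
  url <- paste0("https://www.viralheat.com/api/sentiment/review.json",
                "?api_key=", key, "&text=", prepare_text(text))
  data <- RCurl::getURL(url, cainfo = cainfo)
  js <- RJSONIO::fromJSON(data)
  list(mood = js$mood, score = mood_to_score(js$mood, js$prob))
}
```

Only get_sentiment() touches the network; the two helpers can be tried offline, e.g. `prepare_text("i love my new iphone")`.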

The clean.text() function

We need this function because certain characters in tweets cause problems with the API, and to remove Twitter-specific tokens such as "@" mentions and "RT".
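A base-R sketch of such a cleaner, adapted from the clean.text() version posted in the comments. The differences are assumptions of this sketch: URLs are removed before punctuation, runs of whitespace are collapsed to a single space instead of being deleted (the comment's output shows glued words such as "seenof"), and the retweet pattern uses `perl = TRUE` because it relies on `(?:...)` and `\\b`.

```r
# Hypothetical variant of clean.text(): returns lower-cased, cleaned tweets
# and drops tweets that end up empty.
clean_text <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE)  # retweet markers
  x <- gsub("@\\w+", "", x)          # remaining @mentions
  x <- gsub("http\\S+", "", x)       # whole URLs, before punctuation is removed
  x <- gsub("[[:punct:]]", "", x)    # punctuation
  x <- gsub("[[:digit:]]", "", x)    # digits
  x <- gsub("[[:space:]]+", " ", x)  # collapse whitespace to ONE space
  x <- gsub("^\\s+|\\s+$", "", x)    # trim both ends
  # tolower() can fail on invalid multibyte strings; use NA in that case
  x <- vapply(x, function(s) tryCatch(tolower(s), error = function(e) NA_character_),
              character(1), USE.NAMES = FALSE)
  x[!is.na(x) & x != ""]
}

clean_text("RT @alice: I love my new #iPhone5! http://t.co/abc123")
# "i love my new iphone"
```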

Sentiment Analysis on Twitter with Viralheat and R

OK, now we have our functions, all the packages and the API key.

First we need the tweets. As usual, we get them with the searchTwitter() function.

In my example I used the keyword “iphone5”. Of course you can use whatever you want.
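A minimal sketch of the search step, assuming you are already authenticated (for example with setup_twitter_oauth(), as in the reader's script in the comments). The wrapper name harvest_tweets() is hypothetical, and the search function is passed in as an argument only so the text-extraction part can be tried without Twitter credentials; by default it is twitteR::searchTwitter.

```r
# Hypothetical wrapper: search tweets and return their text.
# The default for `search` is only evaluated when no other function is
# supplied, so twitteR is not needed to try the extraction logic.
harvest_tweets <- function(keyword, n = 100, lang = "en",
                           search = twitteR::searchTwitter) {
  tweets <- search(keyword, n = n, lang = lang)
  # each twitteR status object exposes its text through getText()
  vapply(tweets, function(x) x$getText(), character(1))
}

# Usage (needs a valid Twitter authentication):
# tweet_txt <- harvest_tweets("iphone5", n = 100)
```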

Next, we extract the text from the tweets and remove the problematic characters with the clean.text() function. We just call these functions with:
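A sketch of this step that cleans the raw texts with whatever cleaning function you pass in and sets up one row per tweet. The helper name prepare_tweet_df() is hypothetical. Unlike the reader's script in the comments, which starts with `score = 1:mcnum`, it initialises sentiment and score with NA, so a tweet whose API call fails stays visibly unscored.

```r
# Hypothetical helper: clean the texts and build the results data frame
prepare_tweet_df <- function(tweet_txt, clean_fun) {
  tweet_clean <- clean_fun(tweet_txt)
  n <- length(tweet_clean)
  data.frame(text = tweet_clean,
             sentiment = rep(NA_character_, n),  # filled in by the analysis
             score = rep(NA_real_, n),
             stringsAsFactors = FALSE)
}

# Usage with the tutorial's objects:
# tweet_df <- prepare_tweet_df(tweet_txt, clean.text)
```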

Do the analysis

We come to our final step: the analysis. We call getSentiment() with the text of every tweet and store each answer in tweet_df. Since every tweet means one API call, this can take some time. Just replace API-KEY with your Viralheat API key.
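Because a single failing call (the comments report SSL errors and "invalid JSON input") otherwise stops the whole loop, the sketch below wraps each call in tryCatch() and leaves that row as NA. The name score_tweets() and its `pause` argument are assumptions; `sentiment_fun` is any function that takes a text and returns `list(mood = ..., score = ...)`, such as a wrapper around the tutorial's getSentiment().

```r
# Hypothetical loop: fill sentiment and score row by row, skipping failures
score_tweets <- function(tweet_df, sentiment_fun, pause = 0) {
  for (i in seq_len(nrow(tweet_df))) {
    res <- tryCatch(sentiment_fun(tweet_df$text[i]),
                    error = function(e) {
                      message("Tweet ", i, " failed: ", conditionMessage(e))
                      NULL
                    })
    if (!is.null(res)) {
      tweet_df$sentiment[i] <- res$mood
      tweet_df$score[i] <- res$score
    }
    if (pause > 0) Sys.sleep(pause)  # optional delay between API calls
  }
  tweet_df
}

# Usage with the tutorial's function:
# tweet_df <- score_tweets(tweet_df, function(txt) getSentiment(txt, "API-KEY"))
```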

That's it! The analyzed tweets are now stored in the tweet_df data frame, and you can show your results by simply printing tweet_df.
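Beyond printing the data frame, a quick summary shows how many tweets fall into each mood and the average score. The values below are made up for illustration only, not real API output.

```r
# Illustrative scored data frame (made-up values)
tweet_df <- data.frame(text = c("i love my new iphone", "meh", "broken again"),
                       sentiment = c("positive", "neutral", "negative"),
                       score = c(0.8, 0, -0.6),
                       stringsAsFactors = FALSE)

print(tweet_df)                                        # raw results
counts <- table(tweet_df$sentiment, useNA = "ifany")   # tweets per mood, incl. failed (NA)
print(counts)
avg_score <- mean(tweet_df$score, na.rm = TRUE)        # overall tendency
print(avg_score)
```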

Sentiment Results

Note:

Sometimes the API breaks when it receives certain characters. I couldn't figure out why yet, but as soon as I know, I will update this tutorial.
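One factor worth checking, offered as an untested assumption rather than the confirmed cause: the `%\\d\\d` pattern in the getSentiment() code from the comments only removes escapes made of two decimal digits. Multi-byte characters that survive cleaning, such as the em dash or the ellipsis (both appear in the sample output in the comments), are encoded with hex letters, e.g. `%E2%80%94` for U+2014, so part of the sequence is left behind as a stray byte. A small comparison on an already encoded string:

```r
encoded <- "a%E2%80%94b"   # "a\u2014b" (em dash) after URLencode()

# Pattern from the comment code: only digit-digit escapes are removed
digits_only <- gsub("%\\d\\d", "", encoded)           # "a%E2b", a stray lead byte

# Hex-aware pattern: every escape is removed
hex_aware <- gsub("%[[:xdigit:]]{2}", "", encoded)    # "ab"
```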

Please also note that sentiment analysis can only give you a rough overview of the mood.

Julian Hillebrand

During my time at university, while learning the basics of economics, I started exploring the possibilities and changes caused by digital disruption and the process of digital transformation, focusing on the importance of data and data analytics and their combination with marketing and management.
My personal interests lie in technology, digital marketing and data analytics. I got acquainted with programming and digital technology early on and have never stopped following the newest innovations.

I am an open, communicative and curious person. I enjoy writing, blogging and speaking about technology.

  • Deepak

    I get an error: tweet_clean not found.

    • Thanks for your comment! I forgot two lines of code, but now everything should work fine.


  • Krishna

    When I try this, I get the following error while analyzing:
    Error in fromJSON(content, handler, default.size, depth, allowComments, :
    invalid JSON input

  • Ray

    Error in lapply(X = X, FUN = FUN, …) : object ‘mc_tweets’ not found

    • Ray

      My tweet_df is coming back with no sentiment scores… suggestion?

      • Hey Ray,
        please replace “mc_tweets” with “tweets”.
        I also changed it in the code.
        Does this work for you?

        Regards

        • Ray

          Yes, I made that change; it was mostly for your reference. More concerning is that my data frame doesn't have sentiment scores in it:
          Error in function (type, msg, asError = TRUE) :
          SSL certificate problem, verify that the CA cert is OK. Details:
          error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
          I've modified the harvesting step to:
          # harvest tweets
          tweets = searchTwitter("#energyefficiency", n=20, lang="en", cainfo="cacert.pem")
          so I think it's the Viralheat API, not the Twitter API. Any suggestions?

          • Hey Ray,
            for me everything works, I just tried it. There also weren´t any changes in the Viralheat Sentiment API. So I think it has something to do with your environment and especially the SSL settings.
            Did you use my Twitter authentication tutorial? It sets the SSL settings in the twitCred object. So there is no need to specify the cacert file when you call the searchTwitter function.
            Could you please post your R in- and output for me?

            Regards

          • Ray

            Thanks, Julianhi. I went back and followed your twitter tutorial and had a lot more success. It’s weird that the API breaks when it gets characters it doesn’t like. I’m going to try datumbox next. Thanks for your help, and really cool work!

  • Rasscalion

    Hi. I have reached the part where I already got my tweet results and ran the tweet cleaning etc. However, when I reached the getSentiment() part, I got the same SSL error as above.

    tweets <- searchTwitter("iphone6", n=5, lang="en",cainfo="cacert.pem")
    which didn't work because I don't have the consumer key yet. But since I already have the tweets, do I actually need to run this in order to run getSentiment()?

  • Rasscalion

    I have reached the part where I already got my tweet results and ran clean_tweet() etc., but when I reached getSentiment() I got the same SSL error as above.

    Error in function (type, msg, asError = TRUE) :
    SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

    From another comment, I tried the option to turn off the CA cert by running the following, but got an error. Is it even needed, since I already got the Twitter results?
    > twitCred <- OAuthFactory$new(consumerKey=consumerKey,consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)

    which didn't work because I don't have the consumer key yet.

    Results: Error in initRefFields(.self, .refClassDef, as.environment(.self), list(…)) :
    object 'consumerKey' not found

    But since I already have the tweets, do I actually need to run this in order to run getSentiment()?

    Please advise.

    • Hey,
      could you please post the whole code up to the point where the error appears?
      Which way did you use to get the tweets in the end?
      And what does your tweet data frame contain?

      Regards

      • Sumit Parkar

        Hi Julian,
        I am trying your blog code for sentiment analysis, but it shows an error. I am using R 3.2.1.

        > for (i in 1:mcnum)
        + {
        + tmp = getSentiment(tweet_clean[i], "HUkLo8goiSrJgErEWr0G0kDlW")
        + tweet_df$sentiment[i] = tmp$mood
        + tweet_df$score[i] = tmp$score
        + }

        Error in fromJSON(data, asText = TRUE) : unused argument (asText = TRUE)

        • Julian Hillebrand

          Hey
          please leave out the “asText” argument in the line:
          js <- fromJSON(data, asText=TRUE);

          So change it to:
          js <- fromJSON(data);

          Please tell me if that solved your problem.

          Best regards

  • Rasscalion

    Here is the whole code I ran (everything up to the point of the error):

    library(twitteR)
    library(ROAuth)
    api_key <- "Dmv3CFTopnh0b71ODTyexxGYS"

    api_secret <- "U7eXweeHuk7xVRoIDUOyERGndjJ24fMI7JlbCUgqroKE7SXZP2"

    access_token <- "100922511-IqCvQ6ucFyDS83QQHscjOBh0NUhSMtIT4gVDE4Ce"

    access_token_secret <- "RSxjokYbLv9ZHK25j4glAJvajiAKjWz0JY1qkYYqQk6OP"

    setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret)
    library(RCurl)
    library(RJSONIO)
    library(stringr)
    getSentiment <- function (text, key){
    text <- URLencode(text);

    # Save all the spaces, then get rid of the weird characters that break the API,
    # then convert back the URL-encoded spaces.

    text <- str_replace_all(text, "%20", " ");
    text <- str_replace_all(text, "%\\d\\d", "");
    text <- str_replace_all(text, " ", "%20");

    if (str_length(text) > 360){
    text <- substr(text, 0, 359);
    }

    data <- getURL(paste("https://www.viralheat.com/api/sentiment/review.json?api_key=", key, "&text=",text, sep=""))

    js <- fromJSON(data, asText=TRUE);

    # get mood probability
    score = js$prob

    # positive, negative or neutral?
    if (js$mood != "positive")
    {
    if (js$mood == "negative") {
    score = -1 * score
    } else {
    # neutral
    score = 0
    }
    }

    return(list(mood=js$mood, score=score))
    }

    clean.text <- function(some_txt)
    {
    some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
    some_txt = gsub("@\\w+", "", some_txt)
    some_txt = gsub("[[:punct:]]", "", some_txt)
    some_txt = gsub("[[:digit:]]", "", some_txt)
    some_txt = gsub("http\\w+", "", some_txt)
    some_txt = gsub("[ \t]{2,}", "", some_txt)
    some_txt = gsub("^\\s+|\\s+$", "", some_txt)

    # define "tolower error handling" function
    try.tolower = function(x)
    {
    y = NA
    try_error = tryCatch(tolower(x), error=function(e) e)
    if (!inherits(try_error, "error"))
    y = tolower(x)
    return(y)
    }

    some_txt = sapply(some_txt, try.tolower)
    some_txt = some_txt[some_txt != ""]
    names(some_txt) = NULL
    return(some_txt)
    }

    # harvest tweets
    tweets <- searchTwitter("iphone6", n=5, lang="en") #,cainfo="cacert.pem")
    tweet_txt = sapply(tweets, function(x) x$getText())
    tweet_clean = clean.text(tweet_txt)
    mcnum = length(tweet_clean)
    tweet_df = data.frame(text=tweet_clean, sentiment=rep("", mcnum), score=1:mcnum, stringsAsFactors=FALSE)
    tweet_df
    sentiment <- rep(0, mcnum)
    tweet_sample <- tweet_clean[2]
    tweet_sample
    tmp
    > tweet_df
    text
    1 i love my new iphone
    2 how does the iphone camera compare to previous iphone cameras —via …
    3 newsmalaysians can get only the iphoneiphone malaysiansgrey market importers haveipecenter
    4 got my iphone today and ive already seenof the apple associates out and about a cpl hours later vegas is too small lol
    5 unique iphonecaseshot pink black diva iphone tough iphonecase iphone iphonep

    > tweet_sample
    [1] “how does the iphone camera compare to previous iphone cameras —via …”

    And voilà, the error:
    > tmp <- getSentiment(tweet_sample, "OV5gEwTCmGOYvQCa7Wia")
    Error in function (type, msg, asError = TRUE) :
    SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

    Is this due to the consumer key error? I got the tweets without it, and I only hit the SSL error when I reach the getSentiment() part.