This tutorial is also available in the ThinkToStartR package, which lets you create your Twitter sentiment word cloud in R with:

 

Hey everybody,

A few days ago I created a word cloud from tweets about a recent German news topic. A lot of people asked me for the code I used to create it, so here it is.

In the end the plot will basically look like this:

[Figure: Twitter word cloud in R]

It uses tweets and the Datumbox Twitter Sentiment Analysis API.

Preparation:

First let's get a Datumbox API key, as I described in this tutorial:

In the first step you need an API key. So go to the Datumbox website http://www.datumbox.com/ and register yourself. After you have logged in you can see your free API key here: http://www.datumbox.com/apikeys/view/

As always when we work with Twitter, we have to go through the authentication process, as I described here.

We also need a few packages for this tutorial, all of which are available on CRAN:

The last preparation step is to define two helper functions. The first cleans the text we pass to it by removing unwanted characters, and the second sends the cleaned text to the Datumbox API.
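A minimal sketch of the two helpers, under stated assumptions. `clean_tweet` and `sentiment_request_url` are hypothetical names, and the regular expressions are only one plausible choice of what to treat as unwanted characters. The endpoint URL comes from the comments below this post; `URLencode` is added here as an assumption so that spaces in the tweet do not break the request.

```r
# Hypothetical cleaner: drop links, mentions, retweet markers,
# punctuation and digits, then normalise whitespace and case.
clean_tweet <- function(x) {
  x <- gsub("http\\S+", " ", x, perl = TRUE)   # links
  x <- gsub("@\\w+", " ", x, perl = TRUE)      # mentions
  x <- gsub("\\bRT\\b", " ", x, perl = TRUE)   # retweet marker
  x <- gsub("[[:punct:]]", " ", x)             # punctuation
  x <- gsub("[[:digit:]]", " ", x)             # digits
  x <- gsub("\\s+", " ", x, perl = TRUE)       # collapse whitespace
  tolower(trimws(x))
}

# Hypothetical helper: build the request URL for the Datumbox endpoint
# quoted in the comments. URL-encoding the text is an assumption.
sentiment_request_url <- function(text, key) {
  paste0("http://api.datumbox.com/1.0/TwitterSentimentAnalysis.json?api_key=",
         key, "&text=", URLencode(text, reserved = TRUE))
}

# With RCurl installed, the response could then be fetched with
# RCurl::getURL(sentiment_request_url(text, key)) and parsed as JSON.
cleaned <- clean_tweet("RT @user: I love my iPhone! http://t.co/abc 123")
request <- sentiment_request_url(cleaned, "your key")
```

Links are removed before punctuation on purpose: once the punctuation is gone, a URL can no longer be recognised as one.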

Let's start!

First we have to get some tweets:

Then we extract the text from these tweets and remove all the unwanted characters:

Now we create a data frame to hold all our data, such as the tweet text and the results of the sentiment analysis.
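As a rough sketch, such a data frame could be set up as follows. The names `tweet_clean`, `tweet_num` and `tweet_df` match the code quoted in the comments below; the sample texts are made up for illustration.

```r
# Made-up sample of already cleaned tweet texts (illustration only)
tweet_clean <- c("i love my iphone",
                 "my iphone broke again",
                 "iphone event today")
tweet_num <- length(tweet_clean)

# One row per tweet; the sentiment column is filled in later
tweet_df <- data.frame(text = tweet_clean,
                       sentiment = rep("", tweet_num),
                       stringsAsFactors = FALSE)
```

Setting `stringsAsFactors = FALSE` keeps both columns as plain character vectors, which makes the later row-by-row assignment straightforward.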

In the next step we apply the sentiment analysis function getSentiment() to every tweet text and save the result in our data frame. Then we delete all rows that did not receive a sentiment score. This sometimes happens when unwanted characters survive our cleaning procedure.
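The loop and the filtering step can be sketched like this. To keep the example runnable offline, `getSentiment()` is replaced by a keyword-based stand-in, which is purely an assumption and not how the Datumbox API classifies text; returning `NA` for a failed call is likewise an assumption.

```r
# Offline stand-in for the real API call (assumption, illustration only)
getSentiment <- function(text, key) {
  if (nchar(text) == 0) return(list(sentiment = NA_character_))
  if (grepl("love", text)) return(list(sentiment = "positive"))
  if (grepl("broke", text)) return(list(sentiment = "negative"))
  list(sentiment = "neutral")
}

db_key <- "your key"
tweet_clean <- c("i love my iphone", "my iphone broke again",
                 "", "iphone event today")
tweet_num <- length(tweet_clean)
tweet_df <- data.frame(text = tweet_clean,
                       sentiment = rep("", tweet_num),
                       stringsAsFactors = FALSE)

# Score every tweet and store the label in the data frame
for (i in seq_len(tweet_num)) {
  tmp <- getSentiment(tweet_clean[i], db_key)
  tweet_df$sentiment[i] <- tmp$sentiment
}

# Drop rows that did not receive a sentiment label
tweet_df <- tweet_df[!is.na(tweet_df$sentiment) & tweet_df$sentiment != "", ]
```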

Now that we have our data, we can start building the word cloud.

The Wordcloud

First we collect the distinct sentiment labels the API returned. With the Datumbox API these are positive, neutral and negative. We use them to divide the tweet texts into categories.

The next line of code looks a little complicated, but all you need to know is that it generates a label for each sentiment category that includes its percentage share.
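One way to write this step, as a sketch with made-up sentiment values. The names `sents` and `labels` are hypothetical, and the two-decimal format is an assumption based on the percentages quoted in the comments below.

```r
# Made-up sentiment column (illustration only)
sentiment <- c("positive", "positive", "neutral", "negative", "positive")

# Distinct categories, sorted alphabetically
sents <- levels(factor(sentiment))

# One label per category, e.g. "positive 60.00%"
labels <- sapply(sents, function(s) {
  share <- 100 * mean(sentiment == s)   # percentage of tweets in category s
  sprintf("%s %.2f%%", s, share)
})
```

Taking the mean of the logical vector `sentiment == s` gives the fraction of tweets in that category, which avoids a separate counting step.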

Then we create one so-called doc per category and add the tweet texts of that category to it:
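A minimal sketch of this step, assuming each doc is simply all tweet texts of one category pasted into a single string. The sample data is made up.

```r
# Made-up scored tweets (illustration only)
tweet_df <- data.frame(
  text = c("i love my iphone", "my iphone broke again",
           "iphone event today", "great iphone camera"),
  sentiment = c("positive", "negative", "neutral", "positive"),
  stringsAsFactors = FALSE
)

sents <- levels(factor(tweet_df$sentiment))

# One long string per sentiment category, named after the category
docs <- sapply(sents, function(s) {
  paste(tweet_df$text[tweet_df$sentiment == s], collapse = " ")
})
```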

The next steps are the same ones you would use for a “normal” word cloud: we create a term-document matrix and call the function comparison.cloud() from the “wordcloud” package.
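To show the shape of the matrix that comparison.cloud() expects (terms as rows, one column per category), here is a sketch that builds it in base R so it runs without extra packages. This is a stand-in for a proper term-document matrix, and `plot_sentiment_cloud` is a hypothetical wrapper; the docs are made up.

```r
# Made-up docs, one per sentiment category (illustration only)
docs <- c(negative = "iphone broke again broke",
          neutral  = "iphone event today",
          positive = "love iphone great camera love")

# Split each doc into words and collect the overall vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Count every term in every doc: rows are terms, columns are categories
tdm <- vapply(tokens,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms

# Hypothetical wrapper around the plotting call; needs the wordcloud package
plot_sentiment_cloud <- function(term_matrix) {
  wordcloud::comparison.cloud(term_matrix, max.words = 100,
                              random.order = FALSE, title.size = 1.5)
}
```

Calling `plot_sentiment_cloud(tdm)` would then draw the cloud. The column names of the matrix become the category titles in the plot, so this is where percentage labels can be assigned via `colnames(tdm)`.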

 

Of course you can find the whole code on GitHub.

And if you want to stay up to date on my work and on R, analytics and machine learning, feel free to follow me on Twitter.

  • datahappy

    Sorry, my first comment didn’t post. Thanks for posting this tutorial.

    If I was interested in modifying the code to only build the cloud out of tweets from a specific area, like a state or country, any ideas how I would do that?

    Thanks!

    • http://thinktostart.wordpress.com julianhi

      Hey datahappy,
      yes you can do so. You have to modify the searchTwitter function call like this:
      searchTwitter("iphone", since='2011-03-01', until='2011-03-02')

      Regards

    • http://thinktostart.wordpress.com julianhi

      Oh, sorry, that was the wrong code.
      The code has to look like this:
      searchTwitter("iphone", geocode='42.375,-71.1061111,10mi')

      For the geocode argument, the values are given in the format latitude,longitude,radius, where
      the radius can have either mi (miles) or km (kilometers) as a unit. For example geocode='37.781157,-122.39720,1mi'

      Regards

  • https://www.facebook.com/sarah.custer Sarah Custer-Lalanne

    Thank you so much for this great resource! I’m very new to R and TwitteR and I’m having some trouble with entering my API key – how and where exactly do I enter it, and what is the ‘text’ that needs to be entered afterwards? It would be great if someone could provide an example with a fake key. Thank you in advance!

    • yann

      Hi, same boat as Sarah. Would anyone be able to show an example of the specific line
      data <- getURL(paste("http://api.datumbox.com/1.0/TwitterSentimentAnalysis.json?api_key=", key, "&text=",text, sep=""))
      I can't seem to work out how to put in the API key correctly and whether I need to change something in the text?

      many thanks!

      • http://thinktostart.wordpress.com julianhi

        Hey Yann
        I just replied to Sarah’s question.
        Does that help you further?

        Regards

    • http://thinktostart.wordpress.com julianhi

      Hey Sarah,
      Sorry for the late answer.
      Actually you have to set your datumbox API key as db_key.
      So type in db_key <- "your key"
      The part at the top is just a function. It needs the API key and a text which should be analyzed. But this function is called in the for loop so you don't have to worry about that.
      Did this help you?
      Please feel free to ask further questions.
      Regards

      • yann

        Hi Julian

        appreciate your help – bear with me while i fix that

        so i have done
        db_key <- "xxxxxxxxAPIxxxxxxxxxx"
        then
        data <- getURL(paste("http://api.datumbox.com/1.0/TwitterSentimentAnalysis.json?api_key=", db_key, "&text=",text, sep=""))

        but still getting the following:

        tweets = searchTwitter("iPhone", 20, lang="en")
        Error in twInterfaceObj$doAPICall(cmd, params, "GET", …) :
        OAuth authentication is required with Twitter's API v1.1

        any idea? sorry for being such a pain

        • http://thinktostart.wordpress.com julianhi

          Don’t worry;)
          Before you can use the code you have to do the Twitter authentication. You can find that tutorial on my blog.
          After the authentication you can execute the code here in this tutorial.
          So the steps are:
          – Twitter authentication
          – define db_key
          – execute this code here in the tutorial

          Regards

  • yann

    oh dear, sorry for not reading your post carefully

    on the Twitter authentication: I am running the code but can't find the PIN number I am supposed to paste back into R, any idea why?

    cheers and thanks again for your help!

  • yann

    Almost there!!!!

    almost been through the whole script, now stuck here:

    # apply function getSentiment
    sentiment = rep(0, tweet_num)
    for (i in 1:tweet_num)
    {
    tmp = getSentiment(tweet_clean[i], db_key)
    tweet_df$sentiment[i] = tmp$sentiment
    print(paste(i, " of ", tweet_num))
    }

    when running it I got the following error: Error in fromJSON(data, asText = TRUE) : unused argument (asText = TRUE)

    thanks

    • Ankur

      I’m also stuck here!!! Although I get another error

      Error in simplify(obj, simplifyVector = simplifyVector, simplifyDataFrame = simplifyDataFrame, :
      unused argument (asText = TRUE)

      Not sure what this is!

      • http://thinktostart.com/members/julianhi/ Julian Hillebrand

        Hey,
        could you please post the code you used until the error appeared?

        regards

  • Honey

    This is really a great example with a nice explanation. While running the last part of the code, I got the following answer

    Error in strwidth(words[i], cex = size[i], …) : invalid ‘cex’ value

    Thanks

    • http://thinktostart.com/members/julianhi/ Julian Hillebrand

      Hey
      Where exactly does the error happen? At which line of code?

      Regards

  • Sander Ehmsen

    Thanks for this great post – making it possible to create nice looking WordClouds.
    I was wondering whether you could help me with a possibly fairly simple question:
    I have trouble getting the Twitter data into an ordinary SQL table.
    I want a column for each variable, as in text, id, … but when I try to do something like:

    TB15 <- searchTwitter("#tb15", n=1500)
    as.data.frame(TB15)

    I get an error saying that it is not possible to force this data into a dataframe. I have searched the web only to discover no real solutions. But maybe you can help me?!

    • http://thinktostart.com/members/julianhi/ Julian Hillebrand

      Hey

      the searchTwitter function returns a list object. You could use, for example, the approach described here:
      http://stackoverflow.com/questions/4227223/r-list-to-data-frame

      Hope that helped you.

      Regards

      • Sander Ehmsen

        Hey Julian

        I hate to do this – continuing to take up your time when you could be producing more interesting blog posts.
        But I have tried to solve the problem for several hours over several days.

        I really want to transform the lists obtained from

        TB15 <- searchTwitter("#tb15", n=1500)

        into a table containing one column for each variable: text, screenName, id …. (http://www.inside-r.org/packages/cran/twitteR/docs/status). But even though I have tried

        Tweets <- data.frame(matrix(unlist(TB15), nrow=12, byrow=T))

        and

        tabletest <- statusFactory(TB15)

        I have not found a way.

        I would very much appreciate it if you could spend a few minutes walking me through a solution. Maybe think of me as your best friend's mother while explaining it.

        I am very fascinated by the possibilities in the R environment and want to expand my use of it a lot. And I can see that you have great tutorials for many different packages here – and have even developed your own.

        Best regards,
        Sander Ehmsen, Denmark..

  • Sander Ehmsen

    For others who might have the same issue as me: I simply found that

    #get data
    TB<-searchTwitter("tinderbox", lan="da", n=10000)
    #transform to dataframe
    df <- do.call("rbind", lapply(TB, as.data.frame))

    does the trick.

  • Cecile

    Hi, I just ran all the code for

    AAPL.tweets = searchTwitter("$AAPL", n=5000, cainfo="cacert.pem")

    It worked, but the cloud returns:

    1.26% negative with words like: imeanhaaapl, ppch, newsdo, peakveryruebuall…
    13.28% tral with: says, documenclaims, launching
    85.46% positive with hâ€, baba, load, ebay, joinâ€

    I mean: first, I don’t know if the percentages are correct.
    Then, I have some “h—, “join— … the data were not cleaned well and I can’t remove these characters.
    Finally, it doesn’t seem like sentiment to me… like ebay? baba? launching? release? flser? msft? mcd?….

    Could you please help me? Is this normal? I thought I would get more sentiment words like “great”, “bad”, “perfect”, I don’t know…

    Thank you,

    Regards,
    Cecile

    • http://thinktostart.com/members/julianhi/ Julian Hillebrand

      Hey
      the code connects the sentiment of each tweet with the words it contains. So when words appear in a positive tweet, they are counted as positive.
      Which characters you have to delete before the visualization depends on your tweets. Maybe this can help you further: http://www.r-bloggers.com/automatic-cleaning-of-messy-text-data/

      Regards

  • Newbie

    Hi !
    Great work.
    Well, I am trying to create a word cloud using tweets. But all it shows in the word cloud is the words: object, class, status
    LINK to screenshot: https://drive.google.com/file/d/0B4ZhibK97rv0SE9XOXgwOERYNjQ/view?usp=sharing
    Help appreciated.
    Thanks

    • http://thinktostart.com/members/julianhi/ Julian Hillebrand

      Hey
      it seems like your wordcloud call is actually trying to visualize a function.
      Is it possible that you left out the part where you call the function and save the result?

      Regards