Sentiment Analysis on Twitter with Datumbox API


Hey there!


After my post about sentiment analysis using the Viralheat API, I found another service. Datumbox offers sentiment analysis built specifically for Twitter. But this API doesn't just offer sentiment analysis; it offers a much more detailed analysis. "The currently supported API functions are: Sentiment Analysis, Twitter Sentiment Analysis, Subjectivity Analysis, Topic Classification, Spam Detection, Adult Content Detection, Readability Assessment, Language Detection, Commercial Detection, Educational Detection, Gender Detection, Keyword Extraction, Text Extraction and Document Similarity."

But note:
Datumbox only offers sentiment analysis that is built for tweets. All the other classifiers, like gender or topic, are built for longer texts and not for short tweets, which have too few characters. So their results for tweets can be inaccurate.

But these are very interesting features and so I wanted to test them with R.

But before we start you should take a look at the authentication tutorial and go through the steps.

The API Key

In the first step you need an API key. So go to the Datumbox website and register. After you have logged in, you can see your free API key in your account.


Ok, let's go on with R.


The getSentiment() function

First, load the packages needed for our analysis.
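The package list did not survive on this page, so here is a minimal sketch. Only twitteR is implied by the searchTwitter() call used later in this post; RCurl (for the HTTP requests) and jsonlite (for parsing the JSON) are assumptions, and the original code may have used other packages.

```r
# Assumed package set: only twitteR is implied by this post,
# RCurl and jsonlite are stand-ins for HTTP and JSON handling.
pkgs <- c("twitteR", "RCurl", "jsonlite")

# Find out which of them are not installed yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Please install first: ", paste(not_installed, collapse = ", "))
}

# Load the packages that are available.
for (p in setdiff(pkgs, not_installed)) {
  library(p, character.only = TRUE)
}
```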

The getSentiment() function handles the queries we send to the API. It collects all the results we want, like sentiment, subject, topic and gender, and returns them as a list. Every request has the same structure: the API always expects the API key and the text to be analyzed. It then returns a JSON object with an output element that contains the result.

So what we want to have is the “result”. We extract it with js$output$result where js is the saved JSON response.
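The code of this function is missing from this page, so the following is only a sketch of how it could look. The base URL, the endpoint names, the api_key and text parameter names, the sample response, the use of RCurl and jsonlite, and the helper names extract_result() and call_datumbox() are all assumptions; please check them against the Datumbox API documentation.

```r
# Assumed base URL; verify it in the Datumbox API documentation.
datumbox_base <- "http://api.datumbox.com/1.0/"

# Illustrative response, following the js$output$result structure described above.
sample_response <- '{"output": {"status": 1, "result": "positive"}}'

# Pull the "result" out of a raw JSON response string.
# Returns NA if the response carries no result (for example an error reply).
extract_result <- function(json_text) {
  js <- jsonlite::fromJSON(json_text)
  result <- js$output$result
  if (is.null(result)) NA_character_ else as.character(result)
}

# Send one text to one classifier endpoint (hypothetical helper).
call_datumbox <- function(endpoint, text, key) {
  url <- paste0(datumbox_base, endpoint, ".json",
                "?api_key=", key,
                "&text=", utils::URLencode(text, reserved = TRUE))
  extract_result(RCurl::getURL(url))
}

# Query the four classifiers and return the results as a list.
# The endpoint names are assumptions derived from the API function names.
getSentiment <- function(text, key) {
  list(
    sentiment = call_datumbox("TwitterSentimentAnalysis", text, key),
    subject   = call_datumbox("SubjectivityAnalysis", text, key),
    topic     = call_datumbox("TopicClassification", text, key),
    gender    = call_datumbox("GenderDetection", text, key)
  )
}

# Parsing works offline on the sample response:
extract_result(sample_response)
```

Returning NA instead of NULL for a missing result is a design choice of this sketch: it keeps later assignments into a data frame from failing when the API sends back an error.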

The clean.text() function

We need this function for two reasons: certain characters in tweets cause problems when we send the text to the API, and we want to remove elements like "@" and "RT".
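The original function body is not on this page. Here is a minimal sketch in base R, assuming we want to drop the retweet marker, @mentions, links, punctuation and digits and convert everything to lower case. The exact set of rules is my assumption, not necessarily what the original code did.

```r
# Clean tweet text before sending it to the API (sketch, rules are assumptions).
clean.text <- function(txt) {
  txt <- gsub("\\bRT\\b", " ", txt, perl = TRUE)        # retweet marker
  txt <- gsub("@\\w+", " ", txt, perl = TRUE)           # @mentions
  txt <- gsub("https?://\\S+", " ", txt, perl = TRUE)   # links
  txt <- gsub("&amp;", " ", txt, fixed = TRUE)          # HTML-escaped ampersands
  txt <- gsub("[^A-Za-z ]", " ", txt, perl = TRUE)      # punctuation, digits, other symbols
  txt <- gsub("\\s+", " ", txt, perl = TRUE)            # collapse repeated whitespace
  tolower(trimws(txt))
}

clean.text("RT @julianhi: I love my iPhone5! http://t.co/abc")
# returns "i love my iphone"
```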

Let's start

Ok now we have our functions, all packages and the API key.

In the first step we need the tweets. We get them with the searchTwitter() function as usual.

In my example I used the keyword "iphone5". Of course you can use whatever you want.

In the next steps we have to extract the text from the tweets and remove the unwanted characters with the clean.text() function.
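These steps could look like the following sketch. It assumes an authenticated twitteR session (see the authentication tutorial); the helper names get_text() and fetch_tweet_text() and the values n = 20 and lang = "en" are my own choices for illustration.

```r
# Extract the raw text from a list of twitteR status objects.
get_text <- function(tweets) {
  vapply(tweets, function(t) t$getText(), character(1))
}

# Search Twitter for a keyword and return the tweet texts.
# Needs the twitteR package and an authenticated session.
fetch_tweet_text <- function(keyword, n = 20) {
  tweets <- twitteR::searchTwitter(keyword, n = n, lang = "en")
  get_text(tweets)
}

# Usage, once authentication is set up and clean.text() is defined:
# tweet_txt   <- fetch_tweet_text("iphone5")
# tweet_clean <- clean.text(tweet_txt)
```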

Then we need to count our tweets. Based on this number we build a data frame that we will fill with the results of our analysis.
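A minimal sketch of this step, with one column per value we want to store. The two example texts are placeholders standing in for the cleaned tweets, and the variable name tweet_num is an assumption.

```r
# Placeholder texts; in the real run this is the output of the cleaning step.
tweet_clean <- c("example tweet one", "example tweet two")

# Count the tweets.
tweet_num <- length(tweet_clean)

# One row per tweet, result columns start out as NA.
tweet_df <- data.frame(
  text      = tweet_clean,
  sentiment = rep(NA_character_, tweet_num),
  subject   = rep(NA_character_, tweet_num),
  topic     = rep(NA_character_, tweet_num),
  gender    = rep(NA_character_, tweet_num),
  stringsAsFactors = FALSE
)
```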

Do the analysis

We come to our final step: the analysis. We call getSentiment() with the text of every tweet and wait for the answer before we save it. This can take some time, because every tweet is sent to the API. Just replace API-KEY with your Datumbox API key.

That's it! We saved all our results in the data frame and can take a look at our analysis.
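The loop could be sketched like this. The function name analyse_tweets() is hypothetical, and the classifier is passed in as an argument so that getSentiment() (or a stand-in for testing) can be plugged in. Empty answers are skipped, so the cell simply stays NA.

```r
# Fill the result columns of tweet_df by classifying every tweet (sketch).
# classify is a function(text, key) returning a list with the fields below.
analyse_tweets <- function(tweet_df, key, classify) {
  fields <- c("sentiment", "subject", "topic", "gender")
  for (i in seq_len(nrow(tweet_df))) {
    tmp <- classify(tweet_df$text[i], key)
    for (field in fields) {
      value <- tmp[[field]]
      # Only store a value if the API actually returned one.
      if (length(value) == 1) {
        tweet_df[[field]][i] <- value
      }
    }
  }
  tweet_df
}

# Usage with the real classifier:
# tweet_df <- analyse_tweets(tweet_df, "API-KEY", getSentiment)
```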

text | sentiment | subject | topic | gender
shit your phone man wtf all ur memories and its a freaking iphone is it in the schl or with ur teacher | negative | subjective | Arts | male
fuck iphone i want the s then o | negative | subjective | Home & Domestic Life | female
stay home saturday night vscocam iphone picarts bored saturday stay postive reoverlay | negative | objective | Sports | female
why i love the mornings sunrise pic iphone now lets get crossfit wod goingcompass fitness | positive | subjective | Home & Domestic Life | female
iphone or stick with my bbhelp | positive | subjective | Home & Domestic Life | female

You can display your data frame in R or save it to a CSV file.
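Both options take one line each. In this sketch the small data frame is only a placeholder for the filled tweet_df, and the file name is an arbitrary choice.

```r
# Placeholder standing in for the filled tweet_df from the analysis.
tweet_df <- data.frame(
  text      = c("example tweet one", "example tweet two"),
  sentiment = c("positive", "negative"),
  subject   = c("subjective", "objective"),
  topic     = c("Arts", "Sports"),
  gender    = c("female", "male"),
  stringsAsFactors = FALSE
)

# Display the first rows in the console.
print(head(tweet_df))

# Save the data frame as a CSV file (written to the temp directory here).
out_file <- file.path(tempdir(), "tweet_sentiment.csv")
write.csv(tweet_df, file = out_file, row.names = FALSE)
```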



Profile photo of JulianHi


I'm an International Business student from Germany, interested in Data Analytics and Machine Learning with a focus on Marketing Applications. My favorite language is R.


8 Responses

  1. Krishna says:

    When I am doing the analysis part (the last part), I am getting the following error. Can you help me out?
    Error in tweet_df$sentiment[i] = tmp$sentiment :
    replacement has length zero

  2. Krishna says:

    Hi.. I tried it many times.. But still not working..
    The problem seems to be when doing the analysis.
    tmp = getSentiment(tweet_clean[i],…..
    it tries to call the getSentiment function but here the tmp has NULL value inside that. Something is not correct with getSentiment function itself.

    • julianhi says:

      There seems to be a massive problem with WordPress and its code highlighting function. Do you see all the &amp things in the code? They shouldn’t be there. I have to take a look at it

  3. Nishant says:

    @Krishna, the ‘replacement at length zero’ error comes when the getSentiment function returns NULL values, which is primarily due to exceeding the free-limit on the API pulls(1000 a day). A work-around is to reconnect to the internet with a fresh ip(restarting the router worked for me) and using a different API-key.

    A way to check the exact source of error is to run the following url in the browser:“your api-key here”&text=”your text here”
    and check the message displayed. If a daily limit exceeded error comes, the R query will return NULL values. If a desired result comes, you can proceed with the R query.

  4. blackorwa says:

    Thank you for the code, I had to redo the Twitter authentication to make it work.

