Build your own Twitter Archive and Analysis Infrastructure with MongoDB, Java and R [Part 2] [Update]


Hello everybody,

In my first tutorial I described how you can set up your own MongoDB database and use a Java program to mine Twitter, either with the search function in a loop or via the Streaming API. So far, however, the tweets are only stored in a database, and we haven't gained any insight from them yet.

In this tutorial we will look at how to connect to MongoDB from R and analyze our tweets.

[Image: Twitter infrastructure database]

Start MongoDB

To access MongoDB I use its REST interface. This is the easiest way to access the database from R if you have just started working with it. More advanced users can also use the rmongodb package and the code provided by the user Abhishek, which you can find in the comments below.

[Image: MongoDB daemon]

First we have to start the MongoDB daemon. It is called “mongod” and is located in the “bin” folder of your MongoDB installation. Navigate to this folder and type:

    mongod --rest

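This starts the server and enables access via the REST interface. (The REST interface was deprecated in MongoDB 3.2 and removed in MongoDB 3.6, so this approach only works with older versions.)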

R

Now let's look at the R code and connect to the database.

First we need the two packages RCurl and rjson, so type:

    install.packages(c("RCurl", "rjson"))
    library(RCurl)
    library(rjson)

By default the MongoDB REST interface runs on port 28017 (the database server itself listens on port 27017), so make sure that no firewall or other program is blocking it.
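To check from R whether anything is listening on that port, a small base-R sketch can try to open a plain TCP connection. The helper name `port_open` is hypothetical. A successful connection only shows that some program accepts connections on the port, not that this program is MongoDB's REST interface.

```r
# Return TRUE if some program accepts TCP connections on host:port.
# port_open() is a hypothetical helper; 28017 is the default REST port.
port_open <- function(host = "localhost", port = 28017, timeout = 2) {
  con <- tryCatch(
    # Muffle the "cannot be opened" warning; a refused or timed-out
    # connection then surfaces as an error, which is mapped to NULL
    suppressWarnings(socketConnection(host = host, port = port,
                                      open = "r+", blocking = TRUE,
                                      timeout = timeout)),
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

port_open("localhost", 28017)
```

If this returns FALSE while mongod is running, check that the daemon was started with the REST option and that nothing blocks the port.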

Next we define the path to the database. For the REST interface this is a URL of the form http://localhost:28017/database/collection/.
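Here is a minimal sketch of this step. It assumes that the REST interface serves each collection at `http://host:28017/database/collection/`, that it accepts `limit` and `skip` query parameters, and that it answers with a JSON object whose `rows` element holds the documents. The helper names `rest_url` and `fetch_collection` are hypothetical, and so are the database name `tweetDB` and collection name `tweets` in the usage line. Replace them with the names your Java program used. Names containing characters such as `#` are percent-encoded, on the assumption that the server decodes them.

```r
# Build the URL of a collection on MongoDB's simple REST interface and
# download its documents. rest_url() and fetch_collection() are hypothetical
# helpers; the port default assumes the REST interface listens on 28017.
rest_url <- function(database, collection, host = "localhost",
                     port = 28017, limit = NULL, skip = NULL) {
  # Percent-encode names so characters such as "#" survive in the URL
  url <- sprintf("http://%s:%d/%s/%s/", host, as.integer(port),
                 utils::URLencode(database, reserved = TRUE),
                 utils::URLencode(collection, reserved = TRUE))
  # Append optional paging parameters, e.g. ?limit=100&skip=200
  params <- c(limit = limit, skip = skip)
  if (length(params) > 0) {
    query <- sprintf("%s=%d", names(params), as.integer(params))
    url <- paste0(url, "?", paste(query, collapse = "&"))
  }
  url
}

# Download the collection and return the list of documents.
# Assumes the response is a JSON object with a "rows" array.
# RCurl and rjson are called via :: so they are only needed here.
fetch_collection <- function(url) {
  json <- RCurl::getURL(url)
  rjson::fromJSON(json)$rows
}

# Usage (needs a running mongod with the REST interface enabled;
# "tweetDB" and "tweets" are placeholder names):
# tweets <- fetch_collection(rest_url("tweetDB", "tweets", limit = 100))
```

Keeping URL construction separate from the download makes it easy to inspect the exact URL in a browser before fetching a large collection.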

 

 

And that's it: you have saved the tweets you received in R. You can now analyze them as I explained in my other tutorials about working with R and Twitter.

For example, you can extract the text of your tweets and store it in a data frame.
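Here is a minimal sketch of this step. It assumes `tweets` is the list of documents parsed by rjson's `fromJSON` (a list of lists). It also assumes that every stored tweet keeps its text in a top-level `text` field, as in the JSON returned by the Twitter API. If your Java program stored the tweets differently, pass the matching field name. The helper name `tweets_to_df` and the sample documents are hypothetical.

```r
# Convert documents parsed by rjson::fromJSON (a list of lists) into a
# data frame with one row per tweet. tweets_to_df() is a hypothetical helper;
# the default field "text" assumes the stored tweets keep that field.
tweets_to_df <- function(docs, field = "text") {
  texts <- vapply(docs, function(doc) {
    value <- doc[[field]]
    # Documents without the field become NA instead of raising an error
    if (is.null(value)) NA_character_ else as.character(value)[1]
  }, character(1))
  data.frame(text = texts, stringsAsFactors = FALSE)
}

# Usage with three hand-written sample documents:
sample_docs <- list(
  list(text = "Learning R with #MongoDB"),
  list(text = "Testing the Streaming API"),
  list(user = "a document without a text field")
)
df <- tweets_to_df(sample_docs)
print(df)
```

With real data you would call `tweets_to_df(tweets)` on the downloaded documents. `vapply` guarantees a character result even for an empty collection.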

 

If you have any questions, feel free to ask, or follow me on Twitter to get the newest updates about analytics with R and the analysis of social data.

                                                                                                             

 

 


JulianHi

I'm an International Business student from Germany, interested in data analytics and machine learning with a focus on marketing applications. My favorite language is R.


11 Responses

  1. Abhishek says:

    Hi Julianhi,

    I also tried to get the data from MongoDB.

    Below is my code, and it's working fine:

    # install the package used to connect to MongoDB
    install.packages("rmongodb")
    library(rmongodb)
    # connect to MongoDB
    mongo = mongo.create(host = "localhost")
    mongo.is.connected(mongo)

    mongo.get.databases(mongo)

    mongo.get.database.collections(mongo, db = "tweetDB2") # "tweetDB2" is where the Twitter data is stored

    library(plyr)

    ## create the namespace (database.collection)
    DBNS = "tweetDB2.#analytic"

    ## create the cursor we will iterate over, basically a SELECT * in SQL
    cursor = mongo.find(mongo, DBNS)

    ## collect one single-row data frame per document
    rows = list()

    ## iterate over the cursor
    while (mongo.cursor.next(cursor)) {
      # grab the next record as an R list
      tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
      # flatten it into a one-row data frame
      rows[[length(rows) + 1]] = as.data.frame(t(unlist(tmp)), stringsAsFactors = FALSE)
    }
    mongo.cursor.destroy(cursor)

    ## bind all rows into the master data frame in one step
    df1 = rbind.fill(rows)

    dim(df1)

    Regards
    Abhishek

  2. Abhishek says:

    Hi Julianhi,

    Please go ahead and put the code in your tutorial. It will help many people like me.
    My Twitter ID is @kapoorabhishek

    Regards
    Abhishek

  3. Santhosh Nair says:

    I have had the same infrastructure running for some time now, with data from Twitter, Facebook, Klout, blogs and forums all integrated into a single MongoDB collection.

    Here is a small addition to Abhishek's code that filters the collection to English tweets only and selects specific fields.

    # define a query condition
    query = mongo.bson.buffer.create()
    mongo.bson.buffer.append(query, "interaction.interaction.type", "twitter")
    mongo.bson.buffer.append(query, "interaction.twitter.lang", "en")
    # when complete, make an object from the buffer
    query = mongo.bson.from.buffer(query)

    # define the fields to return (1L = include, 0L = exclude)
    fields = mongo.bson.buffer.create()
    mongo.bson.buffer.append(fields, "interaction.interaction.content", 1L)
    mongo.bson.buffer.append(fields, "_id", 0L)
    # when complete, make an object from the buffer
    fields = mongo.bson.from.buffer(fields)

    # create the cursor
    cursor = mongo.find(mongo, ns = DBNS, query = query, fields = fields)
    # ... the rest of the code is the same ...

    • julianhi says:

      Hey Santhosh Nair,
      Thanks for sharing your code. Do you have something like a web page for your project?
      I'd like to see how you implemented the connection to forums and blogs.

      Regards

      • Santhosh Nair says:

        Hi julianhi

        I am using a third-party aggregation service to integrate the data before writing it to MongoDB. Unfortunately, I cannot provide more details.

        Regards
        Santhosh

  4. Gianpiero says:

    Hi Julianhi,

    I'm a beginner. I'd like to know if you have code for collecting data with R and MongoDB without using JavaScript. I like your work, and it's very useful for building this program. I'd like to improve my knowledge :) Thanks!!

    • julianhi says:

      Hey Gianpiero,
      Thank you for your comment!
      Unfortunately, I don't have code for doing this task in R. Take a look at the rmongodb package and its documentation; with its help you could program it on your own.
      Regards

  1. December 10, 2013

