Visualize Retweets with R


Hi there!

Today I want to show you how to plot a graph of the retweets of a Twitter account.

To start, you have to go through the Twitter authentication process, as I described in an earlier blog post.

Let's get started with the things we are really interested in.

Get the data

To start mining the data, we need to load three packages.
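The package listing did not survive on this page, so here is a minimal sketch of the loading step. The choice of twitteR, stringr and igraph is an assumption based on the functions used in the rest of the post and in the comments; adjust it to your own setup.

```r
# Assumed package set: twitteR (Twitter API client), stringr (string
# helpers) and igraph (graphs). This is a guess, not the original listing.
pkgs <- c("twitteR", "stringr", "igraph")

# Check which of them are installed without stopping on a missing one.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "),
          ". Use install.packages() to add them.")
}

# Attach whatever is available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Checking with requireNamespace() first gives a readable message instead of an error when a package is missing.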

To get the posts of a certain user, the twitteR package provides a handy function, userTimeline().

This gets us around 1000 tweets from the user account @mashable. They are saved in the variable tweets, and in the next step we extract their text.
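As a rough sketch of these two steps, assuming the twitteR package is installed and the session is already authenticated: the wrapper name fetch_tweet_text is hypothetical and only bundles the download and the text extraction.

```r
# Sketch only: needs the twitteR package, an authenticated session and
# network access. The wrapper name fetch_tweet_text is hypothetical.
fetch_tweet_text <- function(user, n = 1000) {
  # userTimeline() returns a list of status objects for the account.
  tweets <- twitteR::userTimeline(user, n = n)
  # Each status object exposes its text through getText().
  vapply(tweets, function(x) x$getText(), character(1))
}

# Example call (not run here because it needs valid credentials):
# tweet_txt <- fetch_tweet_text("mashable", n = 1000)
```

Depending on your twitteR version, userTimeline() may leave out native retweets unless you pass includeRts = TRUE, so it is worth checking the help page if the result looks too small.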

But now we have to pick out the retweets from this large amount of text. A retweet shows up in the text as "RT @user" or "via @user".
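To illustrate how the detection works, here is a small sketch on made-up sample texts. The pattern is the one quoted in the comments below; the variable names is_rt and rt_regex are my own, and perl = TRUE is an assumption to get a well-defined regex engine.

```r
# Sample texts standing in for the downloaded tweets (made up for
# illustration).
tweet_txt <- c(
  "RT @alice: Ten tips for better charts",
  "Our new post on graphs is live",
  "Great read via @bob",
  "Start the day right"
)

# "RT" or "via", followed by one or more @handles. The backslashes are
# doubled because this is an R string literal.
rt_regex <- "(RT|via)((?:\\b\\W*@\\w+)+)"

# TRUE where the text looks like a retweet; perl = TRUE uses the PCRE engine.
is_rt <- grepl(rt_regex, tweet_txt, ignore.case = TRUE, perl = TRUE)

# Positions and texts of the retweets.
rt_patterns <- which(is_rt)
tweet_txt[rt_patterns]
```

Only the first and third text match: the last one contains the letters "rt" in "Start", but no @handle follows, so it is not counted.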


Now that we have our raw data, we can go on to analyze it.
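The analysis boils down to finding out who retweeted whom. The sketch below does this with base R on a made-up data frame instead of real twitteR status objects; the names who_post and who_retweet follow the snippet quoted in the comments, while retweeter_poster and the sample data are assumptions for illustration.

```r
# Made-up tweets: text plus the screen name of the account that sent them.
tweets <- data.frame(
  screen_name = c("mashable", "mashable", "mashable"),
  text = c("RT @alice: Ten tips for better charts",
           "Great read via @bob",
           "Our new post on graphs is live"),
  stringsAsFactors = FALSE
)

rt_regex <- "(RT|via)((?:\\b\\W*@\\w+)+)"

# For every tweet, pull out the "RT @user" / "via @user" fragments.
hits <- regmatches(
  tweets$text,
  gregexpr(rt_regex, tweets$text, ignore.case = TRUE, perl = TRUE)
)

# Strip the "RT @" / "via @" prefix so only the user name remains.
who_post <- lapply(hits, function(h) {
  gsub("(RT @|via @)", "", h, ignore.case = TRUE)
})

# Repeat the retweeting account once per extracted user name.
who_retweet <- rep(tweets$screen_name, lengths(who_post))

# Two-column character matrix: retweeter in column 1, original poster in 2.
retweeter_poster <- cbind(who_retweet = who_retweet,
                          who_post = unlist(who_post))
retweeter_poster
```

Tweets without a match simply contribute no row, so the third sample tweet drops out.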


Now we have created our so-called edge list: a list that shows the connections in our data, here who retweeted whom. It is a very common construct in R.
But our goal wasn't an edge list, but a graph. So we have to turn our edge list into a nice graph.
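A minimal sketch of that conversion, assuming the igraph package: graph_from_edgelist() takes a two-column matrix and creates one edge per row. The edge list here is made up, and the names retweeter_poster and rt_graph are my own choices.

```r
# Hypothetical edge list: the account in column 1 retweets column 2.
retweeter_poster <- cbind(
  who_retweet = c("mashable", "mashable", "mashable"),
  who_post    = c("alice", "bob", "alice")
)

# The vertices are simply the distinct names in the edge list.
vertex_names <- unique(as.vector(retweeter_poster))

if (requireNamespace("igraph", quietly = TRUE)) {
  # Each row of the matrix becomes one directed edge.
  rt_graph <- igraph::graph_from_edgelist(retweeter_poster, directed = TRUE)
  print(igraph::vcount(rt_graph))  # number of accounts
  print(igraph::ecount(rt_graph))  # number of retweet links
}
```

Repeated rows are kept as parallel edges, so an account that is retweeted often has several links pointing at it.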

Now there are just a few steps left to see our retweet graph.
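These remaining steps could look roughly like the sketch below: compute a layout, set up the plotting device and draw the graph. The Fruchterman-Reingold layout, the colors and the sample edge list are my assumptions, not necessarily the settings behind the picture in this post.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  # Hypothetical edge list: mashable retweets alice and bob.
  el <- cbind(c("mashable", "mashable"), c("alice", "bob"))
  rt_graph <- igraph::graph_from_edgelist(el, directed = TRUE)

  # Fix the seed so the force-directed layout is reproducible.
  set.seed(42)
  glay <- igraph::layout_with_fr(rt_graph)

  # Draw into a temporary PDF; drop pdf()/dev.off() to plot on screen.
  out_file <- file.path(tempdir(), "retweet_graph.pdf")
  pdf(out_file, width = 7, height = 7)
  par(bg = "gray15", mar = c(1, 1, 3, 1))
  plot(rt_graph, layout = glay,
       vertex.color = "gray25", vertex.frame.color = "gray25",
       vertex.size = 10,
       vertex.label = igraph::V(rt_graph)$name,
       vertex.label.color = "white", vertex.label.cex = 0.8,
       edge.color = "steelblue", edge.arrow.size = 0.5)
  title("Retweets of a sample account", col.main = "white")
  invisible(dev.off())
}
```

With a real timeline the graph gets crowded quickly, so smaller labels and thinner edges usually help.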

And here it is:

Our nice retweets graph!

[Figure: retweet graph for @mashable]


JulianHi

I'm an International Business student from Germany, interested in Data Analytics and Machine Learning with a focus on Marketing Applications. My favorite language is R.


19 Responses

  1. tux skywalker says:

    Hi. Great article. I tried to repeat all your instructions but in the loop of analysing data, the R shows:

    Error in check_string(string) : attempt to apply non-function

    You can please give me some help?

  2. Abhishek says:

    Hi julianhi…

    I encounter the problem while using above function “userTimeline”

    tweets = userTimeline("tatadocomo", n=100)

    “SSL certificate problem, verify that the CA cert is OK. Details:nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed”
    Error in twInterfaceObj$doAPICall(cmd, params, method, …) :
    Error: SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

    Kindly help

    Regards
    Abhishek

  3. rixi (@qx) says:

    Hello. I am also hitting the SSL error, with the message as described above. I am using RStudio but have just replicated the issue in base R 3.0 (64bit).

    I suspect it is related to authentication, because including cainfo="cacert.pem" in the searchTwitter() command seems to make all the difference:
    > searchTwitter('cnn', cainfo="cacert.pem", n=100) # works fine
    > searchTwitter('cnn', n=100) # throws SSL error like userTimeline()

    Trouble is, I cannot see an equivalent parameter through which to pass this certificate to userTimeline() !

    Any advice much appreciated!

  4. rixi (@qx) says:

    Aha! I have happened upon a solution that seems to work, here: http://stackoverflow.com/questions/8122879/roauth-on-windows-using-r

    Specifically: these two lines, run before loading the twitter credentials:
    > library(RCurl)
    > options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

  5. Michael Hetrick says:

    ut <- userTimeline('test', 1500)
    tw.df <- twListToDF(ut)

    write.csv(tw.df, file="tweets.csv")

    for (i in 1:length(tw.df))
    {
    rt.df <- twListToDF(searchTwitter(tw.df[i,1], n=tw.df[i,12]))
    rt.df["originalTweet"] <- tw.df[i,1]
    user <- getUser(rt.df[i,11])
    user.df <- user$toDataFrame()

    if (i==1)
    {
    write.table(rt.df, file="retweetlog.csv", sep = ",", append=TRUE, col.names=TRUE, row.names=FALSE)
    write.table(user.df, file="userlog.csv", sep = ",", append=TRUE, col.names=TRUE, row.names=FALSE)
    }
    else
    {
    write.table(rt.df, file="retweetlog.csv", sep = ",", append=TRUE, col.names=FALSE, row.names=FALSE)
    write.table(user.df, file="userlog.csv", sep = ",", append=TRUE, col.names=FALSE, row.names=FALSE)
    }
    }

    While this code completes, I get warnings:

    Error in if (n <= 0) stop("n must be positive") :
    missing value where TRUE/FALSE needed
    In addition: Warning messages:
    1: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    19 tweets were requested but the API can only return 16
    2: In write.table(rt.df, file = "1_retweetlog-test2.csv", sep = ",", :
    appending column names to file
    3: In write.table(user.df, file = "1_userlog-test2.csv", sep = ",", :
    appending column names to file
    4: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    25 tweets were requested but the API can only return 24
    5: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    151 tweets were requested but the API can only return 135
    6: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    66 tweets were requested but the API can only return 62
    7: In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
    27 tweets were requested but the API can only return 23

    Are these associated with API limits or code errors? When running 1 tweet, seems fine, but trying max 1500 rarely returns any more than 150. When iterating through users, it is hard to understand how only 20 users are returned when I'm expecting 100s. Any help would be appreciated.

    • julianhi says:

      Hey Michael,
      yes, your problems seem to be caused by the rate limits of the Twitter API. Twitter has a very complicated system that defines these limits, and the details are not easy to find out. So the only thing you can do is test all your function calls and hope that they work.

      I hope I could help you.

      Regards

  6. Nikhil Tuli says:

    Hi Julian,

    I tried to use the above code with minor changes. By using below search string, I am getting only 2 rows of data
    ———————————————————
    # regular expressions to find retweets
    grep(“(RT|via)((?:\b\W*@\w+)+)”, dm_tweets,
    ignore.case=TRUE, value=TRUE)

    # which tweets are retweets
    rt_patterns = grep(“(RT|via)((?:\b\W*@\w+)+)”,
    dm_txt, ignore.case=TRUE)

    # show retweets (these are the ones we want to focus on)
    dm_txt[rt_patterns]
    ————————————————————
    However, when I downloaded the contents into .csv file, there are many rows with retweetCount>0 (assuming then these are the retweets for a particular user account).

    I am not able to discern how is it possible.Could you please help.

    Also, I am not able to understand how we are searching the retweets through the grep function. Can you please clarify how we are making use of the regular expression.

    Thanks

    • julianhi says:

      Hey Nikhil,
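      in my example I extracted the raw text of the tweets and searched for patterns indicating a retweet, that is "RT" or "via" followed by an @user. You could do so as well and save whether the text indicates a retweet or not in a boolean vector. With this you could filter your Twitter list. Note that retweetCount > 0 only says that other users retweeted that tweet, not that the tweet itself is a retweet.
      I hope I could help you.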

      Regards

  7. > library(twitteR)
    Loading required package: ROAuth
    Loading required package: RCurl
    Loading required package: bitops
    Loading required package: digest
    Loading required package: rjson
    > library (stringr)
    > library(plyr)
    Attaching package: ‘plyr’
    The following object is masked from ‘package:twitteR’: id
    > library(tm)
    > library(ggplot2)
    > library(igraph)
    > library(RColorBrewer)
    > library(wordcloud)
    > load(“twitter authentication.Rdata”)
    > registerTwitterOAuth(cred)
    [1] TRUE
    > utils:::menuInstallPkgs()
    — Please select a CRAN mirror for use in this session —
    Warning: package ‘RCurl’ is in use and will not be installed
    > library(RCurl)

    >public_tweets = publicTimeline()
    > publicTimeline
    Error: object ‘publicTimeline’ not found

    Hello Julian, I am a statistics student from Indonesia. I want to ask why publicTimeline() does not run properly and gives this "object not found" error.

    I followed an article like this one, https://sites.google.com/site/miningtwitter/basics/getting-data/by-twitter, where it ran. What do you suggest?
    Thanks

    Regards

  8. MónicaNarro says:

    Hi. Julian, I repeat all your instructions but R shows:

    Error in check_string(string) : attempt to apply non-function

    You can please give me some help?

  9. MónicaNarro says:

    Hey,
    Exactly…After this function

    # remove ':'
    poster = gsub(":", "", unlist(poster))
    # name of retweeted user
    who_post[[i]] = gsub("(RT @|via @)", "", poster, ignore.case=TRUE)
    # name of retweeting user
    who_retweet[[i]] = rep(twit$getScreenName(), length(poster))
    }

  10. MónicaNarro says:

    I found the error! Instead of using the function “userTimeline”, you have to write the function “searchTwitter”.
    Could you help me with my comment in “Map twitter followers R”, please?

  11. Areeba says:

    Hey I”m getting these errors:
    # regular expressions to find retweets
    > grep(“(RT|via)((?:\b\W*@\w+)+)”, tweets,
    Error: ‘\W’ is an unrecognized escape in character string starting “”(RT|via)((?:\b\W”
    > ignore.case=TRUE, value=TRUE)
    Error: unexpected ‘,’ in ” ignore.case=TRUE,”
    > # which tweets are retweets
    > rt_patterns = grep(“(RT|via)((?:\b\W*@\w+)+)”,
    Error: ‘\W’ is an unrecognized escape in character string starting “”(RT|via)((?:\b\W”
    > tweet_txt, ignore.case=TRUE)
    Error: unexpected ‘,’ in ” tweet_txt,”
    >

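    • JulianHi says:

      Hey,
      it seems the code was not pasted into the R console correctly: the ignore.case part has to follow directly after tweets, as part of the same call. The "unrecognized escape" error also indicates that the backslashes in the pattern must be doubled inside an R string, i.e. "(RT|via)((?:\\b\\W*@\\w+)+)".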

      Regards
