Hi there!
Today I want to show you how to plot a graph of the retweets of a Twitter account.
To start, you have to go through the Twitter authentication process, as I described in an earlier blog post.
Let's get started with the things we are really interested in.
Get the data
To start the data mining we need to load 3 packages.
    library(twitteR)
    library(igraph)
    library(stringr)
To get the posts of a certain user, the twitteR package provides the userTimeline function:
    tweets = userTimeline("mashable", n=1000)
This requests up to 1000 tweets from the user account @mashable (the call can return fewer than requested). They are saved in the variable tweets, and we can extract the text in the next step with:
    tweet_txt = sapply(tweets, function(x) x$getText())
But now we have to recognize the retweets in this large amount of text. We look for "RT" or "via" followed by one or more @-mentions:
    # regular expression to find retweets
    # (backslashes have to be doubled inside R strings)
    grep("(RT|via)((?:\\b\\W*@\\w+)+)", tweet_txt, ignore.case=TRUE, value=TRUE)

    # which tweets are retweets
    rt_patterns = grep("(RT|via)((?:\\b\\W*@\\w+)+)", tweet_txt, ignore.case=TRUE)

    # show retweets (these are the ones we want to focus on)
    tweet_txt[rt_patterns]
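If you want to try the pattern without calling the Twitter API, here is a small sketch on made-up tweets. The sample texts and the names sample_txt, is_retweet and retweet_idx are my own assumptions for illustration, and perl=TRUE is simply my choice to make sure the non-capturing group (?:...) is understood.

```r
# Hypothetical sample tweets (not real data from @mashable)
sample_txt <- c(
  "RT @alice: Great article on R graphics",
  "Just published a new post about igraph",
  "Interesting read via @bob",
  "Thanks @carol for the tip"
)

# "RT" or "via", followed by one or more @-mentions
rt_pattern <- "(RT|via)((?:\\b\\W*@\\w+)+)"

# TRUE for every text that looks like a retweet
is_retweet <- grepl(rt_pattern, sample_txt, ignore.case = TRUE, perl = TRUE)

# positions of the retweets, comparable to rt_patterns in the post
retweet_idx <- which(is_retweet)

print(sample_txt[retweet_idx])
```

Note that a plain mention like the last sample is not counted, because it has no "RT" or "via" in front of the @.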
Visualize Retweets with R
    # we create lists to store user names
    who_retweet = vector("list", length(rt_patterns))
    who_post = vector("list", length(rt_patterns))

    # loop over the retweets (seq_along also works if there are none)
    for (i in seq_along(rt_patterns))
    {
      # get tweet with retweet entity
      twit = tweets[[rt_patterns[i]]]
      # get retweet source
      poster = str_extract_all(twit$getText(), "(RT|via)((?:\\b\\W*@\\w+)+)")
      # remove ':'
      poster = gsub(":", "", unlist(poster))
      # name of retweeted user
      who_post[[i]] = gsub("(RT @|via @)", "", poster, ignore.case=TRUE)
      # name of retweeting user
      who_retweet[[i]] = rep(twit$getScreenName(), length(poster))
    }

    # flatten the lists into character vectors
    who_post = unlist(who_post)
    who_retweet = unlist(who_retweet)
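Now we have created the two halves of our so-called edge list: who_retweet and who_post. Paired element by element, they show the connections in our data, that is, who retweeted whom. This is a very common construct when working with graphs in R.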
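To make the idea of an edge list a bit more concrete, here is a minimal sketch with made-up data and base R only. The sample texts, the fixed retweeter name "mashable" and the helper names hits, edges and edge_counts are assumptions for illustration; regmatches() stands in for str_extract_all() so that the sketch needs no extra packages.

```r
# Hypothetical retweet texts, all posted by the same account
sample_txt <- c(
  "RT @alice: Great article",
  "Nice read via @bob",
  "RT @alice: Another one"
)

rt_pattern <- "(RT|via)((?:\\b\\W*@\\w+)+)"

# pull out the "RT @user" / "via @user" part of each text
hits <- regmatches(sample_txt, regexpr(rt_pattern, sample_txt, perl = TRUE))

# strip the prefix to get the retweeted user
who_post <- gsub("(RT @|via @)", "", hits)

# the retweeting user is the same for every row in this sketch
who_retweet <- rep("mashable", length(who_post))

# the edge list: one row per connection, retweeter -> original poster
edges <- cbind(who_retweet, who_post)
print(edges)

# how often does each connection occur?
edge_counts <- table(paste(edges[, 1], "->", edges[, 2]))
print(edge_counts)
```

Each row of edges is one arrow in the later graph, so repeated rows mean the same user was retweeted several times.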
But our goal wasn't an edge list, but a graph. So we have to turn our edge list into a nice graph.
    # two column matrix of edges
    retweeter_poster = cbind(who_retweet, who_post)

    # generate graph (igraph 1.0 and later also call this graph_from_edgelist)
    rt_graph = graph.edgelist(retweeter_poster)

    # get vertex names
    ver_labs = get.vertex.attribute(rt_graph, "name", index=V(rt_graph))
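Before plotting, it can help to check that the graph contains what you expect. The following is only a sketch on a made-up edge list, assuming igraph 1.0 or later (where the function is available as graph_from_edgelist); the user names and the variables g and in_deg are hypothetical.

```r
library(igraph)

# Hypothetical edge list: retweeter in column 1, original poster in column 2
who_retweet <- c("mashable", "mashable", "mashable")
who_post    <- c("alice", "bob", "alice")
edges <- cbind(who_retweet, who_post)

# directed graph: an arrow points from the retweeter to the retweeted user
g <- graph_from_edgelist(edges, directed = TRUE)

# basic sanity checks
print(vcount(g))  # number of distinct users
print(ecount(g))  # number of retweet connections (duplicates are kept)

# in-degree: how often each user was retweeted
in_deg <- degree(g, mode = "in")
print(in_deg)
```

If the counts look wrong here, the problem is usually in the edge list and not in the plotting code.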
Now there are just a few steps left to see our retweet graph.
    # choose some layout
    glay = layout.fruchterman.reingold(rt_graph)

    # plot
    par(bg="gray15", mar=c(1,1,1,1))
    plot(rt_graph, layout=glay,
         vertex.color="gray25",
         vertex.size=10,
         vertex.label=ver_labs,
         vertex.label.family="sans",
         vertex.shape="none",
         vertex.label.color=hsv(h=0, s=0, v=.95, alpha=0.5),
         vertex.label.cex=0.85,
         edge.arrow.size=0.8,
         edge.arrow.width=0.5,
         edge.width=3,
         edge.color=hsv(h=.95, s=1, v=.7, alpha=0.5))

    # add title (the leading "\n" pushes it down one line)
    title("\nTweets from the user account @mashable: Who retweets whom",
          cex.main=1, col.main="gray95")
And here it is:
Our nice retweets graph!