Some time ago I saw an interesting post in an R-related group on LinkedIn. It was from Michael Piccirilli, who wrote about his new package Rlinkedin. I was really impressed by his work, so I decided to write a blog post about it.
Get the package
The package is currently only available via GitHub, but thanks to devtools it is easy to install:
require(devtools)
install_github("mpiccirilli/Rlinkedin")
require(Rlinkedin)
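To verify that the installation worked, you can check the installed version with base R (the version number you see will of course depend on the current state of the repository):

packageVersion("Rlinkedin")  # prints the installed package version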
Authenticate with LinkedIn
In the next step we have to connect to the LinkedIn API via OAuth 2.0. The Rlinkedin package offers two possibilities.
You can simply call
in.auth <- inOAuth()
to use a default API key for getting LinkedIn data.
Or:
You can use your own application and its credentials to connect to the API.
For this you have to create an application on LinkedIn. Go to https://www.linkedin.com/secure/developer and log in with your LinkedIn account. Then click on “Add new Application”.
On the next page you can see the app settings. Set them as shown in the following screenshots:
Then click on “Add application” and you will be forwarded to your app's credentials. Switch back to R and set the following variables:
app_name <- "XXX"
consumer_key <- "XXX"
consumer_secret <- "XXX"
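Hard-coding credentials works for a quick test, but if you share your scripts you may prefer to load them from environment variables instead. A minimal sketch, assuming you have defined the variables below in your .Renviron (their names are my own choice, not part of the package):

# Read the credentials from environment variables instead of hard-coding them
# (the variable names are hypothetical -- define them in your .Renviron file)
app_name        <- Sys.getenv("LINKEDIN_APP_NAME")
consumer_key    <- Sys.getenv("LINKEDIN_CONSUMER_KEY")
consumer_secret <- Sys.getenv("LINKEDIN_CONSUMER_SECRET")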
Then you can authenticate with:
in.auth <- inOAuth(app_name, consumer_key, consumer_secret)
After a successful authentication you can start to retrieve some data.
Analyze LinkedIn with R
Michael created a nice overview of the different functions on the package's GitHub page, so here I will just show you a small sample analysis.
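If you want a quick overview directly from your R session, you can also list everything the package exports (plain base R; the exact output depends on the installed version):

ls("package:Rlinkedin")  # list all functions exported by the package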
First, let's download all your connections with:
my.connections <- getMyConnections(in.auth)
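To get a first feel for the result, you can inspect it with a few base R calls (apart from the "industry" column used below, I make no assumptions about which columns your data will contain):

dim(my.connections)            # number of connections and number of columns
names(my.connections)          # which columns are available
head(my.connections$industry)  # first few industry values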
The getMyConnections() call returns a data frame with all available information about your connections. For our analysis we will use the column “industry”, which contains the industry each person works in, to create a small word cloud. First we collapse the column into a single string:
text <- toString(my.connections$industry)
We will then transform this text with some standard word cloud code into a nice-looking industry overview:
# install.packages(c("tm", "wordcloud"), repos = "http://cran.r-project.org")
library(tm)        # Corpus, TermDocumentMatrix, stopwords
library(wordcloud) # wordcloud(); also attaches RColorBrewer for brewer.pal()

clean.text <- function(some_txt) {
  some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
  some_txt = gsub("@\\w+", "", some_txt)
  some_txt = gsub("[[:punct:]]", "", some_txt)
  some_txt = gsub("[[:digit:]]", "", some_txt)
  some_txt = gsub("http\\w+", "", some_txt)
  some_txt = gsub("[ \t]{2,}", "", some_txt)
  some_txt = gsub("^\\s+|\\s+$", "", some_txt)
  some_txt = gsub("amp", "", some_txt)
  # define "tolower error handling" function
  try.tolower = function(x) {
    y = NA
    try_error = tryCatch(tolower(x), error = function(e) e)
    if (!inherits(try_error, "error")) y = tolower(x)
    return(y)
  }
  some_txt = sapply(some_txt, try.tolower)
  some_txt = some_txt[some_txt != ""]
  names(some_txt) = NULL
  return(some_txt)
}

clean_text = clean.text(text)
corpus = Corpus(VectorSource(clean_text))
tdm = TermDocumentMatrix(corpus,
                         control = list(removePunctuation = TRUE,
                                        stopwords = stopwords("english"),
                                        removeNumbers = TRUE,
                                        tolower = TRUE))

m = as.matrix(tdm)                               # term-document matrix as a plain matrix
word_freqs = sort(rowSums(m), decreasing = TRUE) # word frequencies in decreasing order
dm = data.frame(word = names(word_freqs), freq = word_freqs) # build our data set
wordcloud(dm$word, dm$freq, random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))       # visualize the data
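Before (or instead of) plotting the cloud, you can also sanity-check the most frequent terms directly from the dm data frame built above:

head(dm, 10)  # the ten most frequent industry terms among your connections
# the same information as a simple bar plot
barplot(head(dm$freq, 10), names.arg = head(as.character(dm$word), 10), las = 2)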
This will create a word cloud like the following: