Gender Analysis of Facebook Post Likes

A lot of people have shown huge interest in analyzing Facebook data with R, so I decided to write some more tutorials about what you can do with the Rfacebook package created by Pablo Barberá.
This tutorial is about plotting the gender distribution of the likes on a Facebook page's posts. The Rfacebook package does not include a direct function for this, but it is possible by combining a few different functions.
If you just want to try the function, take a look at the first BETA of my Facebook Page Analyzer tool, which includes the method described in this tutorial:


As always, we first need to go through the authentication process. You can find the steps in the first part of this tutorial:


First we have to load the Rfacebook package.
If you followed the steps in the tutorial mentioned above, you now have your authentication token.
Then we have to define the number of posts of the page we want to analyze. These are always the most recent ones. Posts can have a lot of likes, and because we have to run several requests for every single like, the analysis can take a long time if you choose too many posts.
The last variable we have to define is, of course, the name of the page we want to analyze.
In the next step we download the page's most recent posts.
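Put together, the setup and download could look roughly like the following sketch. The token string, the number of posts and the page name are placeholders, and get_post_ids is a helper name made up for this example. It assumes the download is done with getPage() from Rfacebook, whose result columns match the attributes listed below. The fetch_page argument only exists so the helper can be tried without a token.

```r
# Settings (placeholders, replace with your own values)
token        <- "YOUR_ACCESS_TOKEN"  # token from the authentication step
number_posts <- 5                    # most recent posts to analyze; keep it small
page_name    <- "somepage"           # hypothetical page name

# Download the most recent posts of a page and return their post ids.
# Assumption: fetch_page defaults to Rfacebook::getPage(page, token, n = ...).
get_post_ids <- function(page_name, token, number_posts,
                         fetch_page = Rfacebook::getPage) {
  posts <- fetch_page(page_name, token, n = number_posts)
  posts$id
}

# Example call (needs the Rfacebook package and a valid token):
# post_ids <- get_post_ids(page_name, token, number_posts)
```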
This returns a data frame with the requested number of posts, if that many are available. Each post has the following attributes:
from_id, from_name, message, created_time, type, link, id, likes_count, comments_count, shares_count
For our analysis we only need the id column, which contains a unique identifier for every post, also called the post id.

Get Post Like details

The next steps basically involve two processes. First we create a new entry in our final data frame for the post we are currently analyzing. Then we use its id to get more details about this post with the getPost() function.
The returned list basically contains three elements: post, likes, comments.
Each of these contains further data, but we only need the data stored in the "likes" element. There we find the fields from_name and from_id for every single like of the post.
So we extract the user_id, which is the field from_id, and get the user details with the getUsers() function. From the returned user data we extract the gender and save it to a temporary gender_frame.
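The per-post step could be sketched like this. get_like_genders is a hypothetical helper name. The sketch assumes that getPost() returns a list whose likes element has a from_id column and that getUsers() returns a gender column, as described above. Whether gender is actually returned depends on what the Graph API exposes to your token, so treat this as an illustration only. The fetch_post and fetch_users arguments exist only so the function can be tried with fake data.

```r
# Return the gender of every user who liked one post.
# Assumptions: fetch_post behaves like Rfacebook::getPost and
# fetch_users behaves like Rfacebook::getUsers.
get_like_genders <- function(post_id, token, n_likes = 1000,
                             fetch_post = Rfacebook::getPost,
                             fetch_users = Rfacebook::getUsers) {
  # Download the post with its likes; comments are not needed here
  post <- fetch_post(post_id, token, n = n_likes,
                     comments = FALSE, likes = TRUE)

  # from_id holds the user id of every like
  user_ids <- post$likes$from_id
  if (length(user_ids) == 0) {
    return(character(0))  # post without likes
  }

  # Look up the users and keep only their gender
  users <- fetch_users(user_ids, token)
  users$gender
}

# Example call (needs the Rfacebook package, a valid token and a post id):
# gender_frame <- get_like_genders("POST_ID", token)
```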
After we have processed all likes of the post and stored the gender of every single like in the gender_frame, we divide them into 3 categories: male, female and etc. That is, we count how many people said they are "male", "female" or something different.
We then save the results in our data_frame_gender and process the next posts in the same way.
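The counting step can be tried without any Facebook access. The sketch below uses made-up gender vectors for two posts. count_genders and genders_by_post are names invented for this example, and a missing gender is counted as "etc". Note that data_frame_gender is created as an empty data frame before the loop, so the first rbind() has something to append to.

```r
# Count how many likes fall into each of the 3 categories
count_genders <- function(genders) {
  male   <- sum(genders == "male", na.rm = TRUE)
  female <- sum(genders == "female", na.rm = TRUE)
  data.frame(male = male, female = female,
             etc = length(genders) - male - female)  # everything else, incl. NA
}

# Hypothetical gender vectors for two posts
genders_by_post <- list(
  post_a = c("male", "female", "female", NA),
  post_b = c("female", "male", "male", "other")
)

# Start with an empty data frame and add one row per post
data_frame_gender <- data.frame()
for (post_id in names(genders_by_post)) {
  row <- data.frame(post_id = post_id,
                    count_genders(genders_by_post[[post_id]]),
                    stringsAsFactors = FALSE)
  data_frame_gender <- rbind(data_frame_gender, row)
}

print(data_frame_gender)
```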

Plot the data

The plotting can be done really fast.
We define the slices of our pie chart and add the names to them.
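A minimal sketch of the plot with base R's pie() could look like this. The numbers in data_frame_gender are made up for illustration, and the check before plotting is an addition that skips the chart when no likes were counted.

```r
# Hypothetical result: one row per post, one column per category
data_frame_gender <- data.frame(male   = c(40, 80),
                                female = c(60, 90),
                                etc    = c(10, 20))

# One slice per category, summed over all posts
slices <- colSums(data_frame_gender[, c("male", "female", "etc")])

if (sum(slices) > 0) {
  # Add the share of each category to its label
  pct    <- round(100 * slices / sum(slices))
  labels <- paste0(names(slices), " (", pct, "%)")
  pie(slices, labels = labels, main = "Gender distribution of post likes")
} else {
  message("No likes with gender information to plot.")
}
```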
You can find the whole code on my GitHub account:

Julian Hillebrand


  • Hi Julien,

    First, congratulations on an easy and useful recipe. I have no time to debug today, but there’s a bug in your code.
    In my RStudio I get an error that there’s no “data_frame_gender”, so I added a simple solution:
    data_frame_gender <- NULL
    But there's another error when creating a data frame row with the value from "post$post$message" (first for loop).
    Maybe there's a simple mistake with the "posts" & "post" variables?

    • Hey,
      thanks for your comment.
      I just mixed up some lines from the Shiny version of the code.
      I added the correct definition of data_frame_gender and changed the variable names in the plot section.
      Now it should work.


  • Srini

    Hi Julian, great work. Very helpful, and I greatly appreciate it. When I ran the code in RStudio, the program returned 0 observations for data_frame_gender. I think we should add a check so that the graph is only drawn if the male/female value is greater than 0. Because my current value is zero, the graph portion hangs.

    Still trying to figure out why data_frame_gender is zero. Did you make the corrections on GitHub or over here?


    • Hey
      I totally agree with the check, but I also don't know why it returns 0 values.
      If you can find a solution, please send a push on GitHub.