Analyzing the US elections with Facebook and R

Hillary Clinton and Donald Trump appear set to be the nominees for the US presidential election in November 2016. With five months to go, the election is already provoking harsh attacks between the two: the campaigns of Trump and Clinton are evolving into one of the toughest contests in the recent history of US presidential elections, fought primarily via their social media channels.

To gain a more detailed understanding of the nominees’ presence and fan base on social media, we analyzed the Facebook traffic of Clinton (approximately 3.6m Facebook fans) and Trump (approximately 8m Facebook fans) over more than 12 months. Facebook is the leading social media website in the US in 2016 (Experian, 2016) and may provide relevant and representative information about the nominees’ supporters and fans.

Analyzing the US elections with the CTSA index:

  1. The first figure shows the emotional atmosphere within the nominees’ communities. Specifically, the share of negatively expressed emotions (relative to positively expressed emotions) varies considerably between Clinton and Trump. Whereas Trump’s community tends to be more positively engaged (positive comments: 69.46%; negative comments: 30.46%), Clinton’s community is relatively more negatively engaged (positive comments: 65.94%; negative comments: 33.94%). However, the dispersion of emotions (controversy of comments), as shown in Figure 2, is more pronounced in Trump’s community (coefficient of variation, as percent: Trump: 122.23 / Clinton: 115.31).
Figure 1: Negative Emotions
Figure 2: Discrepancy
  1. The third figure shows how optimistically the followers express themselves. The data show that Clinton’s fan base is slightly more optimistic (31.13%) than Trump’s fan base (29.57%), although the trend shows a more positive development in Trump’s community. Again, the dispersion of optimistic comments is more pronounced in Trump’s community (coefficient of variation, as percent: Trump: 130.12 / Clinton: 126.11).
Figure 3: Optimism
  1. Finally, the fourth figure shows the communities’ degree of self-centered expression, that is, whether the communities write in a more ‘inclusive’ (e.g., ‘we’) or a more ‘exclusive’ (e.g., ‘I’) style. The style of expression shows significant differences between Clinton and Trump. Trump’s fans express themselves in a more ‘exclusive’ style (36.12%), whereas Clinton’s fans express themselves in a more ‘inclusive’ style (30.28%).
Figure 4: Self-centered Style of Expression
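
The coefficient of variation quoted in the list above is the standard deviation expressed as a percentage of the mean. The following is a minimal base-R sketch of that calculation; the helper name `cv_percent` and the daily values are hypothetical and are not taken from the study's data.

```r
# Hypothetical daily shares of negative comments (percent); not the study's data.
daily_negative <- c(28, 35, 31, 40, 22, 30)

# Coefficient of variation as a percentage:
# sample standard deviation divided by the mean, times 100.
cv_percent <- function(x) {
  stats::sd(x) / mean(x) * 100
}

cv_percent(daily_negative)
```

A higher value means that the daily measure fluctuates more strongly around its average, which is how the post reads "dispersion" or "controversy" of comments.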

Data: We crawled the nominees’ public Facebook page data from May 01, 2015 until May 31, 2016 via the R package ‘Rfacebook’. Specifically, we requested all posts and the corresponding comments for the entire period (Clinton: approx. 1.2m comments / Trump: approx. 1.4m comments). Each comment was then analyzed separately with respect to emotional and psychological constructs (the categories are based on the LIWC dictionary) with the R packages ‘tm’ and ‘quanteda’. The analysis does not include non-English content. Finally, we aggregated the data (comments) on a daily basis.
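
The scoring and aggregation steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the comments, the two-category dictionary, and the helper names `tokenize` and `score_comment` are hypothetical, and the sketch uses plain word matching rather than the ‘tm’/‘quanteda’ functions and LIWC categories of the original analysis.

```r
# Hypothetical toy data; real comments would come from the crawled Facebook pages.
comments <- data.frame(
  date = as.Date(c("2016-05-01", "2016-05-01", "2016-05-02")),
  text = c("Great speech, love it",
           "This is a terrible and sad plan",
           "Good luck, great team"),
  stringsAsFactors = FALSE
)

# Hypothetical mini dictionary; the original analysis uses LIWC categories.
dict <- list(
  positive = c("great", "love", "good"),
  negative = c("terrible", "sad", "bad")
)

# Lower-case a comment and split it into word tokens.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z']+")[[1]]
  tokens[nzchar(tokens)]
}

# Count, per comment, how many tokens fall into each dictionary category.
score_comment <- function(x) {
  tokens <- tokenize(x)
  vapply(dict, function(words) sum(tokens %in% words), numeric(1))
}

# One row per comment, one column per category.
scores <- t(vapply(comments$text, score_comment, numeric(2), USE.NAMES = FALSE))
comments <- cbind(comments, scores)

# Aggregate the comment-level counts to daily totals.
daily <- aggregate(cbind(positive, negative) ~ date, data = comments, FUN = sum)
daily
```

Scoring each comment first and aggregating afterwards keeps the comment-level data available, so other daily summaries (shares, means, dispersion) can be derived from the same table.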

The data files are attached (for interactive graphics, open the txt files via Internet browser).

Figure1 Figure2 Figure3 Figure4

Here is a stylized example of the basic code. The code is limited to one candidate (Hillary Clinton) and one day (2016-07-07), and refers to a publicly available dictionary (positive/negative words). The original analysis is based on the LIWC dictionary.

Daniel Boller

To keep it simple: R makes the world intelligible. Data, as part and result of human interaction, cannot be used without intelligent systems and inferential/statistical tools. R thus provides an optimal framework for making use of the data and for understanding human interaction. I have worked with R for several years and like it. I studied Economics with a strong focus on empirical research methods and behavioral economics. Currently, I work as a Research Associate (Doctoral Candidate) at the University of St. Gallen.

  • Matt Cooper

    Great post, even if there isn’t too much separation overall. Any chance you could comment some of the code? Maybe just the non standard functions (corpus, dfm etc)? Cheers.

  • Dennis Ng

    (I got past the Facebook part and can read the number of posts, so the Facebook side is working OK.)

    Problem 1: corpus issue

    I got a few messages (see remark1), and one particular issue puzzles me after that.

    > corpus # Apply function to download comments

    > files

    • Markus

      As for “problem 3”, I have fixed it by installing the quanteda library.

      After that, I am stuck with the dfm function (that belongs to the quanteda library)

      > fb_liwc <-dfm(corpus, dictionary=myDict)
      Error in UseMethod("dfm") : no applicable method for 'dfm' applied to an object of
      class "function"
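
This message is what R's S3 dispatch reports when the first argument is a function rather than a data object. One plausible cause, assuming the earlier assignment to `corpus` did not run, is that the name `corpus` still refers to quanteda's own `corpus()` function. The base-R sketch below reproduces the same kind of message with a hypothetical generic `dfm_demo`, which is not part of quanteda.

```r
# Hypothetical S3 generic standing in for quanteda's dfm(); for illustration only.
dfm_demo <- function(x, ...) UseMethod("dfm_demo")

# A method for character input that simply counts whitespace-separated words.
dfm_demo.character <- function(x, ...) table(strsplit(tolower(x), "\\s+")[[1]])

# Passing a function (here base R's mean) instead of data triggers the dispatch error.
msg <- tryCatch(dfm_demo(mean), error = function(e) conditionMessage(e))
msg

# Passing actual text dispatches to the character method and works.
dfm_demo("great great speech")
```

If this is the cause, checking `class(corpus)` before calling `dfm()` would show "function" instead of a corpus object.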

  • Debdutta Roy

    I got this error at the following step. Before that, everything ran well.

    > fb_liwc <-dfm(corpus, dictionary=myDict)
    Creating a dfm from a corpus …
    … indexing 4,018 documents
    … tokenizing texts, found 234,888 total tokens
    … cleaning the tokens, 0 removed entirely
    … applying a dictionary consisting of 2 key entries
    Error in grep(paste(tolower(dictionary[[i]]), collapse = "|"), alltokens$features) :
    invalid regular expression '^2-faced$|^2-faces$|^abnormal$|^abolish$|^abominable$|^abominably$|^abominate$|^abomination$|^abort$|^aborted$|^aborts$|^abrade$|^abrasive$|^abrupt$|^abruptly$|^abscond$|^absence$|^absent-minded$|^absentee$|^absurd$|^absurdity$|^absurdly$|^absurdness$|^abuse$|^abused$|^abuses$|^abusive$|^abysmal$|^abysmally$|^abyss$|^accidental$|^accost$|^accursed$|^accusation$|^accusations$|^accuse$|^accuses$|^accusing$|^accusingly$|^acerbate$|^acerbic$|^acerbically$|^ache$|^ached$|^aches$|^achey$|^aching$|^acrid$|^acridly$|^acridness$|^acrimonious$|^acrimoniously$|^acrimony$|^adamant$|^adamantly$|^addict$|^addicted$|^addicting$|^addicts$|^admonish$|^admonisher$|^admonishingly$|^admonishment$|^admonition$|^adulterate$|^adulterated$|^adulteration$|^adulterier$|^adversarial$|^adversary$|^adverse$|^adversity$|^afflict$|^affliction$|^afflictive$|^affront$|^afraid$|^aggravate$|^aggravating$|^aggravation$|^aggression$|^aggressive$|^aggressiveness$|^aggressor$|^aggrieve$
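
The pattern in this message is built by pasting every dictionary word into one regular expression, so any entry containing regex metacharacters could make the whole pattern invalid. Whether that is the cause here is an assumption, since the message is truncated. A minimal base-R sketch of literal matching, with hypothetical words and tokens, avoids regex parsing entirely:

```r
# Hypothetical dictionary entries, including ones with regex metacharacters.
negative_words <- c("2-faced", "abuse", "d*mn", "bull****")

# Hypothetical tokens from a single comment.
tokens <- c("what", "a", "d*mn", "abuse", "of", "power")

# %in% compares strings literally, so metacharacters such as * are harmless.
negative_hits <- sum(tokens %in% negative_words)
negative_hits
```

Literal matching only finds whole-word matches, so it would not cover dictionaries that rely on wildcard entries.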



    Curious: the R code appears to end abruptly with the addition of a column to the R data.frame:
    # Combine Analysis Data and Original Data

    Is this complete, or am I missing something here?


  • keyshov b

    Upon executing "fb_liwc <- dfm(corpus, dictionary=myDict)", I'm getting this weird error:

    Error: could not find function "dfm"

    I tried using the following just before the above line, but it did not work. Is it deprecated or migrated?


    I even reinstalled using,


    which wasn't successful as it returned another error as,

    Error: Could not find build tools necessary to build quanteda

    Please help me through this 🙁