Using Google BigQuery with R

[Image: BigQuery header]

Hey everybody,

This tutorial is about combining two great and powerful tools: R and Google BigQuery.

BigQuery

But what is BigQuery? I think Google describes it very well on the developers page:
“Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast, SQL-like queries against append-only tables, using the processing power of Google’s infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.” (source: Google’s developers page)
And BigQuery is fast. Extremely fast, as you will see later when we query some sample data.

Getting Ready

But before we can enjoy that speed we need to do a little work. Maybe “work” is the wrong word, as using BigQuery is about as simple as it gets. First we need to create a project for our test in the Google Developers Console.

[Image: Project BigQuery]

You can name it however you want, and the same goes for the Project ID. It shouldn't be too complicated, though, as we will need it later. In the next step we have to activate the BigQuery API for our project, which we can do by clicking on “APIs” on the left-hand side.

[Image: API sidebar]

Let's bring it to R

And the R part is as easy as the Google part, because the awesome Hadley Wickham wrote a nice and easy package called “bigrquery”.

Start R and type in:
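(The original snippet did not survive extraction; a minimal sketch of the installation step, assuming the package is installed from Hadley Wickham's GitHub repository via devtools, as was common at the time:)

```r
# install devtools if you don't have it yet
install.packages("devtools")

# install bigrquery from GitHub, then load it
devtools::install_github("hadley/bigrquery")
library(bigrquery)
```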

OK, now that we have installed the necessary packages, we can run our first query.
So what happens here?
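(The query code itself is also missing from the extracted post; below is a sketch reconstructed from the description that follows, assuming bigrquery's `query_exec()` and the public Wikipedia sample table `publicdata:samples.wikipedia` — the project ID is a placeholder for the one you chose in the Developers Console:)

```r
library(bigrquery)

# the Project ID you chose in the Google Developers Console
project <- "your-project-id"

# legacy BigQuery SQL against the public Wikipedia sample table:
# fetch three fields for every entry whose title contains "beer",
# capped at 100 rows
sql <- "SELECT title, contributor_username, comment
        FROM [publicdata:samples.wikipedia]
        WHERE title CONTAINS 'beer'
        LIMIT 100"

# execute the query and save the result in the variable 'data'
data <- query_exec(sql, project = project)
head(data)
```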

First we set the Project ID we chose when we created the project in the Google Developers Console.

In the next step we set our SQL query, which we execute in the last row, saving the result in the variable “data”.

In this case I query a sample dataset containing a list of Wikipedia entries.

This dataset has 313,797,035 rows and a total size of 35.7 GB! And you can query it in under 2 seconds!

This is a good example of the speed of BigQuery. We want the fields “title”, “contributor_username” and “comment” from every row in the dataset where the title contains “beer”. We then limit the output to 100 entries, because otherwise we would receive roughly 360,000 entries with the word “beer” in the title (a huge amount).

A tab will then open in your browser where you have to log in to your Google account and give your R instance permission to access your project. After the authentication you can see the result of your query in R.

[Image: data table]

[Image: BigQuery speed]

Julian Hillebrand

During my time at university, while learning the basics of economics, I started heavily exploring the possibilities and changes brought about by digital disruption and the process of digital transformation, focusing on the importance of data and data analytics in combination with marketing and management.
My personal interests lie heavily in technology, digital marketing and data analytics. I made early acquaintance with programming and digital technology and have never stopped following the newest innovations.

I am an open, communicative and curious person. I enjoy writing, blogging and speaking about technology.

  • Excellent and helpful post. Thanks very much.

  • Great one. Thank you.

  • Miguel

    Based on the post, it looks like we can only do queries involving extraction of data, what about creating a temp table and subset and query that subset using R and applying a model on that?

  • Pingback: Updates to Google BigQuery following Cloud Platform Live (EsheleD)

  • Pingback: Updates to Google BigQuery following Cloud Platform Live (McCloud)

  • nice post.

  • minh

    Hello, could you please help me? Could you please point out what I did wrong? I did something similar to your original code but it did not work.
    My code is: devtools::install_github("bigrquery")
    library (bigrquery)


    project <- "still-lamp-663" # put your project ID here

    Please let me know where I need to include the OAuth credentials in the code.

    Many thanks!!

    • Hey,
      the package should ask for your credentials.
      What does the error look like?


  • Wesley Bernegger

    Hi. I keep getting an “invalid credentials” error message when I run query_exec.
    The package doesn’t seem to be asking for my credentials, either.
    Any other approaches I could try? I wasn’t sure if it might have been an issue with RStudio so I tried in R console, but got the same result.
    I’ve also tried authorization through the ‘httr’ package and am still getting “invalid credentials.”
    Any help would be appreciated. Thanks!

  • divyang

    Hi, I am in trouble due to a silly mistake. Can you help me?

    When the browser tab opened where you have to log in to your Google account and give your R instance permission to access your project, I entered the wrong credentials by mistake. Since then I get the error “Invalid Credentials” every time I try to use the query_exec(query, project) command.

    So, how can I solve this / enter the right credentials?

    Please, help me.

  • OcTaFl3x

    Hi all !

    Nice post !

    I'm experiencing some issues dealing with a dataset containing approx. 14M rows. It apparently takes quite a long time to download (I stopped it after 10 minutes at the data retrieval step), though the query execution itself works fine and fast (17s). 🙂 Do you have any feedback about processing this amount of data? (9 columns, 5 of them STRING, one containing URLs.) My dataset does not look that big, so I was wondering if you had any experience with similar data sizes. If downloading this dataset manually takes less time (export to Google Storage and then download), then I have no interest in using the bigrquery library :-S

    My R statement: my_bqdata <- query_exec(project = project, query = query, destination_table = "sandbox.octavio_R", max_pages = Inf, page_size = 1000000)



  • Rizwan Bashir

    Hi All,

    I am inserting a data frame from R into BigQuery using insert_upload_job().
    The data gets inserted into BigQuery, but the rows get swapped for some reason.

    I have tried it on small as well as big datasets but had the same issue with both.
    There is no “row swapping” pattern I could identify; it is pretty random.