Google BigQuery

Prerequisite

Please note that Google BigQuery is supported only in GoodData.CN. Before you can use BigQuery with GoodData.CN, you must install the JDBC driver into GoodData.CN. Due to licensing restrictions, GoodData is not allowed to distribute the JDBC driver for BigQuery.

You can download the driver from the Google BigQuery website. Alternatively, you can download the same driver from the Simba website for BigQuery (registration required).

To install the driver, follow the procedure for injecting custom JDBC drivers that corresponds to your edition of GoodData.CN:

Data Source Details

Use the following information when creating a data source for your BigQuery database:

  • The JDBC URL must be in the following format:
    • jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<Your Project ID>;OAuthType=0;Timeout=150
    • BigQuery JDBC driver cancels queries automatically after 10 seconds if Timeout parameter is omitted
  • Schema
    • Dataset name
  • Token
    • Only service account authentication is supported
    • Encode service account JSON by Base64
      Bash
      PowerShell 7
      Mac
      base64 -w0 service_account.json
      
      $File = "c:\secrets\service_account.json"
      $Content1 = get-content $File
      $Bytes = [System.Text.Encoding]::UTF8.GetBytes($Content1)
      $Encoded = [System.Convert]::ToBase64String($Bytes)
      Write-Host "ENCODED: " $Encoded
      
      base64 service_account.json
      
  • Username & password
    • Leave it empty. Token(service account) is used for authentication

Unsupported Features

GoodData.CN does not support the following features:

  • The PERCENTILE function.
  • The REGR_R2 function.
  • Referential integrity:
    • BigQuery does not support referential integrity (primary and foreign keys).
    • Primary and foreign keys cannot be utilized when generating a logical data model (LDM) from a physical data model (PDM).

User Access Rights

To connect your BigQuery data warehouse and GoodData, we recommend that you use Google Cloud Platform service account. For sufficient level of access, ensure your service account has the following user roles and permissions:

Steps:

  1. Grant your service account the following user roles:

    • bigquery.dataViewer
    • bigquery.jobUser

    or if you have caching enabled:

    • bigquery.dataEditor
    • bigquery.jobUser


    For more information see Google Cloud documentation Service Accounts 🡕 and Access Control 🡕.

  2. Ensure your service account has the following permissions:

    • bigquery.jobs.create
    • bigquery.tables.get
    • bigquery.tables.getData
    • bigquery.tables.list

Performance Tips

If your database holds a large amount of data, consider the following practices:

  • Denormalize the relational data model of your database.
    • This helps avoid large JOIN operations. Because BigQuery is a columnar database, queries read only the required columns and each column is compressed separately.
  • Utilize clustered tables.
    • Data can be pruned when using clustered columns.
  • Utilize partitioned tables.
    • To map the BigQuery provided pseudo-columns _PARTITIONDATE and _PARTITIONTIME onto the LDM:
      • Reference your date dataset in the dataset mapped to the partitioned table.
      • Map the foreign key representing the reference to your date dataset to one of the pseudo-columns mentioned above.
  • Utilize materialized views.
    • Changes to underlying tables are propagated to related materialized views. They are always consistent.
    • Materialize the results of JOINs and aggregations. These are executed very often as a result of using dashboards or visualizations.
      • Map materialized views and their columns to datasets and the LDM. You can utilize them in metrics, visualizations, and dashboards.

Query Timeout

Query timeout is not supported for Google BigQuery yet.

Permitted parameters

  • ProjectId
  • InitiateOAuth