Create a CrateDB Data Source

Follow these steps to connect to CrateDB and create a CrateDB data source:

  1. Configure User Access Rights

  2. Create a CrateDB Data Source

Refer to Additional Information for performance tips and details about CrateDB feature support.

You can also use this data source to connect to CrateDB clusters that expose the PostgreSQL wire protocol.

Configure User Access Rights

We recommend creating a dedicated user and user role specifically for integrating with GoodData.

Steps:

  1. Create a user role and grant it the following access rights:

    GRANT CONNECT ON DATABASE {database_name} TO ROLE {role_name};
    GRANT USAGE ON SCHEMA {schema_name} TO ROLE {role_name};
    GRANT SELECT ON ALL TABLES IN SCHEMA {schema_name} TO ROLE {role_name};
  2. Create a user and assign them the user role:

    GRANT ROLE {role_name} TO USER {user_name};
  3. Make the user role default for the user:

    ALTER USER {user_name} SET DEFAULT_ROLE={role_name};

Create a CrateDB Data Source

Once you have configured access rights for your CrateDB user, you can create a CrateDB data source.

Steps:

  1. On the home page, switch to Data sources.

    The left navigation panel with the Data sources tab highlighted.
  2. Click Connect data.

    The Connect data button highlighted in the top-right corner of the Data sources screen.
  3. Select CrateDB.

    Dialog showing available data source types with the CrateDB option highlighted.
  4. Name your data source, fill in your CrateDB credentials, and click Connect:

    Form to enter credentials for a CrateDB data source. Fields include the Data Source Name, Connection URL, an SSL Mode selector, Username, Password, and Database Name.
  5. Enter your schema name and click Save:

    The second screen of the Data Source Credentials dialog, showing a single field to specify the schema that determines which data is accessible in GoodData.

    Your data source is created!

Alternatively, create the data source using the API:

  1. Create a CrateDB data source with the following API call:

    curl $HOST_URL/api/v1/entities/dataSources \
      -H "Content-Type: application/vnd.gooddata.api+json" \
      -H "Accept: application/vnd.gooddata.api+json" \
      -H "Authorization: Bearer $API_TOKEN" \
      -X POST \
      -d '{
        "data": {
          "type": "dataSource",
          "id": "<unique_id_for_the_data_source>",
          "attributes": {
            "name": "<data_source_display_name>",
            "url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>",
            "schema": "<CRATEDB_SCHEMA>",
            "type": "CRATEDB",
            "username": "<CRATEDB_USER>",
            "password": "<CRATEDB_PASSWORD>"
          }
        }
      }' | jq .
  2. To confirm that the data source has been created, ensure the server returns the following response:

    {
      "data": {
        "type": "dataSource",
        "id": "<unique_id_for_the_data_source>",
        "attributes": {
          "name": "<data_source_display_name>",
          "url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>",
          "schema": "<CRATEDB_SCHEMA>",
          "type": "CRATEDB",
          "username": "<CRATEDB_USER>"
        }
      },
      "links": {
        "self": "$HOST_URL/api/v1/entities/dataSources/<unique_id_for_the_data_source>"
      }
    }
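
The confirmation step above can also be automated. The following is a minimal sketch, assuming the response body has the shape shown above. The function name `data_source_created` is hypothetical and not part of any GoodData API. It checks the returned type and ID and confirms that the password is not echoed back.

```python
import json


def data_source_created(response_body, expected_id):
    """Return True if the JSON response describes the expected CrateDB data source."""
    document = json.loads(response_body)
    data = document.get("data", {})
    attributes = data.get("attributes", {})
    return (
        data.get("type") == "dataSource"
        and data.get("id") == expected_id
        and attributes.get("type") == "CRATEDB"
        # The example response above does not include the password.
        and "password" not in attributes
    )
```

You could pass the raw output of the curl call (without the `jq` step) to this function together with the ID you sent in the request.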

Alternatively, create a CrateDB data source with the GoodData Python SDK:

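import os

from gooddata_sdk import (
    BasicCredentials,
    CatalogDataSourcePostgres,
    GoodDataSdk,
    PostgresAttributes,
)

host = "<GOODDATA_URI>"
token = "<API_TOKEN>"
sdk = GoodDataSdk.create(host, token)

# Placeholders: replace with your own data source ID and display name.
data_source_id = "<unique_id_for_the_data_source>"
data_source_name = "<data_source_display_name>"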

sdk.catalog_data_source.create_or_update_data_source(
    CatalogDataSourcePostgres(
        id=data_source_id,
        name=data_source_name,
        db_specific_attributes=PostgresAttributes(
            host=os.environ["CRATEDB_HOST"],
            db_name=os.environ["CRATEDB_DBNAME"]
        ),
        schema=os.environ["CRATEDB_SCHEMA"],
        credentials=BasicCredentials(
            username=os.environ["CRATEDB_USER"],
            password=os.environ["CRATEDB_PASSWORD"],
        ),
    )
)

Additional Information

Ensure you understand the following limitations and recommended practices.

Data Source Details

  • The JDBC URL must be in the following format:

    jdbc:postgresql://<host>:<port>/<databaseName>

  • Basic authentication is supported. Specify a username and password.

  • GoodData uses up-to-date drivers.
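
As an illustration of the URL format above, here is a minimal sketch of a helper that assembles the JDBC URL from its parts. The function name `build_jdbc_url`, the host `cratedb.example.com`, and the database name `doc` are assumptions made for this example only. Optional driver parameters are appended as a standard query string.

```python
from urllib.parse import urlencode


def build_jdbc_url(host, port, database, params=None):
    """Return a URL in the jdbc:postgresql://<host>:<port>/<databaseName> format."""
    if not host or not database:
        raise ValueError("host and database are required")
    url = f"jdbc:postgresql://{host}:{int(port)}/{database}"
    if params:
        # Optional driver parameters go into the query string.
        url += "?" + urlencode(params)
    return url


# Hypothetical host and database name, for illustration only.
print(build_jdbc_url("cratedb.example.com", 5432, "doc"))
```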

Unsupported Features

GoodData does not support the following functions:

  • PERCENT_RANK
  • VAR (only VARP – population variance – is available)
  • CORREL
  • COVAR
  • COVARP
  • CUME_DIST
  • INTERCEPT
  • RSQ

CrateDB supports only approximate percentile / median computation.

Performance Tips

If your database holds a large amount of data, consider the following practices:

  • Index the columns that are most frequently used for JOIN and aggregation operations. Those columns may be mapped to attributes, labels, primary keys, and foreign keys.

  • Define partitioning to improve the performance of visualizations that use only recent data.

    Partitioning support depends strongly on the version of your CrateDB database, so check the official CrateDB documentation for your version.

Query Timeout

The default timeout value for queries is 160 seconds. If a query takes longer than 160 seconds, it is stopped. The user then receives status code 400 and the message "Query timeout occurred."

Query timeout is closely related to the ACK timeout. For proper system configuration, the ACK timeout should be longer than the query timeout. The default ACK timeout value is 170 seconds.
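
The relationship between the two timeouts can be expressed as a simple sanity check. This is only a sketch: the constant and function names are hypothetical, and the values are the defaults quoted above.

```python
DEFAULT_QUERY_TIMEOUT_S = 160  # default query timeout quoted above
DEFAULT_ACK_TIMEOUT_S = 170  # default ACK timeout quoted above


def check_timeouts(query_timeout_s=DEFAULT_QUERY_TIMEOUT_S,
                   ack_timeout_s=DEFAULT_ACK_TIMEOUT_S):
    """Return the margin in seconds, or raise if the ACK timeout is not longer."""
    if ack_timeout_s <= query_timeout_s:
        raise ValueError("ACK timeout must be longer than the query timeout")
    return ack_timeout_s - query_timeout_s
```

With the default values, the ACK timeout leaves a margin of 10 seconds over the query timeout.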

Permitted Parameters

  • adaptiveFetch
  • adaptiveFetchMaximum
  • adaptiveFetchMinimum
  • allowEncodingChanges
  • ApplicationName
  • assumeMinServerVersion
  • autosave
  • binaryTransferDisable
  • binaryTransferEnable
  • cleanupSavepoints
  • connectTimeout
  • currentSchema
  • defaultRowFetchSize
  • disableColumnSanitiser
  • escapeSyntaxCallMode
  • gssEncMode
  • hostRecheckSeconds
  • loadBalanceHosts
  • loginTimeout
  • logUnclosedConnections
  • options
  • preferQueryMode
  • preparedStatementCacheQueries
  • preparedStatementCacheSizeMiB
  • readOnly
  • reWriteBatchedInserts
  • socketFactory
  • socketTimeout
  • ssl
  • sslmode
  • sslpassword
  • sslpasswordcallback
  • targetServerType
  • tcpKeepAlive
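
Before saving a connection URL, it can help to check its parameters against the list above. The following is a minimal sketch, assuming the parameters are passed in the query string of the JDBC URL and that names are compared case-sensitively. Both are assumptions, and the function name `unpermitted_parameters` is hypothetical.

```python
from urllib.parse import parse_qsl

# Parameter names copied from the list above.
PERMITTED_PARAMETERS = frozenset({
    "adaptiveFetch", "adaptiveFetchMaximum", "adaptiveFetchMinimum",
    "allowEncodingChanges", "ApplicationName", "assumeMinServerVersion",
    "autosave", "binaryTransferDisable", "binaryTransferEnable",
    "cleanupSavepoints", "connectTimeout", "currentSchema",
    "defaultRowFetchSize", "disableColumnSanitiser", "escapeSyntaxCallMode",
    "gssEncMode", "hostRecheckSeconds", "loadBalanceHosts", "loginTimeout",
    "logUnclosedConnections", "options", "preferQueryMode",
    "preparedStatementCacheQueries", "preparedStatementCacheSizeMiB",
    "readOnly", "reWriteBatchedInserts", "socketFactory", "socketTimeout",
    "ssl", "sslmode", "sslpassword", "sslpasswordcallback",
    "targetServerType", "tcpKeepAlive",
})


def unpermitted_parameters(jdbc_url):
    """Return a sorted list of query parameters that are not on the permitted list."""
    # Everything after the first "?" is treated as the query string.
    _, _, query = jdbc_url.partition("?")
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return sorted(names - PERMITTED_PARAMETERS)
```

An empty result means every parameter in the URL appears on the list. Any returned names would need to be removed from the URL.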