Create a CrateDB Data Source

Follow these steps to connect to CrateDB and create a CrateDB data source:

Configure User Access Rights
Create a CrateDB Data Source

Refer to Additional Information for additional performance tips and information about CrateDB feature support.

You can also use this data source to connect to CrateDB clusters that expose the PostgreSQL wire protocol.

Configure User Access Rights

We recommend creating a dedicated user and user role specifically for integrating with GoodData.

Steps:

Create a user role and grant it the following access rights:

GRANT CONNECT ON DATABASE {database_name} TO ROLE {role_name};
GRANT USAGE ON SCHEMA {schema_name} TO ROLE {role_name};
GRANT SELECT ON ALL TABLES IN SCHEMA {schema_name} TO ROLE {role_name};

Create a user and assign them the user role:

GRANT ROLE {role_name} TO USER {user_name};

Make the user role default for the user:

ALTER USER {user_name} SET DEFAULT_ROLE={role_name};

Create a CrateDB Data Source

Once you have configured your CrateDB user’s access rights, you can proceed to create a CrateDB data source that you can then connect to.

Steps:

On the home page switch to Data sources.
Click Connect data.
Select CrateDB.
Name your data source and fill in your CrateDB credentials and click Connect:
Input your schema name and click Save:
Your data source is created!

Steps:

Create a CrateDB data source with the following API call:

curl $HOST_URL/api/v1/entities/dataSources       -H "Content-Type: application/vnd.gooddata.api+json"       -H "Accept: application/vnd.gooddata.api+json"       -H "Authorization: Bearer $API_TOKEN"       -X POST       -d '{
    "data": {
    "type": "dataSource",
    "id": "<unique_id_for_the_data_source>",
    "attributes": {
      "name": "<data_source_display_name>",
      "url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>",
      "schema": "<CRATEDB_SCHEMA>",
      "type": "CRATEDB",
      "username": "<CRATEDB_USER>",
      "password": "<CRATEDB_PASSWORD>"
    }
    }
  }' | jq .

To confirm that the data source has been created, ensure the server returns the following response:

{
  "data": {
    "type": "dataSource",
    "id": "<unique_id_for_the_data_source>",
    "attributes": {
      "name": "<data_source_display_name>",
      "url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>",
      "schema": "<CRATEDB_SCHEMA>",
      "type": "CRATEDB",
      "username": "<CRATEDB_USER>"
    }
  },
  "links": {
    "self": "$HOST_URL/api/v1/entities/dataSources/<unique_id_for_the_data_source>"
  }
}

Create a CrateDB data source with the following API call:

from gooddata_sdk import GoodDataSdk, CatalogDataSource, BasicCredentials

host = "<GOODDATA_URI>"
token = "<API_TOKEN>"
sdk = GoodDataSdk.create(host, token)

sdk.catalog_data_source.create_or_update_data_source(
    CatalogDataSourcePostgres(
        id=data_source_id,
        name=data_source_name,
        db_specific_attributes=PostgresAttributes(
            host=os.environ["CRATEDB_HOST"],
            db_name=os.environ["CRATEDB_DBNAME"]
        ),
        schema=os.environ["CRATEDB_SCHEMA"],
        credentials=BasicCredentials(
            username=os.environ["CRATEDB_USER"],
            password=os.environ["CRATEDB_PASSWORD"],
        ),
    )
)

Additional Information

Ensure you understand the following limitations and recommended practice.

Data Source Details

The JDBC URL must be in the following format:
jdbc:postgresql://<host>:<port>/<databaseName>
Basic authentication is supported. Specify user and password.
GoodData uses up-to-date drivers.

Unsupported Features

GoodData does not support the following functions:

Percent_rank
VAR (only VARP – population variance – is available)
CORREL
COVAR
COVARP
CUME_DIST
INTERCEPT
RSQ

CrateDB supports only approximate percentile / median computation.

Performance Tips

If your database holds a large amount of data, consider the following practices:

Index the columns that are most frequently used for JOIN and aggregation operations. Those columns may be mapped to attributes, labels, primary and foreign keys.
Define partitioning to improve performance of visualizations that use only the recent data.
This feature strongly relies on the version of your CrateDB database, so check the official user documentation for your version.

Query Timeout

The default timeout value for queries is 160 seconds. If a query takes longer than 160 seconds, it is stopped. The user then receives a status code 400 and the message Query timeout occurred.

Query timeout is closely related to the ACK timeout. For proper system configuration, the ACK timeout should be longer than the query timeout. The default ACK timeout value is 170 seconds.

Permitted parameters

adaptiveFetch
adaptiveFetchMaximum
adaptiveFetchMinimum
allowEncodingChanges
ApplicationName
assumeMinServerVersion
autosave
binaryTransferDisable
binaryTransferEnable
cleanupSavepoints
connectTimeout
currentSchema
defaultRowFetchSize
disableColumnSanitiser
escapeSyntaxCallMode
gssEncMode
hostRecheckSeconds
loadBalanceHosts
loginTimeout
logUnclosedConnections
options
preferQueryMode
preparedStatementCacheQueries
preparedStatementCacheSizeMiB
readOnly
reWriteBatchedInserts
socketFactory
socketTimeout
ssl
sslmode
sslpassword
sslpasswordcallback
targetServerType
tcpKeepAlive

Create Alerts With AI Assistant