Create a CrateDB Data Source
Follow these steps to connect to CrateDB and create a CrateDB data source:
Refer to Additional Information for additional performance tips and information about CrateDB feature support.
You can also use this data source to connect to CrateDB clusters that expose the PostgreSQL wire protocol.
Configure User Access Rights
We recommend creating a dedicated user and user role specifically for integrating with GoodData.
Steps:
Create a user role and grant it the following access rights:
GRANT CONNECT ON DATABASE {database_name} TO ROLE {role_name}; GRANT USAGE ON SCHEMA {schema_name} TO ROLE {role_name}; GRANT SELECT ON ALL TABLES IN SCHEMA {schema_name} TO ROLE {role_name};Create a user and assign them the user role:
GRANT ROLE {role_name} TO USER {user_name};Make the user role default for the user:
ALTER USER {user_name} SET DEFAULT_ROLE={role_name};
Create a CrateDB Data Source
Once you have configured your CrateDB user’s access rights, you can proceed to create a CrateDB data source that you can then connect to.
Steps:
On the home page switch to Data sources.
Click Connect data.
Select CrateDB.
Name your data source and fill in your CrateDB credentials and click Connect:
Input your schema name and click Save:
Your data source is created!
Steps:
Create a CrateDB data source with the following API call:
curl $HOST_URL/api/v1/entities/dataSources -H "Content-Type: application/vnd.gooddata.api+json" -H "Accept: application/vnd.gooddata.api+json" -H "Authorization: Bearer $API_TOKEN" -X POST -d '{ "data": { "type": "dataSource", "id": "<unique_id_for_the_data_source>", "attributes": { "name": "<data_source_display_name>", "url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>", "schema": "<CRATEDB_SCHEMA>", "type": "CRATEDB", "username": "<CRATEDB_USER>", "password": "<CRATEDB_PASSWORD>" } } }' | jq .To confirm that the data source has been created, ensure the server returns the following response:
{ "data": { "type": "dataSource", "id": "<unique_id_for_the_data_source>", "attributes": { "name": "<data_source_display_name>", "url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>", "schema": "<CRATEDB_SCHEMA>", "type": "CRATEDB", "username": "<CRATEDB_USER>" } }, "links": { "self": "$HOST_URL/api/v1/entities/dataSources/<unique_id_for_the_data_source>" } }
Create a CrateDB data source with the following API call:
from gooddata_sdk import GoodDataSdk, CatalogDataSource, BasicCredentials
host = "<GOODDATA_URI>"
token = "<API_TOKEN>"
sdk = GoodDataSdk.create(host, token)
sdk.catalog_data_source.create_or_update_data_source(
CatalogDataSourcePostgres(
id=data_source_id,
name=data_source_name,
db_specific_attributes=PostgresAttributes(
host=os.environ["CRATEDB_HOST"],
db_name=os.environ["CRATEDB_DBNAME"]
),
schema=os.environ["CRATEDB_SCHEMA"],
credentials=BasicCredentials(
username=os.environ["CRATEDB_USER"],
password=os.environ["CRATEDB_PASSWORD"],
),
)
)Additional Information
Ensure you understand the following limitations and recommended practice.
Data Source Details
The JDBC URL must be in the following format:
jdbc:postgresql://<host>:<port>/<databaseName>Basic authentication is supported. Specify
userandpassword.GoodData uses up-to-date drivers.
Unsupported Features
GoodData does not support the following functions:
Percent_rankVAR(onlyVARP– population variance – is available)CORRELCOVARCOVARPCUME_DISTINTERCEPTRSQ
CrateDB supports only approximate percentile / median computation.
Performance Tips
If your database holds a large amount of data, consider the following practices:
Index the columns that are most frequently used for JOIN and aggregation operations. Those columns may be mapped to attributes, labels, primary and foreign keys.
Define partitioning to improve performance of visualizations that use only the recent data.
This feature strongly relies on the version of your CrateDB database, so check the official user documentation for your version.
Query Timeout
The default timeout value for queries is 160 seconds. If a query takes longer than 160 seconds, it is stopped. The user then receives a status code 400 and the message Query timeout occurred.
Query timeout is closely related to the ACK timeout. For proper system configuration, the ACK timeout should be longer than the query timeout. The default ACK timeout value is 170 seconds.
Permitted parameters
These are advanced JDBC driver properties. In most cases, GoodData UI does not expose or validate them, so use them with caution through the data source API when creating, updating, or testing a connection.
Keep data.attributes.url limited to the base JDBC URL and specify advanced properties in data.attributes.parameters. A parameter being listed here means the JDBC driver accepts it, but it does not guarantee that the related capability is fully supported by GoodData and some values may still cause unexpected errors. Some authentication-related parameters may be accepted by the JDBC driver, even if GoodData may not fully support it.
Example
"attributes": {
"name": "<data_source_display_name>",
"url": "jdbc:postgresql://<CRATEDB_HOST>:5432/<CRATEDB_DBNAME>",
"schema": "<CRATEDB_SCHEMA>",
"type": "CRATEDB",
"username": "<CRATEDB_USER>",
"password": "<CRATEDB_PASSWORD>",
"parameters": {
"sslmode": "require",
"socketTimeout": "30"
}
}The supported parameters for CrateDB data sources are:
- adaptiveFetch
- adaptiveFetchMaximum
- adaptiveFetchMinimum
- allowEncodingChanges
- ApplicationName
- assumeMinServerVersion
- autosave
- binaryTransferDisable
- binaryTransferEnable
- cleanupSavepoints
- connectTimeout
- currentSchema
- defaultRowFetchSize
- disableColumnSanitiser
- escapeSyntaxCallMode
- gssEncMode
- hostRecheckSeconds
- loadBalanceHosts
- loginTimeout
- logUnclosedConnections
- options
- preferQueryMode
- preparedStatementCacheQueries
- preparedStatementCacheSizeMiB
- readOnly
- reWriteBatchedInserts
- socketFactory
- socketTimeout
- ssl
- sslmode
- sslpassword
- sslpasswordcallback
- targetServerType
- tcpKeepAlive




