Create a Greenplum Data Source

Follow these steps to connect to a Greenplum database and create a Greenplum data source:

  1. Configure User Access Rights

  2. Create a Greenplum Data Source

Refer to Additional Information for performance tips and information about Greenplum feature support.

Configure User Access Rights

We recommend that you create a dedicated user and user role for integration with the GoodData platform.

Steps:

  1. Create a user role and grant the following access rights to it:

    GRANT CONNECT ON DATABASE {database_name} TO {role_name};
    GRANT USAGE ON SCHEMA {schema_name} TO {role_name};
    GRANT SELECT ON ALL TABLES IN SCHEMA {schema_name} TO {role_name};
    
  2. Create a user and grant the user role to it:

    GRANT {role_name} TO {user_name};
    
  3. Make the user role the default for the user:

    ALTER USER {user_name} SET ROLE {role_name};
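
The placeholders in the statements above can be filled in programmatically. The following is a minimal sketch, not part of the GoodData tooling: the function names `quote_ident` and `access_statements` are hypothetical, and the templates assume PostgreSQL-style GRANT syntax (Greenplum is built on PostgreSQL), so adjust them if your statements differ.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def access_statements(database: str, schema: str, role: str, user: str) -> list[str]:
    """Return the access-rights statements with all placeholders filled in.

    Assumption: PostgreSQL-style syntax. The role and the user must already exist.
    """
    d, s, r, u = (quote_ident(n) for n in (database, schema, role, user))
    return [
        f"GRANT CONNECT ON DATABASE {d} TO {r};",
        f"GRANT USAGE ON SCHEMA {s} TO {r};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};",
        f"GRANT {r} TO {u};",
        f"ALTER ROLE {u} SET ROLE {r};",
    ]


if __name__ == "__main__":
    # Example names only; replace them with your own.
    for statement in access_statements("sales", "public", "gd_role", "gd_user"):
        print(statement)
```

Quoting identifiers makes them case-sensitive, so pass the names exactly as they are stored in the database.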
    

Create a Greenplum Data Source

Once you have configured your Greenplum user, create a Greenplum data source to connect to.

Use one of the following methods: the UI, the API, or the Python SDK.

UI steps:

  1. On the home page, switch to Data sources.

  2. Click Connect data.

  3. Select Greenplum.

    A dialog for the data source details opens.

  4. Fill in your Greenplum credentials and click Connect.

  5. Input your schema name and click Save.

    Your data source is created!

API steps:

  1. Create a Greenplum data source with the following API call:

    curl $HOST_URL/api/v1/entities/dataSources \
      -H "Content-Type: application/vnd.gooddata.api+json" \
      -H "Accept: application/vnd.gooddata.api+json" \
      -H "Authorization: Bearer $API_TOKEN" \
      -X POST \
      -d '{
        "data": {
          "type": "dataSource",
          "id": "<unique_id_for_the_data_source>",
          "attributes": {
            "name": "<data_source_display_name>",
            "url": "jdbc:postgresql://<host>:5432/<database_name>",
            "schema": "<schema_name>",
            "type": "GREENPLUM",
            "username": "<username>",
            "password": "<password>"
          }
        }
      }' | jq .
    
  2. To confirm that the data source has been created, check that the server returns a response like the following:

    {
      "data": {
        "type": "dataSource",
        "id": "<unique_id_for_the_data_source>",
        "attributes": {
          "name": "<data_source_display_name>",
          "url": "jdbc:postgresql://<host>:5432/<database_name>",
          "schema": "<schema_name>",
          "type": "GREENPLUM",
          "username": "<username>"
        }
      },
      "links": {
        "self": "$HOST_URL/api/v1/entities/dataSources/<unique_id_for_the_data_source>"
      }
    }
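
The request body and the confirmation check above can also be sketched in Python with the standard library only. This is an illustrative sketch: the helper names `build_data_source_payload` and `response_confirms_creation` are hypothetical, and the check assumes the response echoes the submitted attributes without the password, as in the example response above.

```python
def build_data_source_payload(ds_id, name, host, database, schema,
                              username, password, port=5432):
    """Build the JSON body for POST /api/v1/entities/dataSources."""
    return {
        "data": {
            "type": "dataSource",
            "id": ds_id,
            "attributes": {
                "name": name,
                # Greenplum is reached through the PostgreSQL JDBC URL scheme.
                "url": f"jdbc:postgresql://{host}:{port}/{database}",
                "schema": schema,
                "type": "GREENPLUM",
                "username": username,
                "password": password,
            },
        }
    }


def response_confirms_creation(response, payload):
    """Return True if the response echoes the ID, type, and non-secret attributes."""
    sent = payload["data"]
    got = response.get("data", {})
    got_attrs = got.get("attributes", {})
    # The password is not expected to be returned, so it is not compared.
    expected = {k: v for k, v in sent["attributes"].items() if k != "password"}
    return (
        got.get("type") == sent["type"]
        and got.get("id") == sent["id"]
        and all(got_attrs.get(k) == v for k, v in expected.items())
    )
```

Serialize the payload with `json.dumps` before sending it, and parse the response body with `json.loads` before passing it to the check.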
    

Python SDK: create a Greenplum data source with the following call:

import os

from gooddata_sdk import (
    BasicCredentials,
    CatalogDataSourceGreenplum,
    GoodDataSdk,
    GreenplumAttributes,
)

host = "<GOODDATA_URI>"
token = "<API_TOKEN>"
sdk = GoodDataSdk.create(host, token)

# Placeholders: replace with your own data source ID and display name.
data_source_id = "<unique_id_for_the_data_source>"
data_source_name = "<data_source_display_name>"

# Connection details are read from environment variables.
sdk.catalog_data_source.create_or_update_data_source(
    CatalogDataSourceGreenplum(
        id=data_source_id,
        name=data_source_name,
        db_specific_attributes=GreenplumAttributes(
            host=os.environ["GREENPLUM_HOST"],
            db_name=os.environ["GREENPLUM_DBNAME"],
        ),
        schema=os.environ["GREENPLUM_SCHEMA"],
        credentials=BasicCredentials(
            username=os.environ["GREENPLUM_USER"],
            password=os.environ["GREENPLUM_PASSWORD"],
        ),
    )
)

Additional Information

Performance Tips

If your database holds a large amount of data, consider optimizing it before connecting the database to GoodData. For tips on how to optimize Greenplum performance, see the Optimizing Greenplum Performance article on the Greenplum website.

Query Timeout

Query timeout is configurable per application instance. It is a parameter of the sql-executor service, and its default value is 160 seconds.

Query timeout is closely related to the ACK timeout. For the system to be configured properly, the ACK timeout must be longer than the query timeout. The default ACK timeout value is 170 seconds.
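
The relationship between the two timeouts can be expressed as a simple check. This is only a sketch: the function name `check_timeouts` and the constant names are hypothetical, and how the values are actually set depends on your deployment.

```python
DEFAULT_QUERY_TIMEOUT_S = 160  # default query timeout of the sql-executor service
DEFAULT_ACK_TIMEOUT_S = 170    # default ACK timeout


def check_timeouts(query_timeout_s=DEFAULT_QUERY_TIMEOUT_S,
                   ack_timeout_s=DEFAULT_ACK_TIMEOUT_S):
    """Validate that the ACK timeout is longer than the query timeout.

    Returns the margin in seconds, or raises ValueError if the pair is invalid.
    """
    if query_timeout_s <= 0 or ack_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    if ack_timeout_s <= query_timeout_s:
        raise ValueError(
            f"ACK timeout ({ack_timeout_s}s) must be longer than "
            f"query timeout ({query_timeout_s}s)"
        )
    return ack_timeout_s - query_timeout_s
```

With the default values, the margin is 10 seconds. If you raise the query timeout, raise the ACK timeout as well so that the check still passes.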

Supported URL Parameters

  • adaptiveFetch
  • adaptiveFetchMaximum
  • adaptiveFetchMinimum
  • allowEncodingChanges
  • ApplicationName
  • assumeMinServerVersion
  • autosave
  • binaryTransferDisable
  • binaryTransferEnable
  • cleanupSavepoints
  • connectTimeout
  • currentSchema
  • defaultRowFetchSize
  • disableColumnSanitiser
  • escapeSyntaxCallMode
  • gssEncMode
  • hostRecheckSeconds
  • loadBalanceHosts
  • loginTimeout
  • logUnclosedConnections
  • options
  • preferQueryMode
  • preparedStatementCacheQueries
  • preparedStatementCacheSizeMiB
  • readOnly
  • reWriteBatchedInserts
  • socketFactory
  • socketTimeout
  • ssl
  • sslcert
  • sslfactory
  • sslmode
  • sslpassword
  • sslpasswordcallback
  • targetServerType
  • tcpKeepAlive
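
Supported parameters are appended to the data source URL as a query string. The following is a minimal sketch of building such a URL and rejecting parameters that are not in the list above. The helper name `build_jdbc_url` is hypothetical, and the example values are illustrative only.

```python
from urllib.parse import urlencode

# The URL parameters listed above.
SUPPORTED_URL_PARAMETERS = frozenset({
    "adaptiveFetch", "adaptiveFetchMaximum", "adaptiveFetchMinimum",
    "allowEncodingChanges", "ApplicationName", "assumeMinServerVersion",
    "autosave", "binaryTransferDisable", "binaryTransferEnable",
    "cleanupSavepoints", "connectTimeout", "currentSchema",
    "defaultRowFetchSize", "disableColumnSanitiser", "escapeSyntaxCallMode",
    "gssEncMode", "hostRecheckSeconds", "loadBalanceHosts", "loginTimeout",
    "logUnclosedConnections", "options", "preferQueryMode",
    "preparedStatementCacheQueries", "preparedStatementCacheSizeMiB",
    "readOnly", "reWriteBatchedInserts", "socketFactory", "socketTimeout",
    "ssl", "sslcert", "sslfactory", "sslmode", "sslpassword",
    "sslpasswordcallback", "targetServerType", "tcpKeepAlive",
})


def build_jdbc_url(host, database, port=5432, **params):
    """Build a jdbc:postgresql URL, allowing only supported parameters."""
    unsupported = sorted(set(params) - SUPPORTED_URL_PARAMETERS)
    if unsupported:
        raise ValueError(f"unsupported URL parameters: {', '.join(unsupported)}")
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if params:
        # urlencode percent-encodes values and keeps the given order.
        url += "?" + urlencode(params)
    return url
```

For example, `build_jdbc_url("<host>", "<database_name>", sslmode="require")` produces a URL ending in `?sslmode=require`. Parameter names are case-sensitive, so `applicationName` is rejected while `ApplicationName` is accepted.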