Create an Amazon Athena Data Source

Follow these steps to connect to Amazon Athena and create an Amazon Athena data source:

Configure User Access Rights
Create an Amazon Athena Data Source

Refer to Additional Information for performance tips and details about Amazon Athena feature support.

Configure User Access Rights

We recommend creating a dedicated IAM role or user specifically for integrating with GoodData.

Amazon Athena does not manage database users in the same way as traditional databases. Access is controlled entirely through AWS Identity and Access Management (IAM) and permissions to underlying data stored in Amazon S3 and metadata stored in AWS Glue.

Athena permissions

At minimum, the IAM principal must be able to execute queries in the Athena workgroup used by the connection.

Required actions:

athena:StartQueryExecution
athena:GetQueryExecution
athena:GetQueryResults

These permissions should be granted for the specific Athena workgroup the queries run in. Additional Athena permissions may be required for metadata discovery (for example, listing databases or tables), depending on your configuration.
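The required actions above could be expressed as an IAM policy statement scoped to a single workgroup. The following is a minimal sketch in Python that only builds and prints such a policy document. The helper name `athena_policy` is hypothetical, and the region, account ID, and workgroup name are placeholder assumptions, not values from this guide.

```python
import json


def athena_policy(region: str, account_id: str, workgroup: str) -> dict:
    """Build an IAM policy that allows query execution in one Athena workgroup."""
    workgroup_arn = f"arn:aws:athena:{region}:{account_id}:workgroup/{workgroup}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AthenaQueryExecution",
                "Effect": "Allow",
                # The three actions listed as required in this section.
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": workgroup_arn,
            }
        ],
    }


# Placeholder values (assumptions); replace them with your own.
policy = athena_policy("us-east-1", "123456789012", "primary")
print(json.dumps(policy, indent=2))
```

If your configuration needs metadata discovery, additional Athena actions would have to be added to the `Action` list.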

Amazon S3 permissions

Athena queries data directly from S3 and writes query results back to S3.

Required permissions:

For S3 buckets containing queried data:

s3:ListBucket
s3:GetObject

For the S3 bucket used for Athena query results:

s3:PutObject
s3:GetObject
(Often also s3:ListBucket, depending on bucket layout and prefixes)
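As a rough illustration of the permissions above, the sketch below builds the corresponding IAM policy statements in Python. The helper name `s3_statements` and both bucket names are hypothetical placeholders. Note that `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` and `s3:PutObject` apply to object ARNs under the bucket.

```python
import json


def s3_statements(data_bucket: str, results_bucket: str) -> list:
    """Build IAM statements for the queried-data bucket and the results bucket."""
    return [
        {
            # Listing is granted on the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{data_bucket}",
        },
        {
            # Reading is granted on the objects inside the data bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{data_bucket}/*",
        },
        {
            # Athena writes query results here and reads them back.
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": f"arn:aws:s3:::{results_bucket}/*",
        },
    ]


# Placeholder bucket names (assumptions); replace them with your own.
statements = s3_statements("my-data-bucket", "my-athena-results")
print(json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2))
```

If your bucket layout requires it, a further `s3:ListBucket` statement on the results bucket can be added in the same way.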

AWS Glue Data Catalog permissions

Athena uses AWS Glue as its metadata catalog.

Glue permissions are often the hardest to configure correctly. Permissions typically must be granted at all levels of the catalog hierarchy:

Catalog
Database
Table

Granting access to a table alone may not be sufficient if the database or catalog permissions are missing.
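The catalog hierarchy above can be sketched as a list of Glue resource ARNs, one per level. This is an illustration only: the helper name `glue_statement` is hypothetical, the region, account ID, and database name are placeholders, and the read-only actions shown are an assumption about what a typical setup needs rather than a list taken from this guide.

```python
import json


def glue_statement(region: str, account_id: str, database: str, table: str = "*") -> dict:
    """Build one IAM statement covering catalog, database, and table levels."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Effect": "Allow",
        # Assumption: read-only metadata actions; adjust to your configuration.
        "Action": [
            "glue:GetDatabase",
            "glue:GetDatabases",
            "glue:GetTable",
            "glue:GetTables",
            "glue:GetPartitions",
        ],
        "Resource": [
            f"{base}:catalog",                     # catalog level
            f"{base}:database/{database}",         # database level
            f"{base}:table/{database}/{table}",    # table level
        ],
    }


# Placeholder values (assumptions); replace them with your own.
statement = glue_statement("us-east-1", "123456789012", "sales")
print(json.dumps(statement, indent=2))
```

Listing all three resource levels in one statement avoids the situation described above, where table access is granted but the database or catalog permission is missing.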

Refer to the official AWS documentation for example IAM policies and guidance.

Create an Amazon Athena Data Source

Once you have configured access rights for Amazon Athena, you can create an Amazon Athena data source.

Steps:

On the home page, switch to Data sources.
Click Connect data.
Select Amazon Athena.
Name your data source, fill in your Amazon Athena connection details, and click Connect.

Your data source is created!

Steps:

Create an Amazon Athena data source with the following API call:

curl $HOST_URL/api/v1/entities/dataSources \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -H "Accept: application/vnd.gooddata.api+json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -X POST \
  -d '{
    "data": {
      "type": "dataSource",
      "id": "<unique_id_for_the_data_source>",
      "attributes": {
        "name": "<data_source_display_name>",
        "url": "jdbc:awsathena://AwsRegion=<AWS_REGION>",
        "schema": "<ATHENA_DATABASE>",
        "type": "ATHENA"
      }
    }
  }' | jq .

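# Create or update the data source with the GoodData Python SDK.
import os

# Assumption: both Athena classes are exported from the top-level gooddata_sdk package.
from gooddata_sdk import AthenaAttributes, CatalogDataSourceAthena, GoodDataSdk

host = "<GOODDATA_URI>"
token = "<API_TOKEN>"
sdk = GoodDataSdk.create(host, token)

# Placeholders matching the "id" and "name" fields of the API call.
data_source_id = "<unique_id_for_the_data_source>"
data_source_name = "<data_source_display_name>"

sdk.catalog_data_source.create_or_update_data_source(
    CatalogDataSourceAthena(
        id=data_source_id,
        name=data_source_name,
        # Connection settings are read from environment variables.
        schema=os.environ["ATHENA_DATABASE"],
        db_specific_attributes=AthenaAttributes(
            region=os.environ["AWS_REGION"],
            s3_output_location=os.environ["ATHENA_S3_OUTPUT"],
        ),
    )
)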

Additional Information

Data Source Details

JDBC URL format: jdbc:awsathena://AwsRegion=<region>
Authentication is IAM-based.
If GoodData runs on AWS with an attached IAM role, no static credentials are required.
If GoodData runs outside AWS, IAM credentials or role assumption must be configured according to your deployment.
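For illustration, the JDBC URL format above can be assembled from a region name with a small helper. This is a minimal sketch; the function name `athena_jdbc_url` is hypothetical and the region in the example is a placeholder.

```python
def athena_jdbc_url(region: str) -> str:
    """Return the Athena JDBC URL for the given AWS region."""
    region = region.strip()
    if not region:
        raise ValueError("region must not be empty")
    return f"jdbc:awsathena://AwsRegion={region}"


# Placeholder region (assumption); replace it with your own.
print(athena_jdbc_url("eu-west-1"))  # jdbc:awsathena://AwsRegion=eu-west-1
```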

Unsupported Features

Statistical functions: regr_slope, regr_intercept, covar_samp, corr, regr_r2
Statistical running functions: stdev, stdevp, var, varp
Window functions with unbounded beginning of frames
Showing missing values in visualizations

Known Issues and Limitations

Weekly same period year comparisons are not supported.
The Show all values feature is limited to a maximum of 50,000 rows.
Amazon Athena may normalize string comparisons depending on data format and SerDe configuration.

Performance Tips

Use columnar formats such as Parquet or ORC.
Partition data in S3 by commonly filtered columns.
Select only required columns.
Avoid complex joins on large unpartitioned datasets.
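Two of the tips above, selecting only the required columns and filtering on a partition column, can be illustrated with a small query builder. This is a sketch under stated assumptions: the helper name `build_query`, the table `orders`, and its columns are hypothetical, and the example assumes the table is partitioned by `order_date`.

```python
import re

# Accept only simple identifiers so names can be placed in the SQL text safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_query(table: str, columns: list, partition_column: str, partition_value: str) -> str:
    """Build a SELECT that names its columns and filters on one partition column."""
    for name in [table, partition_column, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Escape single quotes in the literal value.
    escaped = partition_value.replace("'", "''")
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM {table} WHERE {partition_column} = '{escaped}'"


# Hypothetical table and columns, used only for illustration.
query = build_query("orders", ["order_id", "amount"], "order_date", "2024-01-01")
print(query)
```

Filtering on the partition column lets Athena skip S3 prefixes that cannot match, and naming the columns lets columnar formats such as Parquet or ORC read less data than `SELECT *` would.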

Query Timeout

GoodData applies a default query timeout of 160 seconds for Amazon Athena queries.
