Create an Apache Drill Data Source

Known limitations

  • Date arithmetic:
    • Week-based granularities (such as WEEK or DOW) are not supported
    • DOY granularity is not supported
    • Not all period-over-period functionality works due to partially missing INTERVAL shifting in Drill
  • Functions:
    • The MEDIAN analytic function (or any alternative such as PERCENTILE_CONT) is not supported by Drill
    • GREATEST and LEAST functions treat NULL values incorrectly
    • Some WINDOW frames are not supported
    • SUM in CASE may not work
    • REGR_R2 is not supported
    • When aggregations are computed with an empty dimensionality and all values are NULL, the report result may be incorrect

Prepare Apache Drill for GoodData

To learn how to register Data Sources in Apache Drill, refer to the official Apache Drill documentation for connecting a Data Source.

For additional considerations, refer to Preparing Data Source Managers for GoodData.

Data Source Details

The following considerations apply when you are configuring the JDBC URL:

  • Basic authentication is most likely supported but is untested. You can test authentication by specifying the user and password.
  • You can set enableCaching to true and cachePath to ["dfs", "data"].

You must configure the writable storage plugin so that the path for dfs.data points to the local filesystem. You can find more information in the official Apache Drill documentation for Configuring Storage Plugins.

You can configure the DSM through the web UI, or you can store the configuration in the file storage-plugins-override.conf and mount it as a volume into the container.

The following example is a snippet that demonstrates the configuration settings for the Apache Drill DSM:

"storage": {
  dfs: {
    type: "file",
    connection: "file:///",
    enabled: true,
    workspaces: {
      "tmp": {
        "location": "/tmp",
        "writable": true,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      },
      "root": {
        "location": "/",
        "writable": false,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      },
      "data": {
        "location": "/data",
        "writable": true,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      }
    },
    formats: {
      "parquet": {
        "type": "parquet"
      }
      # Add other formats based on your needs.
    }
  }
}
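
After the configuration is loaded, one way to check that the dfs.data workspace is visible and writable is to run a few statements from any Drill SQL client. This is a minimal sketch, not an official verification procedure: the table name write_check is a hypothetical throwaway name, and the statements assume that the workspace is registered as dfs.data, as in the snippet above.

```sql
-- The dfs.data workspace should appear in the schema list.
SHOW SCHEMAS;

-- Assumption: write_check is a throwaway table name used only for this test.
-- The statement fails if the workspace is not writable.
CREATE TABLE dfs.data.`write_check` AS
SELECT 1 AS ok FROM (VALUES(1));

-- Clean up the test table afterwards.
DROP TABLE dfs.data.`write_check`;
```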

Performance Tips

If you want to query large datasets, or join large datasets from different data sources, we recommend that you first snapshot the datasets into Apache Drill (CREATE TABLE AS) and then query the table snapshots.
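
The snapshot approach can be sketched in Drill SQL as follows. The storage plugin names (mysql_src, pg_src), the schema, table, and column names are hypothetical placeholders. The sketch also assumes that the snapshots are written to the writable dfs.data workspace described earlier; any writable workspace should work.

```sql
-- Assumption: snapshots are stored as Parquet, Drill's default CTAS format.
ALTER SESSION SET `store.format` = 'parquet';

-- Snapshot one table from each source into the writable workspace.
-- mysql_src and pg_src are hypothetical storage plugin names.
CREATE TABLE dfs.data.`orders_snapshot` AS
SELECT order_id, customer_id, amount
FROM mysql_src.sales.orders;

CREATE TABLE dfs.data.`customers_snapshot` AS
SELECT customer_id, country
FROM pg_src.crm.customers;

-- Query and join the local snapshots instead of the remote sources.
SELECT c.country, SUM(o.amount) AS total_amount
FROM dfs.data.`orders_snapshot` AS o
JOIN dfs.data.`customers_snapshot` AS c
  ON o.customer_id = c.customer_id
GROUP BY c.country;
```

A snapshot is a point-in-time copy, so it has to be dropped and recreated whenever the source data changes.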

Query Timeout

Query timeout is not supported for Apache Drill yet.