MotherDuck Data Source

Private Beta Feature

Interested in trying this feature? The functionality detailed in this article is in private beta testing. To request exclusive access, please sign up for the waitlist.

As part of our continuous effort to integrate with new database platforms, we present a guide on setting up a MotherDuck data source in GoodData. MotherDuck extends the capabilities of DuckDB by offering it as a cloud-based service, enhancing the efficiency and scope of data operations. This article is aimed at users who want to utilize DuckDB’s analytical strengths within the GoodData ecosystem.

Why MotherDuck?

DuckDB lays the foundation for MotherDuck with its powerful features:

Ease of Use: DuckDB is designed for accessibility, offering seamless integration with major programming languages and the ability to run directly in web browsers via WebAssembly (WASM).
Unmatched Speed: With its columnar, vectorized execution engine, DuckDB ensures exceptional performance, handling data operations with remarkable speed.
Rich Feature Set: DuckDB stands out for its extensive SQL support and unique integrations. It even introduces innovative features like “GROUP BY ALL,” which revolutionize data aggregation and analysis.
Open Source Flexibility: As an open-source platform, DuckDB is highly adaptable, allowing for extensive customization and extension to meet diverse needs.

Building on DuckDB’s impressive framework, MotherDuck introduces additional capabilities that cater to modern data management requirements:

Serverless Cloud Deployment: MotherDuck simplifies deployment by offering a serverless solution in the cloud, reducing overhead and enhancing scalability.
Standardized Connectivity: Users can easily connect to MotherDuck using widely-accepted protocols such as JDBC and ODBC, ensuring compatibility and ease of integration.
Optimized SQL Executions: With MotherDuck, SQL queries are automatically optimized for efficiency, enabling complex operations, like running SQL queries on CSVs distributed across multiple regions, with ease.
Enhanced Data and Cache Storage: MotherDuck optimizes both data storage and cache management, ensuring high performance and reliability.
Innovative Hybrid Execution: A standout feature of MotherDuck is its hybrid execution capability, which allows for the integration of data stored both client-side and server-side, enabling more versatile and powerful data processing strategies.

Create a MotherDuck Data Source

At the moment we only support creating a MotherDuck data source using the API.

Steps:

If you have not done so already, create a MotherDuck service token.

Encode your service token to Base64:

# Linux
echo -n $MOTHERDUCK_TOKEN | base64 -w 512
# MacOS
echo -n $MOTHERDUCK_TOKEN | base64

Create a MotherDuck data source with the following API call:

curl $HOST_URL/api/v1/entities/dataSources \
  -H "Content-Type: application/vnd.gooddata.api+json" \
  -H "Accept: application/vnd.gooddata.api+json" \
  -H "Authorization: Bearer $API_TOKEN" \
  -X POST \
  -d '{
    "data": {
      "type": "dataSource",
      "id": "<unique_id_for_the_data_source>",
      "attributes": {
        "name": "<data_source_display_name>",
        "type": "DUCKDB",
        "url": "<MOTHERDUCK_JDBC_URL>",
        "token": "<ENCODED_MOTHERDUCK_TOKEN>",
        "username": "",
        "schema": "<MOTHERDUCK_SCHEMA>",
        "enableCaching": false
      }
    }
  }' | jq .

where <MOTHERDUCK_JDBC_URL> may look like this: jdbc:duckdb:md:my_db_name.

To confirm that the data source has been created, ensure the server returns the following response:

{
  "data": {
    "attributes": {
      "name": "<data_source_display_name>",
      "type": "DUCKDB",
      "url": "<MOTHERDUCK_JDBC_URL>",
      "username": "",
      "schema": "<MOTHERDUCK_SCHEMA>",
      "enableCaching": false
    },
    "id": "motherduck-datasource",
    "type": "dataSource"
  }
}

Self-Service Data Science