Data Source Managers
Support for Data Source Managers (DSM) is in beta. Beta features are available for users to test and provide feedback. Their implementation is not finalized, and their behavior or interface may change in the future.
Do not use beta features in your production environment.
Data Source Managers (DSMs) manage the operation of other Data Sources. You can use a DSM to connect to Data Sources that GoodData.CN does not support natively (such as MySQL), or to federate data from multiple Data Sources in one workspace.
The figure illustrates how you can use Data Sources and Data Source Managers in GoodData:
Data Source with native support
A user in Workspace 1 can build insights only on top of PostgreSQL.
Data Source federation
A user in Workspace 2 can federate data from all Data Sources available in the Apache Drill DSM (CSV files, REST API, and the same PostgreSQL used in Workspace 1).
Data Source without native support
A user in Workspace 3 can query the MySQL database through the Dremio DSM.
GoodData.CN can work with the following DSMs:
- Apache Drill
Deployment with Docker-Compose
If you do not have your own Data Source Manager deployment, you can start one as a Docker container together with GoodData.CN. Example docker-compose definitions are available on the documentation page of each Data Source Manager.
Benefits of Data Source Managers
DSMs offer several benefits for managing and visualizing your data in GoodData.CN:
Federation of Data Sources
You can integrate many Data Sources and create a logical data model (LDM) on top of them. Then you can query datasets coming from multiple Data Sources, and the DSM can handle such queries.
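The idea of querying datasets from multiple Data Sources through a single entry point can be illustrated with a small sketch. It does not use a real DSM: SQLite's `ATTACH DATABASE` stands in for one, exposing two independent in-memory databases under one connection. The schema names `sales` and `crm`, and all table and column names, are hypothetical.

```python
import sqlite3

# Stand-in for a DSM: one SQL connection that exposes two separate stores.
# This is an analogy only; a real DSM such as Apache Drill has its own
# connection mechanism and storage configuration.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales")  # hypothetical source 1
conn.execute("ATTACH DATABASE ':memory:' AS crm")    # hypothetical source 2

conn.execute(
    "CREATE TABLE sales.orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute(
    "CREATE TABLE crm.customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
)
conn.executemany(
    "INSERT INTO sales.orders VALUES (?, ?, ?)",
    [(1, 10, 120.0), (2, 11, 80.0), (3, 10, 50.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(10, "EMEA"), (11, "APAC")],
)


def revenue_by_region(connection):
    """Join tables that live in two different stores in a single query."""
    query = """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM sales.orders AS o
        JOIN crm.customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return connection.execute(query).fetchall()


result = revenue_by_region(conn)
print(result)
```

The caller writes one query against one connection, and the layer in between resolves which store each table comes from. A DSM plays that role for the Data Sources it manages.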
Integration of additional Data Sources
You can use DSMs to manage Data Sources that GoodData.CN cannot connect to through JDBC drivers, for example:
- Files and object storage services like AWS S3
- REST API
Performance
You can use DSMs to cache results from your Data Sources in a format optimized for analytical use cases. Queries that GoodData.CN runs against the optimized cache perform significantly better than queries against the original Data Sources. Additionally, you can deploy DSMs as clusters, which provides horizontal scalability.
Resiliency
You can deploy DSMs as clusters so that there is no single point of failure.
Optimization of costs
Offloading queries from the underlying Data Sources can reduce costs significantly, especially when you pay for those Data Sources by usage, for example per hour.
Preparing Data Source Managers for GoodData.CN
GoodData.CN uses metadata from a Data Source to create a logical data model (LDM). Some Data Sources (CSV files, APIs, or others) do not provide such metadata. Similarly, DSMs do not provide information about referential integrity.
You may use the following strategies to create a better logical data model in GoodData.CN:
- Create an object (table, view, dataset) on top of the Data Source and CAST the data types of columns accordingly.
- Use the LDM Modeler to create the primary keys (grains) and their references manually.
- Follow the naming conventions if you want GoodData.CN to generate the LDM for you.
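The first strategy, a typed object on top of an untyped source, can be sketched as a small helper that generates the view definition. This is a minimal illustration, not a GoodData.CN or DSM API: the function `build_cast_view` and all view, table, and column names are hypothetical, and the exact `CREATE VIEW` syntax and type names depend on the DSM you use.

```python
def build_cast_view(view_name, source, columns):
    """Return a CREATE VIEW statement that casts each source column.

    columns is a list of (source_expression, sql_type, alias) tuples.
    Inputs are inserted as-is, so they must come from a trusted place;
    this sketch does no identifier quoting or escaping.
    """
    if not columns:
        raise ValueError("at least one column is required")
    select_list = ",\n".join(
        f"  CAST({expr} AS {sql_type}) AS {alias}"
        for expr, sql_type, alias in columns
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source}"
    )


# Hypothetical example: give explicit types to columns of an untyped source.
statement = build_cast_view(
    "orders_typed",
    "raw_orders",
    [
        ("order_id", "INTEGER", "order_id"),
        ("amount", "DECIMAL(12, 2)", "amount"),
        ("created", "DATE", "order_date"),
    ],
)
print(statement)
```

Running the generated statement in the DSM gives GoodData.CN an object with explicit column types to scan, instead of the raw, untyped source.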
You can find more specific guidelines on how to deploy and prepare your DSM in the documentation of each DSM.