Generate a Physical Data Model Using the API

Once you have created data sources for your database, generate a physical data model (PDM).

The PDM describes the tables of your database and represents how the actual data is organized and stored in the database. The logical data model (LDM) is then mapped to the PDM and ultimately determines your analytical options when you build insights and dashboards.

To generate the PDM, do the following:

  1. Check the data types in your database.
  2. Scan the relational model of your database.
  3. Store the PDM under the corresponding data source.

Check the Data Types in Your Database

Make sure that the columns that you want to include in the PDM have the supported data types assigned. The data types are validated using the JDBC data types. To find out how the JDBC data types are mapped to the data types in your database, see the documentation for your database.

The following JDBC data types are supported:

  • BIGINT
  • BIT
  • BOOLEAN
  • CHAR
  • CLOB
  • DATE
  • DECIMAL
  • DOUBLE
  • FLOAT
  • INTEGER
  • LONGNVARCHAR
  • LONGVARCHAR
  • NCHAR
  • NCLOB
  • NUMERIC
  • NVARCHAR
  • REAL
  • ROWID
  • SMALLINT
  • SQLXML
  • STRING
  • TIMESTAMP
  • TIMESTAMP_WITH_TIMEZONE
  • TINYINT
  • VARCHAR

Columns with an unsupported data type are skipped and not included in the PDM.
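As a quick sanity check before scanning, you can compare the JDBC type names reported for your columns against the supported list. A minimal shell sketch; the example column types in the loop are illustrative, so replace them with the types your JDBC driver actually reports:

```shell
# Supported JDBC data types, as listed above.
supported="BIGINT BIT BOOLEAN CHAR CLOB DATE DECIMAL DOUBLE FLOAT INTEGER \
LONGNVARCHAR LONGVARCHAR NCHAR NCLOB NUMERIC NVARCHAR REAL ROWID SMALLINT \
SQLXML STRING TIMESTAMP TIMESTAMP_WITH_TIMEZONE TINYINT VARCHAR"

# Illustrative column types; replace with the types your driver reports.
for t in VARCHAR NUMERIC BLOB TIMESTAMP; do
  case " $supported " in
    *" $t "*) echo "$t: supported" ;;
    *)        echo "$t: unsupported - the column will be skipped" ;;
  esac
done
```

Any type that prints as unsupported (such as BLOB above) marks a column that the scan will silently leave out of the PDM.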

Scan the Relational Model of Your Database

To scan the relational model in your database, submit a POST request to /api/actions/dataSources/<data-source-id>/scan, where <data-source-id> is the ID of the data source that corresponds to the database you want to scan.

By default, all the tables, views, and columns in your database are scanned, and a JSON file with a declarative definition of the PDM is generated. To generate a more accurate PDM, especially when your database contains a large number of tables/views, you can do the following:

  • Adjust the tables, views, and columns in the database so that it contains only the entities that should be included in the PDM.

  • Fine-tune the API request to narrow down the scope of the database objects to scan:

    • Define what to scan: tables, views, or both.
    • Define the prefix to scan only the tables/views whose names contain this prefix.

You can also prepare the tables/views for use in complex analytical scenarios.

Example: Scanning the demo-ds data source that represents the pre-installed PostgreSQL database with the sample data prepared in the GoodData.CN Community Edition image

The PDM definition will be saved to the pdm.json file.

Bash
PowerShell 7
curl $ENDPOINT/api/actions/dataSources/demo-ds/scan \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -X POST \
  -d '{"separator": "__", "scanTables": false, "scanViews": true}' \
  | jq . > pdm.json
Invoke-RestMethod -Method Post -Uri "$ENDPOINT/api/actions/dataSources/demo-ds/scan" `
  -ContentType 'application/json' `
  -Headers @{
    'Accept' = 'application/json'
    'Authorization' = 'Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz'
  } `
  -Body '{"separator": "__", "scanTables": false, "scanViews": true}' | ConvertTo-Json -Depth 10 > pdm.json
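After the scan, you can inspect pdm.json to confirm which tables and columns were captured. The snippet below runs against a minimal hand-written sample whose shape mimics the declarative PDM; the exact layout is an assumption here, so verify it against your own pdm.json:

```shell
# A tiny sample in the assumed declarative PDM shape (tables with columns
# nested under .pdm.tables); your generated pdm.json should look similar.
cat > sample_pdm.json <<'EOF'
{
  "pdm": {
    "tables": [
      {
        "id": "products",
        "path": ["demo", "products"],
        "columns": [
          { "name": "product_id", "dataType": "INT" },
          { "name": "product_name", "dataType": "STRING" }
        ]
      }
    ]
  }
}
EOF

# List every table.column pair that made it into the PDM.
jq -r '.pdm.tables[] | .id as $t | .columns[] | "\($t).\(.name)"' sample_pdm.json
```

Running the same jq filter against your real pdm.json gives a quick inventory you can diff against the database to spot skipped columns.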

Store the PDM under the Corresponding Data Source

To store the generated PDM under the corresponding data source, submit a PUT request to /api/layout/dataSources/<data-source-id>/physicalModel. <data-source-id> is the ID of the data source that corresponds to the scanned database. In the API request, specify the JSON file with the PDM definition.

Example: Storing the PDM generated from the demo-ds data source

Bash
PowerShell 7
curl $ENDPOINT/api/layout/dataSources/demo-ds/physicalModel \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -H "Content-Type: application/json" \
  -X PUT -d @pdm.json
Invoke-RestMethod -Method Put -Uri "$ENDPOINT/api/layout/dataSources/demo-ds/physicalModel" `
  -ContentType 'application/json' `
  -Headers @{ 'Authorization' = 'Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz' } `
  -InFile pdm.json

Once you have stored the PDM, build the LDM.
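To double-check what was stored, you can retrieve the PDM back with a GET request to the same endpoint. A sketch reusing the demo-ds data source and the example token from above; adjust both for your environment:

```shell
# Retrieve the stored PDM and pretty-print it for inspection.
curl $ENDPOINT/api/layout/dataSources/demo-ds/physicalModel \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -H "Accept: application/json" \
  -X GET | jq .
```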

Advanced Use Cases

Multiple Datasets Mapped to the Same Table

This is typically needed when one table represents multiple logical entities, and each entity should be represented by a separate dataset in the LDM.

While the LDM Modeler supports mapping of multiple datasets to the same table, publishing an LDM with such mapping fails.

To avoid this issue, create multiple views on top of the table and map each dataset to a separate view.

For example, you have two tables, users and tickets.

  • The users table contains ticket creators and assignees.
  • The tickets table contains the assignee_id and creator_id columns.

To avoid mapping multiple datasets to the users table, do the following:

  1. In the database, create two views on top of the users table: v_users_assignees and v_users_creators.
  2. In the LDM, create three datasets: assignees, creators, and tickets.
  3. Map the tickets dataset to the tickets table.
  4. Map the assignees dataset to the v_users_assignees view.
  5. Map the creators dataset to the v_users_creators view.
  6. Create a relationship from the assignees dataset to the tickets dataset using the assignee_id column as a primary key in the assignees dataset.
  7. Create a relationship from the creators dataset to the tickets dataset using the creator_id column as a primary key in the creators dataset.
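Step 1 above can be sketched end to end. For a self-contained example, the snippet below uses SQLite (an assumption made purely for portability; the same CREATE VIEW statements work in PostgreSQL), with the table and view names from the steps above:

```shell
db=demo_views.db
rm -f "$db"

sqlite3 "$db" <<'EOF'
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tickets (id INTEGER PRIMARY KEY, assignee_id INTEGER, creator_id INTEGER);

-- One view per logical entity; each LDM dataset then maps to its own view.
CREATE VIEW v_users_assignees AS SELECT id AS assignee_id, name FROM users;
CREATE VIEW v_users_creators  AS SELECT id AS creator_id,  name FROM users;
EOF

# Both views exist and can be scanned as separate relations.
sqlite3 "$db" "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name;"
```

Because each view is a distinct relation, the scan picks them up as separate tables in the PDM, and each LDM dataset maps cleanly to its own view.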

No Single-column Primary Key in Tables

While the LDM Modeler supports setting multi-column primary keys in datasets, publishing an LDM with multi-column primary keys fails.

To avoid this issue, create one-attribute primary keys in your database:

  1. Concatenate the table columns that comprise the primary key in the table.
  2. Calculate a hash value for the concatenated columns.
  3. Use the hash value as a one-attribute primary key.
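The steps above can be sketched in plain shell. The column names and separator here are illustrative; in PostgreSQL, the equivalent would be an expression such as md5(order_id || '|' || line_no) in a view or generated column:

```shell
# Illustrative values of the two columns that comprise the composite key.
order_id="1001"
line_no="3"

# Steps 1-2: concatenate the key columns with a separator, then hash the result.
key=$(printf '%s|%s' "$order_id" "$line_no" | md5sum | cut -d' ' -f1)

# Step 3: the hash serves as a stable single-column key for the row.
echo "$key"
```

The separator matters: without it, the pairs (12, 34) and (1, 234) would concatenate to the same string and collide.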

Multiple PDMs Generated from the Same Data Source

If you need to prepare different PDMs from the same data source to serve various business needs, we recommend that you create multiple data sources from the same database and generate a PDM for each data source separately.

To narrow down the scope of the database to scan, you can create views with different prefixes and, for each data source, scan only the views with a specific prefix. You can also define different credentials, privileges, or schemas for each data source.
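For example, if one data source should see only views whose names start with mart1_, the scan request might look like the following. Note that the exact name of the prefix parameter ("viewPrefix" below) is an assumption; check the scan endpoint's API reference for the actual request schema:

```shell
# Scan only views prefixed with mart1_ (prefix parameter name assumed).
curl $ENDPOINT/api/actions/dataSources/demo-ds/scan \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer YWRtaW46Ym9vdHN0cmFwOmFkbWluMTIz" \
  -X POST \
  -d '{"separator": "__", "scanTables": false, "scanViews": true, "viewPrefix": "mart1_"}' \
  | jq . > pdm.json
```

Repeating the scan with a different prefix for each data source yields one PDM per business need while all data sources point at the same physical database.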