Prepare Your Data

When you connect your database to GoodData, the schema of your database is then used inside a workspace to automatically generate a the logical data model (LDM). The LDM is what ultimately determines your analytical options when you build visualizations and dashboards.

To ensure the LDM is generated successfuly, we recommend you consider:

Supported Data Types
Recommended Naming Conventions
How Is LDM Derived

Supported Data Types

Make sure that the columns that you want to include in the model have the supported data types assigned. The data types are validated using the JDBC data types. To find out how the JDBC data types are mapped to the data types in your database, refer to documentation for your specific database.

GoodData supports the following JDBC data types:

BIGINT	BIT	BOOLEAN
CHAR	CLOB	DATE
DECIMAL	DOUBLE	FLOAT
INTEGER	LONGNVARCHAR	LONGVARCHAR
NCHAR	NCLOB	NUMERIC
NVARCHAR	REAL	ROWID
SMALLINT	SQLXML	STRING
TIMESTAMP_WITH_TIMEZONE	TIMESTAMP	TINYINT
VARCHAR

Columns with an unsupported data type (for example, when the UUID database data type is mapped to the OTHER JDBC data type) are skipped and not included in the LDM.

Matching Data Types

Columns that reference primary keys should have the same data type to ensure smooth and effective joins between datasets. Avoid mixing different data types, such as using a numeric type column with a text type column, to ensure successful equi-joins and avoid potential issues. GoodData automatically checks for mismatched data types to prevent errors during runtime.

Recommended Naming Conventions

We recommend you name the columns in your tables a certain way by adding a specific prefix to each column name. Columns with these prefixes will be automatically converted to a specific type of field when a dataset is created:

Entity	Name Format	Example
Primary Key	`gr__<baseName>` or `cp__<baseName>`	`gr__id_customer`
Fact	`f__<baseName>`	`f__sales_amount`
Label	`ls__<attributeBaseName>__<labelBaseName>` or `l__<attributeBaseName>__<labelBaseName>`	`ls__country_id__country_name`
Reference	`r__<tableBaseName>__<referenceBaseName>`	`r__customer__surname`
Both Primary Key and Reference	`grr__<tableBaseName>__<referenceBaseName>`	`r__customer__surname`
Workspace Data Filter	`wdf__<baseName>`	`wdf__region`
Date Reference	`d__<date>` or `d__<date>__<granularity>`	`d__date__month`

grr_

GoodData supports composite keys, that allow users to have foreign keys as part of a primary key (grain) of a dataset.

In data sources where this cannot be determined from constraints the grr (grainReference) prefix is to be used, which marks the column as both part of the primary key and a foreign key.

The prefixes are not case-sensitive.

If these prefixes are absent, your columns will be mapped using the default configuration. You can always re-map your fields (columns) manually by going to the View details → Data mapping tab for each one of the datasets.

Date References

To support automatic creation of date dataset references in database schemas used with Aggregate Awareness, use the d__ prefix naming convention.

d__<date> – creates a reference to a date dimension dataset with the identifier <date>.
- Example: d__date → reference to the date dimension dataset date.
d__<date>__<granularity> – creates a reference to a date attribute within the date dimension dataset at a specific granularity.
- Examples:
  - d__date__day → date attribute with day granularity.
  - d__date__month → date attribute with month granularity.
  - d__date__quarter → date attribute with quarter granularity.
  - d__date__year → date attribute with year granularity.

Create Your Own Naming Convention

You can define your own prefixes (including the separator, it does not have to be __) when creating a logical data model using the API or Python SDK. See:

The Python SDK function generate_logical_model
The API endpoint /generateLogicalModel

Separators

In the table above, a double underscore (__) is used as a separator between prefixes and entity names, or between separate sections of reference names and label names. Note that in an API request for generating the LDM you can define any string as a separator.

We recommend that the string you want to use as a separator be as unique as possible. Consider the situation when you use an underscore (_) as a separator and you have a column named referencePrefix_table_1_fk_column. This column name cannot be split into three sections as it would be required (prefix, referencedTable, referencedColumn).

Naming Conventions vs. Referential Integrity

Do not apply the naming conventions to the columns that are used as or are included in a primary key or a foreign key in your database. Such columns will be interpreted as grains and references based on referential integrity (see Default Configuration).

How Is LDM Derived

When a declarative definition of the LDM is generated, certain rules are applied that affect how LDM entities are created based on the database. Understanding these rules will help you adjust your database and make the process of generating the LDM smoother and more accurate.

Default Configuration

By default, entities for the LDM are derived from the database based on data types assigned to table columns and referential integrity in the database.

LDM Entity	Expected Database Entity
Dataset	Table, view
Attribute	`CHAR`-like, `INT` columns
Fact	`NUMERIC`-like columns
Date dataset	`DATE`, `TIMESTAMP` (`TZ`) columns
Grain	`PRIMARY KEY`
Reference	`FOREIGN KEY`

Date and timestamp table columns must be of the DATE or TIMESTAMP (TZ) data type to be imported as a date dataset. String columns cannot be imported as date datasets.

The default configuration has the following limitations:

It always reads the whole database; you cannot select just a part of the database to read.
It cannot detect advanced LDM entities such as attribute labels.
It cannot generate facts from columns with the INT data type.
It cannot generate grains and references if referential integrity is not maintained in the database.
It cannot generate proper LDM entities (such as facts, labels, and so on) from columns with incorrectly defined data types.

To overcome these limitations, use the naming conventions in your database to make sure that entities are identified and interpreted correctly regardless of the column data types and referential integrity.

Entity Names in the Generated LDM

In the generated LDM, entity names and descriptions are generated based on the corresponding column names. To make an entity name more human-readable, the following happens when the entity name is injected into the LDM:

The prefix is removed from the name.
The first character is upper-cased.
Any non-alphanumeric character is replaced by a space.

Column Name in DB	Entity ID in LDM	Entity Name in LDM
`gd_table__customer`	`customer`	`Customer`
`customer.f__amount`	`customer.amount`	`Amount`
`customer.ls__customer_key__customer_name`	`customer.customer_name`	`Customer name`
`customer.region_name` (attribute)	`attr.customer.region_name`	`Region name`

Duplicate label IDs may be generated for different columns in the same table in the database (and eventually for different labels in the corresponding dataset in the LDM).

For example, both lineitem.ls__customer_id__url and lineitem.ls__supplier_id__url will be transformed into lineitem.url.

While an LDM with duplicate label IDs can be generated, publishing this LDM will fail.

To avoid this situation, make sure that the base label names distinguish from each other, for example:

Use lineitem.ls__customer_id__customer_url instead of lineitem.ls__customer_id__url so that the label ID is generated as lineitem.customer_url.
Use lineitem.ls__supplier_id__supplier_url instead of lineitem.ls__supplier_id__url so that the label ID is generated as lineitem.supplier_url.