Enhanced Model Validation
Private Beta Feature
Interested in trying this feature? The functionality detailed in this article is in private beta testing. To request exclusive access, please connect with us on Slack.
The databases you connect to GoodData may contain tables with duplicate or missing values. This can cause issues when you are creating the logical data model (LDM) in GoodData and you want to use these values as primary keys, since primary key values have to be unique and not empty.
We have enhanced the model validation, now it scans your datasets and detects issues related to duplicate or missing primary key values. Model validation generates a list of warnings and errors to be displayed in a panel in the LDM modeler.
If a dataset contains fields with missing values, a Field contains empty values warning will be displayed in the Model validation panel on the right side of the modeler:
This validation is triggered whenever a new field (fact or attribute) is added to the LDM.
GoodData can work with empty values, but it may be practical to avoid them.
Fix Null Values
In data modeling, handling NULL values is crucial to maintain data integrity and ensure accurate analysis. NULL values can be a challenge, especially when working with dimension attributes.
Instead of resorting to NULLs, a pragmatic approach is to assign a specific value—like 0 or “N/A”—when data is missing. This replacement can be initialized at the data source level. However, GoodData provides a more tailored solution, allowing model authors to execute this transformation using SQL.
Convert your dataset to an SQL dataset, see Create SQL Datasets.
Depending on your database’s SQL dialect, you can employ functions such as
DECODE to replace NULL values.
SELECT NVL(original_column, 'N/A') as original_column FROM source_table;
SELECT DECODE(original_column, null, 'N/A', original_column) as original_column FROM source_table;
To verify the success of your transformation, run the query and examine the previewed results. Ensure that the column names in these results align with the original columns. This step is crucial, as it guarantees the mapping to existing attributes remains unaltered.
It’s worth noting that SQL can be a versatile tool beyond just replacing NULL values. With it, you can perform various data cleansing operations, such as unifying text representations, modifying case structures, or converting data formats.
Null Values in Primary Keys
Whenever you set a field as a primary key, a validation is triggered that checks if the field contains null values. If a primary key field is found to contain null values you will get the Primary key contains empty values error in the Model validation panel:
Primary keys always have to have some value, so you need to fix this before finalizing your LDM.
Fix Null Values in Primary Keys
Duplicate Values in Primary Keys
Whenever you set a field as a primary key, a validation is triggered that checks if the field contains duplicate values. If a primary key field is found to contain duplicate values you will get the Primary key contains duplicates error in the Model validation panel:
Primary keys must contain only unique values, so you need to fix this before finalizing your LDM.
Fix Duplicate Values in Primary Keys
To resolve issues with duplicate values in primary keys, consider the following steps:
Use Data Profiling to Identify Alternative Primary Keys:
Through data profiling, you can identify potential candidates for primary keys. Suitable candidates will have “0 duplicates” displayed in the field’s header within the data profiling tab.
Consider Introducing a Composite Key:
If none of the fields in your dataset is unique, think about creating a new unique field composed of multiple existing fields.
Utilize GoodData’s SQL Editor:
GoodData offers an SQL editor to help you introduce new fields. Simply click on “Fix in SQL editor” within the “data profiling” tab of dataset details whenever a primary key error is identified. This tool will provide a generated SQL statement as a starting point, which includes all available columns in a SELECT clause. Depending on the SQL dialect of your database, you might use functions like CONCAT to merge multiple fields and formulate a new composite key.
Example SQL Query for Creating a Composite Key:
SELECT brand, model, CONCAT(brand, '-', model) as new_unique_identifier FROM source_table;
Update the Model:
After adding the new field, go to the modeler to include a new attribute. Map this attribute to the newly-created column and designate it as the primary key attribute.