
Building Trust in Data: Refine Your Semantic Layer with Catalog and Quality Agent

3 min read | Written by Stepan Machovsky

Analytics work gets messy when metadata lives everywhere. Metrics in one place, attributes in another, facts and dates scattered across projects. Small edits turn into long hunts. You need a place where this knowledge lives together.

Centralization helps, but it raises a harder question. Is the content consistent and healthy? Do titles match the logic? Do descriptions repeat without meaning? Are acronyms clear to people outside the original team? Seeing everything in one place is the first step. Knowing what needs attention is the second.

Analytics Catalog gives you one place to see and manage the semantic pieces that power your reports. Open it, search, and you get the shape of your analytics in minutes.

Semantic Quality Agent

The Semantic Quality Agent looks across the catalog and points to issues that slow you down. No need to click through objects for hours. You get a focused set of findings that surface duplication, drift, and unclear language.

Scope is simple. The check runs on a subset of types today. Metrics, attributes, facts, and date objects are included. That covers the bulk of daily work and leaves room to expand.

What it checks

The agent looks for objects that are the same or almost the same. It calls out identical descriptions that hint at copy-and-paste drift. It flags titles and descriptions that are semantically close even when the wording differs. These findings help you pick a canonical object, rename what needs clarity, or deprecate what is redundant.
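
As a rough picture of how such a check can work, here is a minimal Python sketch that pairs exact-description matching with embedding similarity. The model name, threshold, and object shape are illustrative assumptions, not the agent's actual internals.

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.util import cos_sim

    # Illustrative model choice; the agent's real embedding model is not documented here.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def identical_descriptions(objects):
        """Group objects that share the exact same description text."""
        groups = {}
        for obj in objects:
            groups.setdefault(obj.get("description", "").strip(), []).append(obj["title"])
        return {desc: titles for desc, titles in groups.items() if desc and len(titles) > 1}

    def near_duplicates(objects, threshold=0.9):
        """Flag catalog object pairs whose title and description embed close together."""
        texts = [f"{obj['title']}. {obj.get('description', '')}" for obj in objects]
        embeddings = model.encode(texts, normalize_embeddings=True)
        findings = []
        for i in range(len(objects)):
            for j in range(i + 1, len(objects)):
                score = float(cos_sim(embeddings[i], embeddings[j]))
                if score >= threshold:
                    findings.append((objects[i]["title"], objects[j]["title"], round(score, 3)))
        return findings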

Unknown abbreviations get special attention. If a reader meets ASP with no definition nearby, they have to guess. The agent highlights these tokens so you can add a short definition or expand the title. That improves handoffs and onboarding without touching the logic.

How the abbreviation pass works

Deciding what is unknown is not trivial. The agent uses several passes to keep noise down and precision high.

First, it whitelists in-text definitions. When a description says Average Selling Price (ASP), ASP is treated as known from that point.
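
A simplified sketch of that whitelist pass, using a regular expression plus an initials check; the pattern and bounds are illustrative assumptions, not the production rule:

    import re

    # Matches a parenthesized run of capitals, e.g. "(ASP)"; bounds are illustrative.
    DEFINITION_RE = re.compile(r"\(([A-Z]{2,6})\)")

    def known_from_definitions(text):
        """Whitelist abbreviations defined inline, e.g. 'Average Selling Price (ASP)'."""
        known = set()
        for match in DEFINITION_RE.finditer(text):
            abbr = match.group(1)
            # Compare the abbreviation against the initials of the preceding words.
            preceding = text[: match.start()].split()[-len(abbr):]
            initials = "".join(word[0].upper() for word in preceding if word)
            if initials == abbr:
                known.add(abbr)
        return known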

Second, it runs a token analysis. Long or unusual tokens are pulled out, and embeddings help filter normal vocabulary that appears in uppercase.
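
The extraction half of that pass can be sketched like this; the regex bounds are illustrative, and the embedding filter that clears ordinary words appearing in uppercase (TOTAL, REVENUE) is left out:

    import re

    # Uppercase tokens of two to ten characters, e.g. ASP, NSAT, MRR.
    TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")

    def candidate_abbreviations(text, known):
        """Pull uppercase tokens that the definition whitelist has not already cleared."""
        return {token for token in TOKEN_RE.findall(text) if token not in known}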

Third, it runs a dictionary check using Enchant. It also samples your own metadata to learn frequent team and product terms so they do not get flagged.
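
PyEnchant exposes exactly this kind of dictionary lookup. A minimal sketch of the filtering step, with the metadata sampling reduced to a plain set of team terms for illustration:

    import enchant  # PyEnchant, a binding to system spell-check dictionaries

    english = enchant.Dict("en_US")

    def unknown_abbreviations(candidates, team_terms):
        """Keep tokens that neither the English dictionary nor team vocabulary recognizes."""
        return {
            token
            for token in candidates
            if not english.check(token.lower()) and token not in team_terms
        }

Chained together, these sketches mirror the shape of the pass: whitelist inline definitions, extract candidate tokens, then filter against the dictionary and your own vocabulary.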

Fourth, there is a planned LLM stage. The goal is smarter handling of domain-specific jargon without changing your content. And while an LLM is quite good at spotting abbreviations and problems, it is also expensive to run and prone to false positives.

All of this relies on text processing and regular expressions. No hidden rewriting. You get clear signals. You decide the edits, because if an LLM can confidently suggest an edit, the text was already understandable, and then it was not a problem in the first place.

What it does not do

The agent does not auto-fix problems yet. It suggests edits and points to the right place to act. If a system can propose a concrete change, you have enough context to understand the issue. That keeps control with the team and avoids silent changes.

Working with findings

Start in Analytics Catalog and filter to the part of your model you own. Run the agent. Review findings by impact. Duplicates and near duplicates are quick wins. Unknown abbreviations are easy to resolve with a one line definition. For semantically close titles or descriptions, pick the clearest wording and align the pair. The goal is a catalog that a new teammate can read without guesswork.

Practical examples

Two objects named Gross Margin and Sales Revenue Margin might share the same description even though they serve different use cases. The agent places them side by side so you can decide what stays canonical and what needs a rename or a deprecation.

MRR and Monthly Recurring Revenue often appear together. Choose one title as the standard and tag the other for discovery.

When NSAT appears with no nearby definition, add one sentence to the description. That small change prevents repeated questions later.

Writing metadata that holds up

Titles should read well to someone new to the domain. Descriptions should lead with the business meaning before the logic. If a metric includes filters or period rules, add a short example. Keep a lightweight glossary in the project and link to it from common objects. Tag ownership so questions land with the right person.

What is next

Coverage will grow beyond the current object set. Semantic checks will go deeper across titles and descriptions. The planned LLM stage for abbreviations will help with niche vocabulary once it is ready. Same goal throughout. Clear signals. Safe to act on. Easy to explain.

Bottom line

Analytics Catalog gives you one place to manage the semantic layer. The Semantic Quality Agent keeps that layer understandable and consistent. Use both to reduce duplication, surface unclear language, and keep your analytics readable for the next person who inherits it.
