Why AI in Analytics Needs Metadata


AI is transforming nearly every industry, and analytics is no exception. But to fully leverage AI's potential in analytics, we must clearly understand what makes analytics effective in the first place.
Analytics isn't just about providing information; it's about delivering accurate insights precisely when and where decisions are being made. AI can bridge the gap between questions and answers almost instantly, offering real-time insights across devices, platforms (for example, through MCP), and even wearables like your smartwatch.
However, integrating AI into analytics introduces two critical challenges:
- Data Privacy and Security:
  - Sending sensitive data to large language models (LLMs) inherently risks exposure or leaks.
  - Even if AI providers claim no misuse, transmitting data externally via APIs always carries significant risks.
- Metric Accuracy and Consistency:
  - AI models, particularly LLMs, are prone to hallucinations and inaccuracies.
  - Relying solely on these models for accurate analytics can lead to misleading or inconsistent insights.
  - Large datasets can't be passed to an LLM in full, because they will inevitably exceed the model's context window.
To effectively address these issues, analytics solutions need to tackle both privacy and accuracy simultaneously. At GoodData, we've developed a metadata-first approach designed explicitly to overcome these limitations. But before exploring how this works technically, let's first see why you don't need to send raw data to the LLM, and why metadata is what AI-driven analytics actually needs.
Why Metadata Is Enough (Most of the Time)
When you interact with data effectively, you're rarely dealing directly with raw values. Instead, you're primarily engaging with metadata, along with already computed metrics and aggregations. Data engineers and analysts also typically don't consume raw data directly; they rely heavily on defined functions, metrics, and computed results to identify trends, clusters, or anomalies.
This insight applies equally to AI-driven analytics. AI doesn't necessarily require access to your raw data to generate meaningful insights. Metadata alone (such as schema definitions, computed metrics, aggregations, and data relationships) is often sufficient for AI to execute precise analytical queries and produce reliable visualizations or actionable insights.
Let's start small with a concrete example using PandasAI. PandasAI effectively demonstrates how an AI model can utilize column names, data types, and computational functions without needing direct access to raw underlying data:
    import pandasai as pai

    # A simple DataFrame
    sales_df = pai.DataFrame({
        "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
        "revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
    })

    # API key for PandasAI (using BambooLLM by default)
    pai.api_key.set("your-pai-api-key")

    top_countries = sales_df.chat('Which are the top 5 countries by revenue?')
    print(top_countries)
    # Output: [China, United States, Japan, Germany, United Kingdom]
Here, the model does not need to read through the raw values itself; it relies on the available metadata (such as the column names "country" and "revenue", data types, and predefined computations) to work out the required analytical operation, which then returns a precise answer. This foundational concept, metadata-based analytics, is what GoodData expands upon at scale, for more complex and secure enterprise environments.
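To make the split between metadata and computation concrete, here is a minimal sketch using only the Python standard library. It is not how PandasAI is implemented: the helper names (`describe_schema`, `run_plan`) and the `plan` dictionary, which stands in for what a model might return, are assumptions made for illustration. The point is that only the schema description would need to travel to a model, while the computation runs locally.

```python
# Illustrative sketch only: these names are hypothetical, not part of PandasAI.
rows = [
    {"country": "United States", "revenue": 5000},
    {"country": "United Kingdom", "revenue": 3200},
    {"country": "France", "revenue": 2900},
    {"country": "Germany", "revenue": 4100},
    {"country": "Italy", "revenue": 2300},
    {"country": "Spain", "revenue": 2100},
    {"country": "Canada", "revenue": 2500},
    {"country": "Australia", "revenue": 2600},
    {"country": "Japan", "revenue": 4500},
    {"country": "China", "revenue": 7000},
]


def describe_schema(rows):
    """Return column names and type names only; no data values are included."""
    first = rows[0]
    return {column: type(value).__name__ for column, value in first.items()}


def run_plan(rows, plan):
    """Execute a structured plan deterministically against the local data."""
    ordered = sorted(
        rows, key=lambda row: row[plan["order_by"]], reverse=plan["descending"]
    )
    return [row[plan["select"]] for row in ordered[: plan["limit"]]]


# This is all a model would need to see.
schema = describe_schema(rows)

# Stand-in for a model's answer to "top 5 countries by revenue" (assumed shape).
plan = {"select": "country", "order_by": "revenue", "descending": True, "limit": 5}

top_countries = run_plan(rows, plan)
print(schema)
print(top_countries)
```

The same question gets the same answer every time, because the ranking is done by ordinary sorting code rather than by the model.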
AI Analytics Scaling Beyond a Simple Dataframe
Running AI analytics on simple dataframes works well for straightforward scenarios, but real-world analytics typically involves multiple large datasets, complex relationships, diverse metrics, and data stored across different sources. Basic dataframe operations become insufficient as complexity grows. Out-of-the-box AI systems lack an inherent understanding of intricate data relationships, business contexts, and permission structures, which is precisely where metadata-driven analytics shines.
How Does Metadata-First AI Analytics Work in Practice?
For AI to effectively answer sophisticated business questions, it needs a deep and structured understanding of your data's semantic organization: what data points represent, how they interrelate, and the business logic governing their usage. Metadata-first analytics enables AI to translate this understanding into accurate dashboards, visualizations, or actionable insights without needing to directly handle or expose the raw underlying data.
In practice, this approach results in two clearly separated layers:
- Data Layer: Secured raw data, accessed strictly within your control.
- Metadata Layer: Contains structured definitions, such as table schemas, calculated metrics, dimension hierarchies, and permission rules, but not the data itself.
In GoodData's metadata-first AI analytics architecture, the Large Language Model (LLM) interacts with the metadata layer rather than with the raw data, which protects data privacy and security. The execution of the analytics, meaning the computation and the actual data crunching, is done by deterministic algorithms.
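The two layers can be sketched in a few lines of Python. This is an illustration of the general pattern under stated assumptions, not GoodData's implementation: the `Metric` class, the `METADATA` and `DATA` structures, and the `execute` function are all hypothetical names. A request (such as a model might produce after seeing only the metadata) is validated against the metadata layer and then computed by plain, deterministic code.

```python
from dataclasses import dataclass


# Hypothetical metadata layer: definitions only, no data values.
@dataclass(frozen=True)
class Metric:
    name: str
    column: str
    aggregation: str  # "sum" or "avg"


METADATA = {
    "attributes": {"region", "product"},
    "metrics": {
        "total_revenue": Metric("total_revenue", "revenue", "sum"),
        "avg_revenue": Metric("avg_revenue", "revenue", "avg"),
    },
}

# Hypothetical data layer: stays inside your own environment.
DATA = [
    {"region": "EMEA", "product": "A", "revenue": 120.0},
    {"region": "EMEA", "product": "B", "revenue": 80.0},
    {"region": "APAC", "product": "A", "revenue": 200.0},
]


def execute(request, metadata, data):
    """Validate a request against the metadata, then compute it deterministically."""
    metric = metadata["metrics"].get(request["metric"])
    if metric is None:
        raise ValueError(f"unknown metric: {request['metric']}")
    if request["group_by"] not in metadata["attributes"]:
        raise ValueError(f"unknown attribute: {request['group_by']}")

    # Group the metric's source column by the requested attribute.
    groups = {}
    for row in data:
        groups.setdefault(row[request["group_by"]], []).append(row[metric.column])

    if metric.aggregation == "sum":
        return {key: sum(values) for key, values in groups.items()}
    return {key: sum(values) / len(values) for key, values in groups.items()}


# A request expressed purely in terms of metadata names.
request = {"metric": "total_revenue", "group_by": "region"}
result = execute(request, METADATA, DATA)
print(result)
```

A request that names a metric or attribute missing from the metadata is rejected before any data is touched, which is one way a hallucinated definition can be caught early.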

Does Metadata-First Mean You Can’t Use an LLM to Explain Your Data?
Adopting a metadata-first approach doesn't prevent you from selectively leveraging LLMs in advanced analytical scenarios. We're actively exploring specialized use cases where machine learning algorithms first analyze your data directly, and the summarized results — not the raw data itself — are then provided to an LLM. The LLM helps deliver intuitive explanations, simplifying complex analytical insights such as key driver analysis, clustering, or anomaly detection, effectively serving as an analyst on demand.
Crucially, these enhanced scenarios remain clearly defined and optional, and they can run on local LLMs: customers can bring their own on-premise model. Deterministic algorithms will continue to handle core analytics computations securely, ensuring accuracy and reliability, while LLM-driven explanations stay available when additional clarity or depth is beneficial.
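As a rough sketch of this "analyze first, explain second" flow, the example below runs a simple z-score anomaly check locally and hands only the summary to a prompt. The z-score method, the function names, and the prompt wording are assumptions chosen for illustration; they are not a description of the algorithms GoodData uses.

```python
from statistics import mean, stdev


def summarize_anomalies(series, threshold=2.0):
    """Flag points whose z-score exceeds the threshold and return a compact summary."""
    mu = mean(series.values())
    sigma = stdev(series.values())
    anomalies = [
        {"period": period, "z_score": round((value - mu) / sigma, 2)}
        for period, value in series.items()
        if sigma > 0 and abs(value - mu) / sigma > threshold
    ]
    return {
        "points_analyzed": len(series),
        "threshold": threshold,
        "anomalies": anomalies,
    }


def build_prompt(summary):
    """Build the text an LLM would receive: summary statistics, not raw values."""
    lines = [
        f"Analyzed {summary['points_analyzed']} data points "
        f"(z-score threshold {summary['threshold']})."
    ]
    for item in summary["anomalies"]:
        lines.append(f"Anomaly in period {item['period']}: z-score {item['z_score']}.")
    lines.append("Explain these findings in plain language for a business user.")
    return "\n".join(lines)


# Hypothetical daily order counts; these raw values never enter the prompt.
daily_orders = {
    "Mon": 100, "Tue": 102, "Wed": 98, "Thu": 101,
    "Fri": 99, "Sat": 180, "Sun": 100,
}

summary = summarize_anomalies(daily_orders)
prompt = build_prompt(summary)
print(prompt)
```

The detection itself is deterministic; the LLM's role is limited to turning the summary into an explanation.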
Leveraging the Semantic Layer for Precision
At GoodData, our Logical Data Model (LDM) serves as the core semantic layer, capturing all the necessary context and metadata required for meaningful analytics. The LDM structures data clearly, logically, and intuitively — initially created for human analysts but equally powerful for AI applications.
To enhance the capabilities of our semantic layer further, GoodData leverages vector databases to store the semantic embeddings of analytical objects. This facilitates efficient semantic searches, enabling the LLM to rapidly identify and utilize the correct metadata definitions. GoodData's semantic search system automatically verifies the compatibility, correctness, and computability of all analytical objects before they are exposed to the LLM, ensuring accuracy and consistency in every AI-generated insight.
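The retrieval idea can be illustrated with a toy example. The sketch below replaces learned embeddings and a vector database with plain word counts and an in-memory dictionary, so it only shows the shape of the lookup (embed, compare, rank); the catalog entries and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical catalog of analytical objects and their descriptions.
CATALOG = {
    "metric/total_revenue": "total revenue sum of order amount",
    "metric/active_customers": "count of customers with an order in the period",
    "attribute/region": "sales region of the customer",
}


def embed(text):
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Stand-in for a vector database: object id -> embedding.
INDEX = {object_id: embed(description) for object_id, description in CATALOG.items()}


def search(question, top_k=2):
    """Return the ids of the catalog objects most similar to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda oid: cosine(query, INDEX[oid]), reverse=True)
    return ranked[:top_k]


matches = search("what is the revenue by region")
print(matches)
```

Only the best-matching definitions would then be passed on to the LLM, which keeps the prompt small and focused on objects that actually exist.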
How Analytics Can Be AI-Friendly
If there's one thing AI excels at, it’s generating, updating, and refining code. GoodData capitalizes on this strength by providing a robust, developer-centric analytics environment powered by a highly structured semantic layer.
At the core of GoodData’s AI success is Analytics as Code (AaC). AaC transforms analytics into a code-first practice, making it incredibly developer-friendly. Analytical objects are expressed in clean, versionable .yaml files. This structure enables developers and data analysts to seamlessly create, modify, and collaborate on analytics directly from their IDEs or command-line interfaces, just like managing code in Git.
Because analytics definitions are represented in structured, human-readable code, AI models can effortlessly understand and manipulate them. An LLM, armed with GoodData’s semantic metadata, can translate natural language questions into precise visualizations, dashboards, or analytical objects. In practice, you can simply ask your AI assistant for a particular dashboard or metric, and it can quickly generate the corresponding .yaml definitions, ready to be integrated.
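Because generated definitions are just structured text, they can be checked mechanically before they are integrated. The sketch below shows one possible check. The field names and the dictionary layout are invented for this example and are not GoodData's actual .yaml schema; the dictionary represents a definition as it might look after a .yaml file has been parsed.

```python
# Hypothetical structure, not GoodData's actual schema.
REQUIRED_FIELDS = {"id", "type", "title", "metric", "view_by"}


def validate_definition(definition, known_metrics, known_attributes):
    """Return a list of problems; an empty list means the definition looks usable."""
    problems = [
        f"missing field: {name}"
        for name in sorted(REQUIRED_FIELDS - set(definition))
    ]
    metric = definition.get("metric")
    if metric is not None and metric not in known_metrics:
        problems.append(f"unknown metric: {metric}")
    attribute = definition.get("view_by")
    if attribute is not None and attribute not in known_attributes:
        problems.append(f"unknown attribute: {attribute}")
    return problems


known_metrics = {"total_revenue", "order_count"}
known_attributes = {"region", "product_category"}

# A definition such as an assistant might generate from a natural language request.
generated = {
    "id": "revenue_by_region",
    "type": "bar_chart",
    "title": "Revenue by Region",
    "metric": "total_revenue",
    "view_by": "region",
}

# A flawed variant: misspelled metric and no title.
flawed = {
    "id": "revenue_by_region",
    "type": "bar_chart",
    "metric": "revenu",
    "view_by": "region",
}

print(validate_definition(generated, known_metrics, known_attributes))
print(validate_definition(flawed, known_metrics, known_attributes))
```

A check like this could run in a pre-commit hook or CI job, so that a definition referencing a nonexistent metric never reaches a shared branch.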

GoodData also provides comprehensive APIs and SDKs, enabling developers and AI models alike to orchestrate complete analytics workflows programmatically. For example, an LLM can use GoodData’s API documentation to automate tasks such as creating analytical pipelines, updating metrics definitions, or dynamically generating visualizations tailored precisely to user requests.
In essence, GoodData seamlessly integrates AI into every aspect of analytics — from defining new analytical components through structured code, to orchestrating complex analytics tasks via APIs and SDKs — providing a fully flexible, scalable, and developer-focused analytics solution.
Sneak Peek into the Future of Analytics — Ontology
We're currently experimenting with ontology as the foundational source of truth for analytics. Ontology enhances the semantic layer by providing AI models with deep, structured knowledge of business contexts, concepts, and relationships. This structured representation allows AI to understand not just the data itself but the underlying business semantics and logic.
With ontology integrated into analytics, AI models gain the ability to understand complex business relationships as thoroughly as domain experts or seasoned data analysts.
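As a very small illustration of what an ontology adds on top of a schema, the sketch below stores business concepts as subject-predicate-object triples and follows "is_a" links, so that a specific concept inherits the relationships of its more general ones. The concepts, predicates, and function names are made up for this example and do not reflect GoodData's experimental design.

```python
# Hypothetical business ontology as (subject, predicate, object) triples.
TRIPLES = [
    ("Enterprise Customer", "is_a", "Customer"),
    ("Customer", "is_a", "Account"),
    ("Customer", "places", "Order"),
    ("Order", "generates", "Revenue"),
]


def ancestors(concept):
    """Follow is_a links transitively and return every more general concept."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in TRIPLES:
            if subject == current and predicate == "is_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def related(concept):
    """Return the relations a concept has directly or inherits from its ancestors."""
    concepts = [concept] + ancestors(concept)
    return [
        (subject, predicate, obj)
        for subject, predicate, obj in TRIPLES
        if subject in concepts and predicate != "is_a"
    ]


print(ancestors("Enterprise Customer"))
print(related("Enterprise Customer"))
```

Here a question about enterprise customers can be connected to orders even though no triple links the two directly, which is the kind of inference a flat schema does not provide.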
Imagine an analytics future where decision-making is simplified to describing the decision you need to make. Your AI assistant, leveraging ontology-driven knowledge, instantly delivers comprehensive, actionable insights tailored precisely to your business context.
Conclusion
AI-driven analytics dramatically accelerates decision-making by seamlessly bridging the gap between complex questions and actionable insights. Looking ahead, AI integrated deeply with business semantics through ontology and structured metadata will revolutionize how decisions are made, transforming data analytics into proactive decision support systems that deliver insights exactly when you need them.
However, deploying AI analytics securely and reliably requires strong protection against data exposure and safeguards to ensure metric accuracy. GoodData addresses these challenges with its metadata-first analytics architecture and Analytics as Code framework. By clearly separating raw data from AI interactions, GoodData keeps your data safe while ensuring your analytics remain precise, powerful, and adaptable.
Experience the power and security of AI-driven analytics firsthand — sign up for a free GoodData trial today and explore how easily and effectively you can integrate intelligent analytics into your workflow.