
The GoodData Way to Artificial Business Intelligence (ABI)

6 min read | Published Oct 13, 2023

As Field CTO at GoodData, Jan brings over a decade of experience in backend systems, databases, and cloud-native architectures. His background spans data warehouse operations, large-scale platform engineering, and applied generative AI. Jan has led multiple major long-term projects at GoodData, including the delivery of our data warehouse-as-a-service built on Vertica as well as the ground-up design and development of our next-generation analytics platform. Today, he splits his time between driving product innovation (particularly in the area of AI assisted BI) and representing GoodData in the field — through conferences, articles, and conversations with customers and partners.

What Helps AI To Succeed?

In this article, I will discuss generative AI powered by Large Language Models (LLMs).

Typical models such as GPT-4 are already pre-trained on billions of tokens (roughly, words), enabling them to answer the most common questions out of the box.

But what if you want a model to support specialized use cases from your industry?

There are two ways to improve pre-trained LLMs:

Fine-tune (train) them once with custom data.
Enrich the prompts — add custom context to every question.

Both approaches have their pros and cons. A fine-tuned model can be fast and cheap to query, because the domain knowledge is already part of the model, but fine-tuning needs a lot of training data. Prompt enrichment, on the other hand, works even with little data but scales poorly, because the extra context has to be sent with every question.
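To make prompt enrichment more concrete, here is a minimal sketch in plain Python. It is not tied to any particular LLM provider; the `ContextSnippet` class, the `build_enriched_prompt` function, and the `max_chars` budget are hypothetical names I made up for illustration. The point is only to show that the custom context travels with every single question, and that a size budget decides how much of it fits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextSnippet:
    """One piece of custom context, e.g. a metric definition (hypothetical shape)."""
    title: str
    text: str


def build_enriched_prompt(question: str, snippets: list[ContextSnippet],
                          max_chars: int = 2000) -> str:
    """Prepend as much custom context as fits into max_chars, then add the question."""
    parts = []
    used = 0
    for snippet in snippets:
        block = f"- {snippet.title}: {snippet.text}"
        if used + len(block) > max_chars:
            break  # context budget exhausted; remaining snippets are dropped
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Use the following context to answer.\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Made-up context; in practice it would come from your own domain knowledge.
snippets = [
    ContextSnippet("Revenue", "Sum of order amounts for closed-won deals."),
    ContextSnippet("Churn rate", "Share of customers lost in a period."),
]
prompt = build_enriched_prompt("What was revenue last quarter?", snippets)
print(prompt)
```

The budget is where the scaling problem shows up: the more domain knowledge you have, the less of it fits into a single prompt, and every request pays for the context again.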

Usually, you need to customize LLMs so they can answer complex questions from a specific domain, in our case ABI. Keep in mind that the data you use for fine-tuning or prompting may include your intellectual property or other sensitive data.

You also need to consider where to host the LLM:

Use SaaS services such as OpenAI.
Run an LLM on-premises.

During our AI experiments, we discovered that it is easier to use SaaS, as the models are extremely strong out of the box and even more powerful when fine-tuned.

However, sometimes an on-premises solution is mandatory, which means running open-source models like Llama or Mistral on your own infrastructure. This is the case when compliance requires it or when you really don't want to leak your intellectual property.

What Are the Needs of ABI?

To fully leverage LLMs for ABI, we need to enhance them with business intelligence domain knowledge. I am not talking about basic concepts; I mean specific business domains represented by various data models, popular metrics, and the most common insights and dashboards.

Unfortunately, using a typical physical data model to fine-tune LLMs does not help much in this case, as table and column names are often (semi-)cryptic. The business meaning of a particular column is also usually unknown or vague.

That is why I strongly believe in the concept of semantic models. If you invest in building semantic models and explaining the business meaning of entities, it will be easier for everyone to navigate. As a bonus, you could use this information to fine-tune LLMs more efficiently. After all, LLMs aren't magic; the better the input, the better the output.

Moreover, to automate the process of fine-tuning, we recommend using business intelligence platforms providing everything through high-quality APIs or even SDKs. I am talking about the ability to generate data for fine-tuning/prompting programmatically — with a good SDK, you can collect the semantic model with a few lines of code and generate a set of questions/answers for fine-tuning. This will save you time and can scale very easily, unlike manual work.
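As a rough illustration of generating fine-tuning data programmatically, the sketch below turns a tiny semantic model into question/answer pairs and serializes them as JSON Lines. Everything here is an assumption made for the example: the `SEMANTIC_MODEL` dictionary is a hand-written stand-in for what you would fetch through a BI platform's API or SDK, and the question templates and answer shape are hypothetical, not a format required by any specific fine-tuning service.

```python
import json

# Hypothetical semantic model; a real one would be collected via an API or SDK.
SEMANTIC_MODEL = {
    "metrics": [
        {"id": "revenue", "title": "Revenue",
         "description": "Sum of order amounts for closed-won deals."},
        {"id": "order_count", "title": "Order Count",
         "description": "Number of distinct orders."},
    ],
    "attributes": [
        {"id": "region", "title": "Region",
         "description": "Sales region of the customer."},
    ],
}


def generate_training_examples(model: dict) -> list:
    """Create question/answer pairs from metric and attribute descriptions."""
    examples = []
    for metric in model["metrics"]:
        # One definition question per metric, answered by its business description.
        examples.append({
            "question": f"What does the metric '{metric['title']}' mean?",
            "answer": metric["description"],
        })
        # One "metric by attribute" question per combination, answered by a
        # small structured query the BI layer could execute (assumed shape).
        for attribute in model["attributes"]:
            examples.append({
                "question": f"Show {metric['title']} by {attribute['title']}.",
                "answer": json.dumps(
                    {"metric": metric["id"], "slice_by": [attribute["id"]]}
                ),
            })
    return examples


def to_jsonl(examples: list) -> str:
    """Serialize examples as JSON Lines, one training record per line."""
    return "\n".join(json.dumps(example) for example in examples)


examples = generate_training_examples(SEMANTIC_MODEL)
print(to_jsonl(examples))
```

Because the pairs are derived from titles and descriptions, their quality depends directly on how well the semantic model explains the business meaning of each entity.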

What Is Our Product Strategy?

We invested heavily in the concept of the semantic model and in an API-first approach, meaning that we provide the semantic model through our APIs and powerful SDKs (React, Python), so everyone can build on top of it.

Our open approach towards programmatic solutions made it easy to experiment with AI and even make a few discoveries. We identified a lot of AI-powered opportunities and prioritized them.

Now, we are actively building a set of AI agents that simplify fine-tuning and prompting for BI use cases and related ones, such as generating transformation SQL queries that prepare data models for analytics.

Moreover, we are embedding such agents into our UI to serve business end users and their needs. We deliver them as Dashboard Plugins, which we make open source so that developers can customize the solutions quickly.

But this is just the beginning!

Though being able to chat with data (semantic model) is a great supplement to the traditional drag-and-drop BI experience, there is much more we can achieve with generative AI.

Imagine AI generating insights, and even dashboards, from scratch based on the semantic model! If the semantic model is well-prepared, it is feasible to fine-tune LLMs to provide such capabilities. The AI can also explain how it arrived at the result, so end users can quickly validate and iterate.

Now imagine combining traditional BI with more advanced use cases such as machine learning. Users could, for example, ask the chatbot to show the prediction of their revenue. The AI could then generate a few insights by looking at the revenue from different angles while using an ML engine to predict the next few quarters for each of the insights. But why stop there? AI could also define corresponding alerts and which business actions should be triggered. All of it with a simple dialog.
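To give a feel for the prediction step, here is a deliberately simple sketch that extrapolates quarterly revenue with an ordinary least-squares trend line. This is my stand-in for "an ML engine", not a description of any actual product feature; the `forecast_linear` function and the revenue numbers are made up for illustration, and a real engine would likely use a richer model.

```python
def forecast_linear(history: list, periods: int) -> list:
    """Fit a straight line through past values and extend it by `periods` steps."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    # Time index is 0..n-1, so its mean is (n - 1) / 2.
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Future periods continue the index at n, n + 1, ...
    return [intercept + slope * (n + i) for i in range(periods)]


revenue_by_quarter = [100.0, 110.0, 120.0, 130.0]  # made-up numbers
print(forecast_linear(revenue_by_quarter, periods=2))  # prints [140.0, 150.0]
```

In the scenario above, the chatbot would run something like this once per generated insight, for example per region or per product line, and present the forecasts side by side.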

It is possible to fine-tune LLMs for practically any analytics use case you can imagine. Right now, we are iterating toward the best possible solution and aim to offer it to prospects and customers in a closed Beta program as soon as possible. Later, based on the feedback, we plan to roll it out to production (or we might throw it away if it is not successful). My colleague Patrik Braborec wrote a nice article about the first iteration.

But Is It Enough?

Not at all! The following statement certainly resonates with us:

“Any company that does not adopt AI throughout its organization will become irrelevant in the next two years or so.”

No company wants to be irrelevant, so we are going to extend our strategy in a few ways:

Knowledge Transfer

LLMs are quite a new interface to computers: everyone can now be equipped with a co-worker who helps with routine and repetitive tasks. This does not mean that people will become irrelevant; quite the opposite. It means that people can be dramatically more productive if they have access to LLM tools and know how to use them. We are not talking just about engineers. It affects everyone: HR, Finance, UX, and so on.

To make this work, we, as the developers who adopted LLMs first, must provide relevant training to other company departments!

Provide Tools

As mentioned above, the right tools are crucial. For engineers, the obvious solution is to use GitHub Copilot, but once again, it is not just about engineering. For example, you can use ChatGPT in Google Sheets and Docs or Chat with any PDF to quickly understand new concepts and materials. Even designers can use AI-powered tools; most notably, they can generate rough drafts to draw inspiration from. There are many possibilities; choosing the right tools is harder than finding them.

Operate On-Premise LLM for Internal Purposes

If you use the SaaS version of the LLM tools, you should be cautious, as your sensitive data is sent to a third party, which can pose compliance issues. Therefore, we will need to operate a model on-premises to stay compliant and to avoid security problems that would also put our customers at risk.

Partnerships With Industry Leaders

One of our priorities is to form partnerships with industry leaders such as OpenAI or Hugging Face. These strong partnerships will help us build a better platform for our customers and may also unlock new technical and business opportunities.

Contribute to Open Source and Communities

It’s not just about taking the fruits of the community but also about giving back. We believe we have solid knowledge of data and analytics. As you probably know, LLMs depend to a significant degree on good and reliable data pipelines, an area where we can contribute our knowledge and help develop open-source tools. We also leverage tools like LangChain in our demos, and we want to contribute if there is an opportunity.

What's Next?

Artificial intelligence is changing the world as we know it. We at GoodData not only recognize this, but we also welcome it and definitely want to be a part of it! There is certainly more to come, but most of it is currently in private beta.

If you are interested in machine learning and AI use cases for BI, I would highly recommend an excellent article by Jan Kadlec about the Integration of analytics to Slack using ChatGPT or Machine Learning in Dashboards by Štěpán Machovský.

If you would like to try our new AI features when they come out of the private beta, feel free to contact us! Interested in GoodData? Feel free to enroll in a free trial. If you would like to discuss AI, LLMs, or whatever you have on your mind, reach out to us on our community Slack.
