Find What Drives Your Metrics: Comparing Key Driver Analysis Approaches for BI


When it comes to ad-hoc decisions based on Business Intelligence (BI), there are usually two major questions that help you understand the underlying story in your data: what has changed, and why.
To understand what has changed, we can usually rely on a number of proven tactics, from simple thresholds to anomaly detection algorithms. Those algorithms can be surprisingly tricky when you want to detect genuinely unexpected change rather than a simple threshold breach, but more on that in another article.
The process of understanding what drives those changes in your business metrics (revenue, churn, conversions, …) is usually called Key Driver Analysis (KDA). It uncovers why things change and helps you make more informed decisions.
At GoodData, we explored different ways to implement KDA (specifically period-over-period change analysis) efficiently, accurately, and at scale. We compared three different approaches commonly used in BI software:
- Attribute-level Aggregation
- Linear Regression
- Gradient Boosting with SHAP
We compared them based on their interpretability, accuracy, and scalability, and tried them out on a public data set (e-commerce sales data) representative of what a smaller e-commerce business might have.
To illustrate the points, let’s consider this story:
You have a sudden change in sales this month and you'd like to understand what drove it. Your business is international and you resell electronics. This business has a lot of potential key drivers: growth or decline in particular countries, product segments, campaigns, individual products… you get the point. You want to uncover the most impactful drivers of that growth or decline so you can make adjustments and maximize your revenue.
Here’s what we found.
Attribute-level Aggregation
Probably the most straightforward approach to KDA is to slice the metric by each possible dimension independently, calculate how much the metric changed for each value of each attribute, and sort the changes by magnitude.
That's the approach many business people would take in Excel if they were asked to find the key drivers of a metric increase or decrease: plot bar charts of the metric aggregated by each dimension, find the attribute values with the highest increase or decrease, and sort them.
Let’s look at the example from the introduction. You would first look at the revenue by country, find out that there was a significant increase in revenue in the US, and then look at the revenue by product category and find out that there was a significant increase in revenue of mobile phones. Those would be your key drivers of the increase.
This approach is easy to explain to business people, transparent, easy to visualize (with bar charts), easy to implement, and fast to calculate.
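To make the mechanics concrete, here is a minimal pandas sketch of the idea; the column names (period, country, category, revenue) and the toy data are illustrative assumptions, not our actual schema or implementation.

```python
import pandas as pd

def attribute_level_drivers(df: pd.DataFrame, dimensions: list[str],
                            metric: str = "revenue") -> pd.DataFrame:
    """Rank attribute values by how much the metric changed between periods."""
    results = []
    for dim in dimensions:
        # Aggregate the metric per attribute value, separately for each period.
        per_value = (df.groupby([dim, "period"])[metric].sum()
                       .unstack("period", fill_value=0))
        per_value["change"] = per_value["current"] - per_value["previous"]
        for value, row in per_value.iterrows():
            results.append({"dimension": dim, "value": value, "change": row["change"]})
    # Sort the candidate drivers by the magnitude of their change.
    return (pd.DataFrame(results)
              .sort_values("change", key=abs, ascending=False)
              .reset_index(drop=True))

# Toy data: revenue per country and product category for two periods.
sales = pd.DataFrame({
    "period":   ["previous", "previous", "current", "current"],
    "country":  ["US", "DE", "US", "DE"],
    "category": ["phones", "laptops", "phones", "laptops"],
    "revenue":  [100.0, 80.0, 150.0, 85.0],
})
print(attribute_level_drivers(sales, ["country", "category"]))
```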
But the approach is also very simplistic and has several serious disadvantages.
The main disadvantage is that it double-counts the contributions of the drivers. Let's take a closer look at the previous example. Say a new phone was announced this month, which led to an increase in sales in all countries; since the US is the company's largest market, the spike there is bigger than in other countries. If you run the attribute-level analysis, which looks at each dimension independently, both the US and phones will look like key drivers. However, the US increase was driven by phone sales and other product categories did not grow there, so the US should not be identified as a key driver. Only the phone category should.
This problem is called double counting of drivers or confounded driver attribution and is caused by looking at each attribute independently and not taking into account their dependence.
Another shortcoming of this method is that it ignores attribute interactions. Let's say there was a big ad campaign and discount on laptops in Germany in a given month. Since Germany is a large market globally and laptops have a large market share there, Germany will look like a big driver if you aggregate over countries, and laptops will look like a big driver (globally) if you aggregate over categories. But sales of other categories in Germany stayed the same (so the German economy was not the driver), and laptop sales stayed the same in other countries, so laptops were not the driver by themselves either.
The real driver was the interaction: the laptop campaign in Germany. Single-attribute analysis won't uncover this and will make both Germany and laptops look like drivers even though they were not (and it will double-count them on top of that).
There are other effects this method does not handle well, such as the mix (composition) effect. Suppose laptops make up a much bigger share of the market in Germany than phones do (say 90/10), while in other countries the split is roughly 50/50. A global laptop campaign or discount will then drive sales for the whole category everywhere, but it will lift Germany disproportionately: a 20% increase in laptop sales raises Germany's total revenue by about 18% but other countries' by only about 10%. That makes Germany look like the driver even though the laptop category was the real one.
Despite all these limitations, this method can still be very useful. It basically automates what an analyst would do if they wanted to narrow down potential key drivers in Excel or a similar BI tool, only much faster. Coupled with additional measures to filter out obvious or uninteresting candidates, it can save a lot of work and help analysts choose the right area to focus on and dig deeper.
It is important to be aware of the limitations and interpret the results correctly. The method gives you a list of variables (attribute values such as specific countries or product categories) where the target metric changed the most, so you know where to look. An analyst still has to go through them and credit the contributions correctly based on domain knowledge or further analysis. With univariate analysis, the contribution is not distributed proportionally among dependent variables based on their true contribution; because of that, the sum of all these contributions will be larger than the total change in the metric.
Linear Regression Models
A more advanced approach is to use linear models such as linear or logistic regression. Their main advantage over the univariate analysis is that they take the relationships between dimensions into account and distribute the total contribution among the drivers so they are not double-counted. In the first example, they would be able to determine that the key driver was the phone category and not the US. They can also address interactions by including so-called interaction terms in the model.
Another big advantage is that the resulting drivers are easily interpretable and familiar to business analysts, and the approach is built on a solid statistical foundation.
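As a rough illustration, the sketch below fits an ordinary least squares model with statsmodels, using a period indicator and its interactions with the dimensions; the formula, data layout, and toy numbers are assumptions for illustration, not the exact setup we evaluated.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: aggregated revenue per country x category for two periods.
sales = pd.DataFrame({
    "country":  ["US", "US", "DE", "DE"] * 2,
    "category": ["phones", "laptops"] * 4,
    "period":   [0] * 4 + [1] * 4,   # 0 = previous period, 1 = current period
    "revenue":  [100.0, 40.0, 30.0, 80.0, 150.0, 42.0, 31.0, 85.0],
})

# "C(...)" treats the columns as categorical; "*" adds the main effects plus
# the interactions with the period indicator, so the coefficients on the
# period:country and period:category terms estimate how much each attribute
# value contributed to the period-over-period change.
model = smf.ols("revenue ~ C(period) * (C(country) + C(category))", data=sales).fit()
print(model.params.filter(like="C(period)[T.1]:"))  # estimated driver contributions
```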
On the other hand, as the number of dimensions, their cardinality, and/or the number of interaction terms grows, the dimensionality blows up quickly (quadratically): for example, 10 attributes with 100 values each already produce 45 attribute pairs and on the order of 450,000 pairwise interaction terms. Fitting then takes a long time and the results can become noisy.
Linear regression models also make strong assumptions about the data which, if not met, can lead to incorrect and misleading results. And the quality of the results depends on how well the model is able to fit the data.
Another disadvantage, in the context of BI software, is that a separate model has to be computed for each time period (for period-over-period change analysis) and each filter combination. This makes it infeasible to precalculate them when there is a large number of possible filter combinations.
Gradient Boosting with SHAP values
Non-linear models, such as gradient boosting or random forests, combined with SHAP values tackle most of the problems of the previous two approaches.
First of all, this approach handles all the issues mentioned earlier (double counting, interactions, and mix/composition effects) thanks to multivariate, non-linear models that can capture dependencies between variables and SHAP values that fairly distribute the total contribution among all the factors. It also does not make assumptions about the underlying data, so it can be used on arbitrary data sets.
Also, compared to linear regression models, it can handle categorical attributes natively (depending on the underlying model used) and won't explode in complexity for attributes with high cardinality.
Finally, SHAP values are additive, so they can be calculated once and then aggregated at different levels/attributes (by country, product, etc.) and for different filter combinations. The underlying model can also be trained on the whole data set (not just the compared periods), so it can provide both local and global explanations (that is, drivers in a given period as well as long-term trend drivers), assuming the model has enough capacity to capture those insights.
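The sketch below shows one possible shape of this approach, using scikit-learn's gradient boosting and the shap library; the features, encoding, and toy data are illustrative assumptions rather than our production pipeline.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy data: one row per order. A phone spike in December is the planted driver.
orders = pd.DataFrame({
    "country":  rng.choice(["US", "DE", "FR"], size=1000),
    "category": rng.choice(["phones", "laptops"], size=1000),
    "month":    rng.integers(1, 13, size=1000),
})
orders["revenue"] = (
    100
    + 50 * (orders["country"] == "US")
    + 80 * ((orders["category"] == "phones") & (orders["month"] == 12))
    + rng.normal(0, 10, size=len(orders))
)

# Encode categoricals as integer codes; some boosters can handle them natively.
X = orders[["country", "category", "month"]].apply(
    lambda col: col.astype("category").cat.codes)
y = orders["revenue"]

model = GradientBoostingRegressor().fit(X, y)

# SHAP values split every individual prediction among the features. Because
# they are additive, they can be aggregated later by any attribute or filter
# combination without retraining the model.
explainer = shap.TreeExplainer(model)
shap_values = pd.DataFrame(explainer.shap_values(X), columns=X.columns)

# Example: total attribution per feature in December vs. the rest of the year.
december = orders["month"] == 12
print(shap_values[december].sum())
print(shap_values[~december].sum())
```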
On the other hand, approaches based on non-linear models and SHAP values are quite a black box and difficult to interpret, visualize, and explain. That makes them less transparent and trustworthy.
There are also a lot of knobs that have to be fine-tuned on each specific domain and data set so it’s difficult to make it work automatically on any domain or data set without any prior knowledge. Typically, some manual feature engineering and parameter tuning is required, although it can be automated to some extent.
The quality of the results depends on how well the underlying model fits the data, so if the knobs are not set correctly, the results will be incorrect and misleading.
Finally, this method is computationally expensive. On the other hand, it can be parallelized, so it can be sped up with more resources, and, unlike the linear regression models, a single model covers all time periods; together with the additive nature of SHAP values, this makes it easy to precompute and cache the results.
Conclusion
We reviewed three paths to Key Driver Analysis:
- simple attribute-level aggregation,
- linear regression,
- non-linear models with SHAP.
For the first release, we chose attribute-level aggregation because it aligns with how analysts reason about data, it is easy to explain, fast to compute, and it works across domains without fragile model assumptions. When used thoughtfully, it highlights credible candidates for further investigation instead of pretending to deliver perfect attribution.
To raise the signal and cut the noise, we added two upgrades. First, we detect only statistically meaningful shifts within each dimension, which limits false positives. Second, we rank and select the most promising business dimensions before we run the analysis, which keeps the results focused even in complex environments with many potential drivers.
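As a rough illustration of the first upgrade, the sketch below keeps only per-value changes that stand out under a simple z-score test; it is a hypothetical example of the idea, not the heuristic that actually ships in the product.

```python
import pandas as pd

def significant_shifts(changes: pd.Series, z_threshold: float = 2.0) -> pd.Series:
    """Keep only the attribute values whose change stands out from the rest."""
    z_scores = (changes - changes.mean()) / changes.std(ddof=0)
    return changes[z_scores.abs() > z_threshold]

# Example: per-country revenue change between two periods (toy numbers).
changes = pd.Series({
    "US": 60.0, "DE": 3.0, "FR": -2.0, "UK": 4.0, "JP": 1.0,
    "IT": 2.0, "ES": -1.0, "NL": 3.0, "PL": 0.0, "SE": 2.0,
})
print(significant_shifts(changes))  # only the outsized US shift survives
```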
This approach sets a dependable baseline that teams can trust, even with many filters and frequent updates. It avoids the risk of confident but misleading results that can occur when a generic model does not fit a specific dataset. And it creates a clean runway for the future: if a customer needs deeper precision or wants to shorten the path from anomaly to insight, our professional services can deliver a tailored ML solution based on linear or boosted models with SHAP, calibrated to the customer's data and context.
TL;DR: Start simple, build trust, and scale to advanced methods when the value is proven.
Want to learn more?
Stay tuned if you'd like to learn why we weren't the only ones who chose attribute-level aggregation as the default algorithm for KDA; we will soon release a product-first POV on the matter.