MAQL: Composable, Reusable Analytics Queries
Written by Harry Dix |
Whether you’re embedding analytics into an application or portal, or striving to deliver seamless analytics across multiple tools and consumers, one of the most important aspects is consistency; both in providing a consistent, user-friendly analytics experience and in drawing consistent results from the data.
To effectively achieve these goals, GoodData uses a semantic model with its own proprietary query language, MAQL (Multidimensional Analytical Query Language), instead of simply using SQL queries. Why use an alternative query language when you already have SQL? In short, reusability — the ability to easily compose and reuse metrics — which is where multidimensionality comes into play.
Multidimensional modeling works with the concepts of facts and dimensions meaning that MAQL metrics are context-aware. Metrics are sliced by dimension context, which does not need to be specified in the MAQL expressions themselves. This multidimensional capability is one of the features that sets MAQL apart from SQL.
This article aims to highlight the benefits of the MAQL query language, with a comparison and examples. First, let’s break down some of the key terminology/aspects of a semantic model and MAQL’s role within it.
The shared definition of relationships: GoodData’s semantic model defines shared relationships between datasets. These relationships are implicitly used in queries, which, in short, means that your users don’t need to write joins and remember the names of the primary and foreign key columns (like they do with SQL). They also don’t need to remember in which dataset a column (attribute or fact) lives.
Metrics: A metric represents a number that your users work with (e.g., Revenue, Shipping Cost, Interest Rate, Conversion Rate, etc.). Your users want to group-by and filter the metrics by other columns (attributes). The semantic model generates the correct metric for each specific aggregation and filter combination. Your users need to define the combination of attributes, filters, and measures (a plain list of elements). The long (and boring) SQL queries are generated automatically for them thereafter.
Facts: A fact is simply a number to aggregate. It’s a numerical piece of information that you can perform some math on. For example, Age, Price, or Height.
Attributes: Similar to facts, an attribute is a piece of information but, in this context, the aim is not to aggregate it. Instead, it’s a piece of information that describes an entity, and it’s typically text — for example, Name, Status, Category, or Size.
When analyzing data, we often ask quantitative questions like “How much?”, “What is the average?” or “What is the maximum?”. To get answers to these questions, it is necessary to perform some kind of calculation on the data.
In GoodData, we use MAQL to define these calculations. We call these calculations metrics. We then use the metrics in the GoodData Analytical Designer (to consume the data), where we can slice and dice them using different attributes.
There are two ways by which metrics can be created. Simple metrics can be created directly in the GoodData Analytical Designer from facts and attributes using its drag-and-drop interface. More complicated metrics are created using GoodData.CN API (read more about our API and find the documentation here).
MAQL vs SQL
Despite similarities in syntax construction and predefined argument references, MAQL is not the same as SQL. SQL is a general-purpose relational database language for reading and writing data, whereas MAQL is, as highlighted above, multidimensional and data is always evaluated based on the context.
At the technical level, the GoodData Platform separates the way data is stored in the database (physical data model) from the way the data objects relate to each other (logical data model). Workspace developers create the logical data model through a simple, graphical interface, and when the model is published to a workspace, the physical data model is created or updated accordingly.
The great benefit of MAQL is that it allows you, or rather your end users, to write complex and powerful report metrics in a fraction of the time that the corresponding SQL query would take.
Let’s dig a little deeper into the differing approach of MAQL, to that of SQL, in querying data.
Let’s assume, for example, that a user is interested in sales data. With SQL, they would have to write a query like the following:
SELECT SUM(Amount) FROM sales_data;
The result being:
In total, they have sold $1,000,000 worth of products, but they want to go into more detail and find out how much of that $1,000,000 was generated by shoe sales. The SQL query they would need to write would look like the following:
SELECT SUM(amount) FROM sales_data JOIN product_data ON sales_data.product_type_id=product_data.product_type_id WHERE product_data.name = "shoes";
The result being:
To understand how to build this second query, the user must be able to do the following:
- Understand a table and column in the context of database concepts.
- Have familiarity with relational algebra.
- Understand how and where their data is stored in the database in order to query it.
MAQL takes a different approach. The user doesn't need to know where or how their data is stored. The equivalent of the first SQL query above looks like the following in MAQL:
Note that FROM sales_data is not necessary because all relevant amounts are summed from the appropriate table in the datastore. Through the GoodData Portal, you can execute queries without understanding the details of how the data is actually stored in the database.
The equivalent of the second SQL query is the following in MAQL:
SELECT SUM(Amount) WHERE Product Type = shoes
There is no need for remembering table names or figuring out table joins.
The below examples show how creating a metric for profit differs when written in SQL vs MAQL:
SQL: One-off query
SELECT (SELECT SUM(Cost_of_Shipping) FROM Shipping) + (SELECT SUM(Cost) FROM Campaigns) AS Cost_of_Sales
MAQL: Reusable metric
SELECT SUM(Cost of Shipping) + SUM(Cost)
Combining multiple metrics sample
SQL: One-off query
SELECT 100*(1 - ( (SELECT SUM(Cost_of_Shipping) FROM Shipping) + (SELECT SUM(Cost) FROM Campaigns) ) / (SELECT SUM(Amount) FROM Sales) ) AS Net_Profit_Margin
MAQL: Reusable metric
SELECT 1 - Cost of Sales/Revenue
As you can see, with MAQL, your users don’t need to know the names of the tables, primary and foreign key column names, and, unlike SQL once they have written a handful of simple MAQL metrics, they can easily re-use them, further saving time and effort.
Still wondering why you should consider MAQL when you’ve been using SQL your whole career? Re-usability is key. Both in that, as mentioned, you don’t need to write line after line of query code for end users, but also, very importantly, you don’t need to re-write thousands of lines of code due to a structural change in your data (date format, extra columns, etc.).
Back To Consistency
It’s this reuse of metrics that not only supports self-service analytics for end users (regardless of technical ability) — in that they can re-use metrics to customize or create their own visualizations by simply dragging and dropping — but also when used in a headless BI approach, end users can consume the data and obtain the same results as their fellow end users, regardless of the tool they are using (Power BI, Tableau, data science notebooks, etc.). And this is all because of the semantic model providing a single source of truth on your data and MAQL-based metrics allowing for consistent reusability.
Ready To Learn More?
This article provides just a small glimpse into the benefits of MAQL. To see how you can bring consistent analytics to your end users with GoodData, download GoodData.CN Community Edition and follow our GoodData University MAQL course.
Written by Harry Dix |
Subscribe to our newsletter
Get your dose of interesting facts on analytics in your inbox every month.Subscribe