How to Build Semantic Layers for Massive Scale Analytics

5 min read | Published Oct 23, 2019

Written by GoodData Author

Part One of Four

In my previous blog post, I talked about how most product owners are looking to integrate embedded analytics into their data products to make those products easier to use. I also explained why semantic layers are a good solution to this problem, because they can:

Harmonize and simplify the incredible complexity of data
Present the available data in an understandable way to encourage use by a wide audience
Document and preserve data lineage and best practices

As a reminder, think of a semantic layer as a shield: a layer between the user and the sheer volume of data that has been collected. We already know that semantic layers abstract away the complexity of underlying data sources to make things simple and intuitive for business users. Now, let's take this a step further. What characterizes a high-quality semantic layer in the context of large-scale embedded analytics applications?

Semantic layers in internal BI solutions vs. apps with thousands of external users

Semantic layers (SLs) create enormous value for business users, but they take time to build. Deep SLs are not always needed when the users are savvy internal analysts, but they are very important for customer-facing analytics. Let me explain.

Building SLs for applications that serve a growing number of users outside your company has a much broader impact, and involves different considerations, than building SLs for a BI solution used internally. For an internal team such as sales and marketing, a BI solution with a few prepackaged reports and a data analyst who creates additional reports on request is usually sufficient. If you want to deliver analytics at massive scale, however, you should use SLs.

Imagine a hospitality ERP solution provider that offers a web portal and a mobile application with embedded analytics to hundreds of small- to medium-sized hotels and chains. The average hotel has dozens of employees who use the mobile app’s analytics to track inventory and pricing in different areas of the hotel, plus about 10 hotel managers who are heavy analytics users and work to optimize the hotel’s running costs. Later, the provider expands into the enterprise market and signs enterprise hospitality companies with thousands of locations worldwide. Its average customer now has thousands of employees and hundreds of managers using the embedded analytics, and the game has changed as a result.

How can users benefit from semantic layers?

When the number of end users who need to understand and interpret insights grows, it becomes much easier to justify using SLs, especially since most broad-based analytics products serve business users who don’t know how to analyze or interpret data. These end users may also have radically different needs, yet they still need a shared understanding of business measures. At the start of an analytics application project, you can’t count on all of your customers agreeing on exactly which data is required, whereas consensus is much easier to reach among the users of a small-scale BI dashboard. And as those needs change, SLs ensure that the product’s evolution is consistently governed and controlled.

In addition, you may not be able to easily train each individual user on how to use your product or its analytics features, or even on which actions to take based on the insights it provides. Semantic layers are designed to make things simple and intuitive for business users by abstracting away the complexity of the underlying data source, reducing the need for training and making information easy to understand.

How do providers benefit from using semantic layers?

Because semantic layers act as an intermediary, they ensure that any change is rolled out to all of your customers in a governed manner, so a common understanding of how to interpret the data persists as the product evolves. On a more basic level, large-scale applications mean that many developers, data scientists, and insight creators work collaboratively. Semantic layers give everyone who works on the product, developers included, a common set of definitions.

Semantic layers are also useful for agile products, which must constantly evolve to stay relevant as customer needs change. That evolution takes the form of rapid product releases as analytics capabilities keep improving. In large-scale apps, semantic layers act as ever-changing roadmaps that make this evolution not just possible but also consistently governed and controlled.

Finally, semantic layers benefit providers by protecting data quality and your brand, because your customers expect the analytics you provide to be bulletproof. A single wrong calculation or error erodes trust in the data, and in your product, and can ultimately lead to churn or to low usage and adoption of analytics.

What are best practices when working with semantic layers?

First, SLs should make interacting with data in your product simple, so start by understanding the end user’s journey and designing for their experience, then build out your semantic layers to meet their needs. This exercise can tempt you to build semantic layers with total flexibility in mind, so end users can run any report and slice and dice the data any way they want. Be aware, however, that for the typical user this confuses rather than empowers.

Imagine if LinkedIn gave its users this much power. Most would never touch it, and many would feel overwhelmed and turn to another tool. Instead, LinkedIn provides just the curated information you need, with definitions that require no explanation. My advice: don’t give end users complete flexibility. Build your app’s SLs for today’s needs, but leave room to extend them over the next few releases.

Validate New Semantic Layers

Validate the usability, quality, and accuracy of a new SL, or of changes to an existing one, with different audiences early and often. Tens of thousands of users with different needs expect the analytics you provide to be bulletproof, so ask real users to evaluate prototypes or different versions of your product for different audiences. When you are ready to roll out changes to your SLs, keep in mind that you cannot simply switch off your application; you need automated release management to ensure quality control. Bear in mind, too, that the 80/20 rule applies to analytics applications, so design your SLs for the majority of your potential users.
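Automated release management can take many forms. As one minimal sketch, assuming each measure can be evaluated in Python against a small, fixed sample of rows, a pre-release check can compare every measure in the current SL with its counterpart in the candidate release and block the release if any result changes or a measure disappears. The function name `regression_check`, the measure names, and the sample hotel data are all hypothetical illustrations, not part of any particular product.

```python
from typing import Callable, Dict, List

# Hypothetical types: a row is a dict of numeric fields, and a measure is any
# function that turns a list of rows into a single number.
Row = Dict[str, float]
Measure = Callable[[List[Row]], float]


def regression_check(
    current: Dict[str, Measure],
    candidate: Dict[str, Measure],
    sample_rows: List[Row],
    tolerance: float = 1e-9,
) -> List[str]:
    """Compare current and candidate measures on the same sample rows.

    Returns a list of human-readable problems; an empty list means the
    candidate release reproduces every existing result.
    """
    problems: List[str] = []
    for name, old_measure in current.items():
        new_measure = candidate.get(name)
        if new_measure is None:
            # Silently dropping a measure would break existing customer insights.
            problems.append(f"{name}: missing from candidate release")
            continue
        old_value = old_measure(sample_rows)
        new_value = new_measure(sample_rows)
        if abs(old_value - new_value) > tolerance:
            problems.append(f"{name}: {old_value} -> {new_value}")
    return problems


# Hypothetical sample data for two hotel bookings.
rows: List[Row] = [
    {"revenue": 120.0, "rooms_sold": 2.0},
    {"revenue": 80.0, "rooms_sold": 1.0},
]

current_release: Dict[str, Measure] = {
    "total_revenue": lambda r: sum(x["revenue"] for x in r),
    "avg_daily_rate": lambda r: sum(x["revenue"] for x in r)
    / sum(x["rooms_sold"] for x in r),
}

candidate_release: Dict[str, Measure] = {
    "total_revenue": lambda r: sum(x["revenue"] for x in r),
    # Bug introduced in the candidate: divides by bookings, not rooms sold.
    "avg_daily_rate": lambda r: sum(x["revenue"] for x in r) / len(r),
}

problems = regression_check(current_release, candidate_release, rows)
if problems:
    print("Release blocked:")
    for problem in problems:
        print("  " + problem)
```

In this example the candidate release accidentally computes average daily rate per booking row instead of per room sold, so the check reports a difference for `avg_daily_rate` and the release would be held back. A real pipeline would also need a way to approve intentional changes, for example an explicit allow-list reviewed by the SL owners.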

Measure Semantic Layers’ Performance and Adoption

Plan for and measure performance: thousands of users will be using your analytics application at the same time, and they won’t necessarily understand that the underlying infrastructure needs time to crunch the numbers on millions of records. To keep performance high, track the adoption of analytical features in your application, identify where users are not getting the value they expect from insights (and may even be giving up), and then talk to those users about their pain points. Finally, use your development, staging, and beta environments to test and fine-tune the performance of your SLs.

Follow Good Governance and Monitoring Principles for Semantic Layers

Finally, follow good governance, versioning, and monitoring principles for semantic layers. Good SL governance results in a procedure that specifies how new measures and insights are integrated into the standard product so other customers can use them. Good SLs not only record your analytics solution’s dimensional model design but also serve as a “source of truth” for measure definitions as your users understand them. As part of SL governance, you should also monitor how your users use the SLs. If the definition of any SL component changes, keep “snapshots” of how it has evolved over time.
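To make the “snapshots” idea concrete, here is a minimal in-memory sketch of a measure registry that never overwrites a definition: each change is appended as a new, immutable version, so you can always see what a measure meant at an earlier point. `MeasureRegistry`, `MeasureSnapshot`, and the occupancy-rate expressions are hypothetical; a production system would persist this history in a database or version control rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MeasureSnapshot:
    """One immutable version of a measure definition."""
    version: int
    expression: str  # hypothetical: the definition stored as text, e.g. SQL
    description: str
    changed_by: str
    changed_at: datetime


@dataclass
class MeasureRegistry:
    """Hypothetical single source of truth for measure definitions."""
    history_by_name: Dict[str, List[MeasureSnapshot]] = field(default_factory=dict)

    def publish(self, name: str, expression: str, description: str, author: str) -> MeasureSnapshot:
        versions = self.history_by_name.setdefault(name, [])
        snapshot = MeasureSnapshot(
            version=len(versions) + 1,
            expression=expression,
            description=description,
            changed_by=author,
            changed_at=datetime.now(timezone.utc),
        )
        # Append rather than overwrite, so earlier snapshots stay available.
        versions.append(snapshot)
        return snapshot

    def current(self, name: str) -> MeasureSnapshot:
        """Latest definition; raises KeyError for an unknown measure."""
        return self.history_by_name[name][-1]

    def as_of(self, name: str, version: int) -> Optional[MeasureSnapshot]:
        """A specific historical version, or None if it does not exist."""
        for snapshot in self.history_by_name.get(name, []):
            if snapshot.version == version:
                return snapshot
        return None


registry = MeasureRegistry()
registry.publish(
    "occupancy_rate",
    "SUM(rooms_sold) / SUM(rooms_available)",
    "Share of available rooms that were sold",
    "analytics-team",
)
registry.publish(
    "occupancy_rate",
    "SUM(rooms_sold) / SUM(rooms_available - rooms_out_of_order)",
    "Excludes rooms that are out of order",
    "analytics-team",
)

latest = registry.current("occupancy_rate")
first = registry.as_of("occupancy_rate", 1)
print(latest.version, latest.expression)
print(first.expression if first else "no such version")
```

Keeping every version, together with who changed it and when, is what lets you explain to a customer why a number on their dashboard moved between releases.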

Not All Semantic Layers Are Created Equal

When you look under the hood, a semantic layer is not a single unit but a collection of interdependent sub-layers. For large-scale analytics projects, it is more important than ever to build deep semantic layers that create a common set of definitions, abstract complexity away from business users, and help ensure users don’t make mistakes.
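One practical consequence of sub-layers depending on each other is that a change at the bottom can ripple upward. The sketch below assumes a hypothetical dependency map in which insights are built on measures and measures are built on logical data model (LDM) fields, and uses a breadth-first walk to find everything affected by a change. All object names are illustrative only.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: each object lists the objects it is built on.
# Insights depend on measures; measures depend on LDM fields or other measures.
DEPENDS_ON: Dict[str, List[str]] = {
    "measure:total_revenue": ["ldm:bookings.revenue"],
    "measure:rooms_sold": ["ldm:bookings.rooms"],
    "measure:avg_daily_rate": ["measure:total_revenue", "measure:rooms_sold"],
    "insight:revenue_by_hotel": ["measure:total_revenue", "ldm:hotel.name"],
    "insight:adr_trend": ["measure:avg_daily_rate", "ldm:date.month"],
}


def impacted_by(changed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every object that directly or transitively depends on `changed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {}
    for obj, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(obj)

    # Breadth-first search over the inverted graph.
    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


affected = impacted_by("ldm:bookings.rooms", DEPENDS_ON)
print(sorted(affected))
```

Changing the hypothetical `ldm:bookings.rooms` field flags `measure:rooms_sold`, then `measure:avg_daily_rate`, then the `insight:adr_trend` insight, while `insight:revenue_by_hotel` is unaffected. An impact report like this is the kind of input a reviewer might want before approving an SL change.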

In a future blog post, I’ll talk more about the components of semantic layers (logical data models, measures, and insights) and how they work together to create an amazing user experience. I’ll also dive into how application scale affects each of them, so stay tuned.