Data Stores in Large-Scale Insight Applications
Written by Pavel Kolesnikov
In my role as Director of Product Management, I spend a lot of time talking to current and potential B2B customers about the GoodData platform, embedded analytics, and large-scale insight applications. In those conversations, I often encounter a common misconception: that a customer can simply buy a powerful database, implement a visualization tool, and then watch insights magically appear, at scale, on an easy-to-use web page.
This is just not the case. Turning massive amounts of complex data into data products that casual business users find useful and intuitive is a tall order, for both organizational and technical reasons. Yet the executives tasked with making decisions about analytical applications may not be familiar enough with the details to understand why.
The costs of zero-tuning scalability and schema-on-read
Like many popular misconceptions, this one is half true. Vendors and industry analysts frequently talk about schema-on-read and querying data at the source, where the source may be a data lake or a data warehouse.
This is great if we’re talking about a system for a team of data analysts who maintain dashboards for the C-suite and do ad hoc analysis. In that case, it makes economic sense to spend a fraction of your analysts’ salaries on a distributed query technology that brute-forces its way through terabytes of data like a charm.
But is this approach economically viable when we’re talking about a B2B analytical data product that will be used by hundreds, or even thousands, of your enterprise customers and by even more end users? Do you really want to pay for every end user to brute-force queries across terabytes of your data lake? You may have only 10 customers producing a few GB of data now, but can you be sure that is how things will look a year from now?
If you want to let your end users create and share custom insights, things get even more complicated once you factor in the need to customize your product with custom fields.
Consider a small or medium-sized company with a five-person data warehousing team that handles everything data-related. This team is typically busy building a data warehouse or data lake, or doing internal reporting and analytics. Then one day, the product managers decide, understandably, that the company should be making use of the data it’s collecting, and they ask the data warehousing team to build an analytical application. The data warehousing team may well be able to pull this off in the short term, but it is not sustainable in the long term.
First, chances are that this data warehousing team is already operating close to full capacity. Building an entirely new data product from the ground up would require months or even years of work, either on top of the team’s existing responsibilities or at their expense, unless the company invests in finding and hiring new team members to help carry the load.
As the company grows, the data team also quickly becomes overwhelmed with new responsibilities: keeping up with data growth, consolidating new data sources, and dealing with compliance requirements and new internal requests. On top of these growing core responsibilities, requests for new customer-facing product features will surface, and sooner or later the data team will discover that these are two entirely different projects with very different requirements, skill sets, and release cycles. They simply shouldn’t be handled by one team using one technology.
The right tool for the right job
Building an external data product in the same environment, and with the same team, that you use for your internal BI may be tempting for smaller teams, but it can become a nightmare for rapidly growing or large organizations.
That’s why embedded analytics platforms such as GoodData include data preparation and distribution tooling designed to start small and scale as your needs change. To get started, you describe your data sources, map the data to customer-specific analytical environments called workspaces, and you’re ready to rock.
Are your customers coming to you with new analytical requirements that bring chaos into your data warehouse design? Offload customer-specific transformations to our platform. Are you switching to a messaging-based data integration architecture, where your apps produce messages and your warehouse is the main consumer? The pipeline that produces your analytical data can become just another consumer in this architecture, with no dependency on the data architecture team. Instead of sinking time and money into expanding the data team to build an application from the ground up, the company can let the data team keep doing the important work it has been doing and still reap the rewards of embedded analytics.