Written by Sandra Suszterova
In order for data to be consumed in meaningful dashboards and visualizations, it must travel through various stages of a larger system. What does this entire process look like, start to finish? How is data in its raw form transformed into insights that lead to business decisions? These questions can be answered through data architecture.
Data architecture refers to the model, rules, and standards that document all of an organization’s data assets. It maps the data flow between these assets and across the entire enterprise system. These data assets consist of data models, databases, documents, pictures, and spreadsheets that carry data relevant to the organization's value chain or that have strategic or operational worth.
Data Architecture Overview
A successful data architecture ensures that data is correctly maintained and supports the company's business needs. To achieve this goal, we need to understand how the components of data architecture work together and apply certain principles or standards.
In this article we will focus on modern data architecture in cloud environments and walk through its basic schema.
The databases, applications, and tools you may be familiar with all make up different components of the larger data architecture system. What is the purpose of each individual component and how do they interact with one another? Let’s break it down piece by piece.
Components of Data Architecture
The main components of a data architecture schema include:
- Data sources where data is gathered and stored
- A data warehouse platform, to which data is transferred via data pipelines
- BI tools which consume the data
Data sources are the various resources and forms in which companies store their data. These can include relational databases, spreadsheets, flat files, PDFs, and documents. This data can be collected from enterprise applications (e.g., CRM, ERP, LOB), separate databases, or third-party vendors that conduct research for a given company. The company must determine which of these data sources are relevant for their use case before converting and transferring their data to a data warehouse — a centralized data store. We’ll elaborate on this in a moment.
The data from the various data sources can now be moved through data pipelines into a centralized data warehouse. This stage of the process can include initial data loading, continuous and extendable data processing, and change data capture (CDC), which detects changes in the underlying data source and replicates them to the data warehouse.
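To make the change capture step more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it detects changes by comparing a snapshot of the source with the warehouse copy, whereas production tools often read the database's transaction log instead. The names `capture_changes` and `apply_changes`, and the sample customer rows, are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (operation, primary key, row)


def capture_changes(source: Dict[int, Row], warehouse: Dict[int, Row]) -> List[Change]:
    """Compare a source snapshot with the warehouse copy and list the differences."""
    changes: List[Change] = []
    for key, row in source.items():
        if key not in warehouse:
            changes.append(("insert", key, row))
        elif warehouse[key] != row:
            changes.append(("update", key, row))
    for key, row in warehouse.items():
        if key not in source:
            changes.append(("delete", key, row))
    return changes


def apply_changes(warehouse: Dict[int, Row], changes: List[Change]) -> None:
    """Replicate the detected changes into the warehouse copy."""
    for operation, key, row in changes:
        if operation == "delete":
            warehouse.pop(key, None)
        else:
            # Store a copy so later edits to the source do not leak into the warehouse.
            warehouse[key] = dict(row)


# Initial load: the warehouse is empty, so every source row is an insert.
source = {
    1: {"customer": "Acme", "plan": "basic"},
    2: {"customer": "Globex", "plan": "basic"},
}
warehouse: Dict[int, Row] = {}
apply_changes(warehouse, capture_changes(source, warehouse))

# Later, the source changes: one update, one delete, one new row.
source[1]["plan"] = "pro"
del source[2]
source[3] = {"customer": "Initech", "plan": "basic"}

changes = capture_changes(source, warehouse)
apply_changes(warehouse, changes)
```

After the second run, only the three changed rows travel through the pipeline, and the warehouse copy matches the source again.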
Data Warehouse Platform
As previously mentioned, the data warehouse is where data is stored after being transformed and moved by data pipelines. Once the data has reached the warehouse, it passes through two layers:
- Staging layer: The purpose of this layer is to collect data from different data sources. It often takes the place of a data lake, holding raw or unprocessed data. Data scientists and specific tools will help users to understand and translate this data to generate insights for various business purposes.
- Modeling layer: This layer contains data that has already been transformed and is stored in central data warehouses, where separate data marts are created with relevant data for specific business areas. This data is then analyzed using various BI tools by both technical and non-technical users.
In a modern data architecture schema, transformation processes take place between the staging and modeling layers. These data integration processes are designed to collect data from source systems, consolidate the data, and load it into data warehouses or data lakes.
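The move from the staging layer to the modeling layer can be sketched in a few lines of Python using the standard `sqlite3` module as a stand-in for a real warehouse. This is a simplified illustration, not a recommended implementation: the table and view names (`staging_orders`, `model_orders`, `mart_sales_by_region`) and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging layer: raw rows as they arrived from the sources.
# Everything is text, regions are inconsistently formatted, and one row is duplicated.
conn.execute("CREATE TABLE staging_orders (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [
        ("1", " emea ", "100.50"),
        ("2", "EMEA", "49.50"),
        ("3", "apac", "20"),
        ("3", "apac", "20"),
    ],
)

# Transformation: cast types, normalize region names, and drop duplicates.
conn.execute(
    """
    CREATE TABLE model_orders AS
    SELECT DISTINCT
        CAST(order_id AS INTEGER) AS order_id,
        UPPER(TRIM(region))       AS region,
        CAST(amount AS REAL)      AS amount
    FROM staging_orders
    """
)

# Modeling layer: a small data mart for one business area (sales by region).
conn.execute(
    """
    CREATE VIEW mart_sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM model_orders
    GROUP BY region
    """
)

rows = conn.execute(
    "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
).fetchall()
print(rows)
```

A BI tool would then query the mart rather than the raw staging table, so every dashboard works from the same cleaned data.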
BI and Analytics Tools
Lastly, data is moved to the end user’s BI and analytics tools, where it is analyzed to extract meaningful value. Dashboards, insights, and customized metrics can be created not only through the interactive data visualizations in these tools, but also through APIs and written code.
Data Architecture Principles
Interested in learning how to successfully build a modern data architecture system? We’ve identified some key principles of successful data architecture systems to help you to understand the essentials of creating data architecture. By keeping your data architecture (and each individual component) persistent, clean, and accountable, you can improve your organization’s comprehensive data strategy.
Here are the main principles:
Data Sharing and Open Access
Both internal and external users must be able to access relevant data when needed and share it among their related team members or departments.
Open access means that users have regular access to data without unnecessary restrictions, such as additional requirements for granting access or ownership of a specific role. Providing developers with appropriate permissions and self-service capabilities to view and modify the data architecture can help your team mitigate issues as they come up and continue working without interruption, rather than waiting for IT to resolve the issue.
Everyone working with your data and its architecture must agree on the terminology that will be used to describe items ranging from the main architectural components to individual metrics. Agreeing on a common terminology will ensure that users across teams and departments can communicate effectively and prevent misunderstandings in data usage.
Security and Access Control
Data is a valuable asset that requires high security against theft and abuse. However, data intended for different business use cases requires varying levels of security.
Just as it’s important to ensure that users have appropriate access to data and the organization’s data architecture, it’s equally vital to establish adequate security measures in your organization. This can include creating backups of data and recovery plans, as well as setting up appropriate authentication factors for users working with sensitive customer data. Your security team should be equipped to quickly scale and determine which data requires multilevel security as well as assign appropriate permissions to each user so that your data remains protected.
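As a rough illustration of multilevel security and per-user permissions, the Python sketch below checks whether a role may read a dataset. Everything in it is hypothetical: the sensitivity levels, the role names in `ROLE_CLEARANCE`, and the rule that the most sensitive data also requires a passed multi-factor check are assumptions chosen for the example, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3  # e.g. sensitive customer data


@dataclass(frozen=True)
class Dataset:
    name: str
    sensitivity: Sensitivity


# Hypothetical mapping of roles to the highest sensitivity level they may read.
ROLE_CLEARANCE = {
    "viewer": Sensitivity.PUBLIC,
    "analyst": Sensitivity.INTERNAL,
    "security_admin": Sensitivity.RESTRICTED,
}


def can_read(role: str, dataset: Dataset, mfa_passed: bool = False) -> bool:
    """Return True if the role may read the dataset under this sketch's rules."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles get no access
    if dataset.sensitivity > clearance:
        return False  # data is more sensitive than the role allows
    if dataset.sensitivity == Sensitivity.RESTRICTED and not mfa_passed:
        return False  # restricted data also needs a second authentication factor
    return True


web_traffic = Dataset("web_traffic", Sensitivity.INTERNAL)
customer_pii = Dataset("customer_pii", Sensitivity.RESTRICTED)

print(can_read("analyst", web_traffic))                          # True
print(can_read("viewer", web_traffic))                           # False
print(can_read("security_admin", customer_pii))                  # False
print(can_read("security_admin", customer_pii, mfa_passed=True)) # True
```

Keeping the rules in one small function makes it easier to review who can see what as new datasets and roles are added.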
Your data will need to be accessed and analyzed by both technical and non-technical users for different use cases. These users should be equipped to interact with data sources using direct queries and commands. Data consumption techniques may include the use of APIs for web services, OLAP for business intelligence, and SQL for data analysts.
Flexibility, Scalability and Agility
A contemporary data architecture should ideally be flexible, scalable, and agile. The architecture should be adaptable enough to avoid emergencies, allowing for quick reactions to external changes or user base growth.
A multi-tenant environment can enable the effective and efficient scaling of user base numbers and enterprise data resources without creating duplicates of data sources.
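One common way to picture multi-tenancy is a single shared table in which every row carries a tenant identifier, so adding a tenant means adding rows rather than copying the data source. The Python sketch below assumes that approach; the `sales` table, the `tenant_id` column, and the `tenant_total` helper are hypothetical, and real platforms may isolate tenants differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One shared table serves every tenant; no per-tenant copy of the data source.
conn.execute("CREATE TABLE sales (tenant_id TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("acme", "widget", 10.0),
        ("acme", "gadget", 5.0),
        ("globex", "widget", 7.5),
    ],
)


def tenant_total(connection: sqlite3.Connection, tenant_id: str) -> float:
    """Sum sales for one tenant only; the filter keeps tenants isolated."""
    row = connection.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    return row[0]


print(tenant_total(conn, "acme"))    # 15.0
print(tenant_total(conn, "globex"))  # 7.5

# Onboarding a new tenant only adds rows to the shared table.
conn.execute("INSERT INTO sales VALUES (?, ?, ?)", ("initech", "widget", 3.0))
print(tenant_total(conn, "initech"))  # 3.0
```

Because every query passes the tenant identifier as a parameter, each tenant sees only its own figures while the underlying storage stays shared.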
Scalability and flexibility foster agility and allow organizations to quickly adapt to market changes without significant limitations on business performance.
Data architecture aligns integration, quality improvement, and successful data delivery. Generally, it helps organizations to:
- Understand the process of data collection and how data flows within an organization
- Align future business decisions with an underlying data strategy
- Guide developments across the entire system — from report creation to application integration and data warehouse strategy
Data architecture combines standards, principles, guidelines, and processes that help organizations to comprehend data assets and create a framework for transforming data into meaningful information.
Ready to learn more?
If you’d like to learn more about data architecture, check out our GoodData Architecture Overview e-book. To see how GoodData can complement your organization’s data architecture, request a demo with one of our experts today.