A fact is a logical object that represents a numerical piece of data, which in business environments are used to measure business processes. For example, you may have data about the number of sold items, the price of the purchased goods, or the salary of individual employees.
Depending on your business goals, you may analyze raw facts (for example, present the salary of each individual employee) or aggregate the facts and analyze the resulting metrics (for example, sum salaries of the employees working for a specific department to see the total salary amount paid in this department).
Facts in datasets are identified by the following icon:
In your database, facts can be stored as any numerical data type, with or without floating point.
Facts can be of the following types:
- Additive facts can be used in computations, such as summing them together. Additive facts are sources for aggregation that is accomplished by a metric function.
- Non-additive facts cannot be added. For example,
Unit Pricescannot be added together to produce meaningful information. Non-additive facts can often be derived from other additive facts. For example,
Unit Pricecan be derived from dividing
- Semi-additive facts can be added but only within certain context, therefore they must be managed carefully. For example,
Inventory for a monthis additive only within a month.
Data modelers are mostly interested in additive facts.
A collection of facts that measure the same business process are stored in a single data unit called a fact dataset (also known as a fact table). A fact dataset contains individual fact values and pointers to associated attributes used as context for the fact data. Fact datasets have relatively few columns and many rows.
In some databases, denormalizing the relational model and creating very wide fact tables with many attributes may dramatically improve performance of some visualizations: slicing a fact by an attribute in the same dataset (table) means no JOIN operation is needed while querying your database.
However, some visualizations cannot be created and executed this way because they require values of the attribute for which no corresponding record exists in the fact table. Also, you cannot utilize dimensionality, because you cannot declare relations between datasets (sets of attributes) inside a single dataset.
To find out whether the denormalization method may be useful in your database, review the article about your database under Supported Databases.
All facts in a fact dataset are recorded with a certain level of detail. This level of detail is called the grain of a fact. The grain of a fact dataset is largely determined by the attribute references in it.
For example, if an inventory fact dataset contains the
Quantity fact and the
the grain of the dataset would be
GoodData supports only one-attribute grains.
Designing Fact Datasets
Depending on the type of the underlying business process, design fact datasets in one of the following ways:
Transactional fact dataset for discrete events
One row corresponds to a single transaction, usually connected to one date dataset. Rows are added, not updated.
Accumulating fact dataset for evolving events
This method models a process passing through several predefined stages, usually connected to multiple date datasets. One row corresponds to an execution of the process, with individual stages of the process tracked in separate columns. The rows must be updated throughout the stages.
Include the lowest level of source data in your facts. These facts can always be rolled up.
Example: If you measure a business process by individual city data, you can always roll up that information to the state level. However, if data is stored at the state level, you cannot decompose that data to the individual city level.
Where possible, be consistent in your units of measure before the data is inserted into the system.
Example: Do not store Fact 1 in inches and Fact 2 in centimeters. Even if you never intend to use these facts together, another user may decide to use both of them in a visualization.
Make sure that facts placed in the same fact table are at the same level of detail (grain) and from the same business process. If two events occur at different times, they belong in different fact tables.
Verify that you understand the level of detail (grain) in each fact dataset. In general, the complete set of attributes in a dataset indicates the level of detail.
Where possible, avoid including non-additive facts, which cannot be used in metric computations.
- Instead of storing a ratio or percentage, store the facts that can be used to calculate the percentage or ratio as a metric.
- Instead of storing the price of units (
units * unit price), store
unit priceseparately. Put the calculation of the price of the units or the average price into metrics.