Cache Management

GoodData creates data caches to store queried data and to optimize performance. When GoodData queries a data source, the latency and concurrency depend on the data source technology you are using. However, when querying caches, we can support any data volume with very low latency and very high concurrency, up to the memory resource limit set in our cache storage.

Overview of Caching Architecture

Note that after a new Extract, Load, Transform (ELT) is executed on the source data, you need to invalidate existing caches to ensure that the updated data is queried by GoodData. Refer to the Cache Invalidation section below for more information.

GoodData makes use of up to three data caches:

Final Query Results

Processed raw query results that have been sorted, pivoted and paginated. This cache is used to display data in your visualizations.

You can also access this cached data directly: call the /api/v1/actions/workspaces/{workspaceId}/execution/afm/execute API endpoint to get a resultId, then browse the result using the /api/v1/actions/workspaces/{workspaceId}/execution/afm/execute/result/{resultId} API endpoint.
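As a minimal sketch of how the two endpoint paths relate, the following Python helpers build both URLs from a host, a workspace ID, and a result ID. The function names and the example host are hypothetical and used only for illustration. The request body and authentication for the execute call are not covered here, so consult the API reference for those.

```python
from urllib.parse import quote

# Path template taken from the endpoint named above.
EXECUTE_PATH = "/api/v1/actions/workspaces/{workspace_id}/execution/afm/execute"


def execute_url(host: str, workspace_id: str) -> str:
    """Build the URL of the execute endpoint (hypothetical helper)."""
    path = EXECUTE_PATH.format(workspace_id=quote(workspace_id, safe=""))
    return host.rstrip("/") + path


def result_url(host: str, workspace_id: str, result_id: str) -> str:
    """Build the URL used to browse a result identified by result_id."""
    return f"{execute_url(host, workspace_id)}/result/{quote(result_id, safe='')}"
```

For example, result_url("https://example.invalid", "demo", "abc123") yields the execute URL for the demo workspace followed by /result/abc123.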

Raw Query Results

This cache stores the raw result of a query before any processing is applied. These results are not paginated; all rows are aggregated together, up to the platform limit.

The raw results cache is used to recompute the final results cache when you perform certain actions with the data. For example, inside visualizations, this happens when you change how the data is sorted or when you pivot the data by adding an attribute into columns.
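The relationship between the two caches can be illustrated with a small Python sketch. It is a conceptual model only, not GoodData's implementation: the function name, the row shape, and the sample data are assumptions made for the example. The point is that changing the sort order only reprocesses the cached raw rows and does not require a new query against the data source.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical raw query result: unsorted and unpaginated.
RAW_ROWS: List[Row] = [
    {"region": "EU", "revenue": 30},
    {"region": "US", "revenue": 50},
    {"region": "APAC", "revenue": 20},
]


def final_page(raw_rows: List[Row], sort_key: str, descending: bool = False,
               page: int = 0, page_size: int = 2) -> List[Row]:
    """Derive one page of final results from cached raw rows."""
    # Sorting works on a copy, so the cached raw rows stay untouched.
    ordered = sorted(raw_rows, key=lambda row: row[sort_key], reverse=descending)
    start = page * page_size
    return ordered[start:start + page_size]


# Two different views computed from the same raw rows.
by_revenue = final_page(RAW_ROWS, "revenue", descending=True)
by_region = final_page(RAW_ROWS, "region", page=1)
```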

Pre-aggregation (Beta Feature)

The pre-aggregation cache consists of database tables that are created in the data source itself. This cache may be used to speed up the calculation of certain types of metrics that make use of aggregation functions.

When pre-aggregation is disabled, GoodData transforms the execution of each visualization into one SQL statement and caches the result in GoodData cache storage. When pre-aggregation is enabled, GoodData transforms the execution of each visualization into multiple SQL statements and stores each as a table in the cachePath schema in the data source. These tables can be reused by other visualizations.

Pre-aggregation is a beta feature that is turned off by default. If you want to try it out in your non-production environment, see Enable Pre-aggregation Caching.

Cache Invalidation

Whenever an ELT finishes, you need to invalidate the caches to discard the outdated stored results and query the new data. To invalidate a cache, see Invalidate Cached Data. The API endpoint changes the dataSource.uploadId attribute, and caches with an outdated dataSource.uploadId attribute are dumped.
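The idea of invalidating by upload ID can be modeled in a few lines of Python. This is a conceptual sketch under stated assumptions, not GoodData's implementation: the class, its methods, and the ID values are hypothetical. Each entry remembers the upload ID that was current when it was stored, and an entry whose ID no longer matches is treated as outdated and dropped.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheEntry:
    upload_id: str  # upload ID that was current when the entry was stored
    value: Any


class UploadIdCache:
    """Toy cache whose entries become stale when the upload ID changes."""

    def __init__(self, upload_id: str) -> None:
        self.current_upload_id = upload_id
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = CacheEntry(self.current_upload_id, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if entry.upload_id != self.current_upload_id:
            # Outdated entry: drop it so the caller queries fresh data.
            del self._entries[key]
            return None
        return entry.value

    def invalidate(self, new_upload_id: str) -> None:
        """Model of invalidation: only the upload ID changes."""
        self.current_upload_id = new_upload_id
```

In this model invalidation is cheap because nothing is deleted eagerly. Stale entries are discarded the next time they are looked up.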

Overview of recaching

Caches have no time to live (TTL) expiration set; they are evicted based on the least recently used (LRU) cache replacement policy. Only keys registered during executions have a 180-second TTL, which prevents no-result-forever situations if the underlying async execution gets stuck.
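The combination of LRU eviction for results and a TTL for in-flight execution keys can be sketched as follows. This is an illustrative model, not GoodData's cache storage: the class and method names are hypothetical, and capacity is counted in entries here for simplicity, whereas the real limit is a memory limit.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional


class LruResultCache:
    """Toy cache: results are evicted by LRU, pending keys expire by TTL."""

    def __init__(self, capacity: int, pending_ttl: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._pending_ttl = pending_ttl
        self._clock = clock
        self._results: "OrderedDict[str, Any]" = OrderedDict()
        self._pending: Dict[str, float] = {}  # key -> expiry timestamp

    def register(self, key: str) -> None:
        """Mark an execution as in progress; the marker expires after the TTL."""
        self._pending[key] = self._clock() + self._pending_ttl

    def is_pending(self, key: str) -> bool:
        expiry = self._pending.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The execution is presumed stuck, so the key can be retried.
            del self._pending[key]
            return False
        return True

    def store(self, key: str, value: Any) -> None:
        self._pending.pop(key, None)
        self._results[key] = value
        self._results.move_to_end(key)
        while len(self._results) > self._capacity:
            self._results.popitem(last=False)  # evict least recently used

    def get(self, key: str) -> Optional[Any]:
        if key not in self._results:
            return None
        self._results.move_to_end(key)  # a read counts as a use
        return self._results[key]
```

Stored results carry no expiry at all and leave the cache only when space is needed. The TTL applies solely to the pending markers, so a stuck execution cannot block a key forever.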

Metadata about pre-aggregation tables in data sources is stored in Postgres and is further used in the garbage collection process.