Vertica

Data Source Details

Use the following information when creating a data source for your Vertica database:

  • The JDBC URL must be in the following format:

    jdbc:vertica://<host>:<port>/<databaseName>

  • Basic authentication is supported. Specify a username and password.

  • If you use native authentication inside your cloud platform (for example, Google Cloud Platform, Amazon Web Services, or Microsoft Azure), you do not have to provide the username and password.
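For example, a connection URL for a hypothetical host and database (5433 is the default Vertica port) might look like:

```
jdbc:vertica://vertica.example.com:5433/analytics
```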

GoodData Cloud Native (GoodData.CN) uses version 10.0.1-0 of the Vertica JDBC driver.

The following database versions are supported:

  • 9.x
  • 10.x

Performance Tips

If your database holds a large amount of data, consider the following practices:

  • Denormalize the relational data model of your database.

    This helps avoid large JOIN operations. Because Vertica is a columnar database, queries read only the required columns and each column is compressed separately.

  • Optimize projections.

    • RESEGMENT by the columns that are most frequently used for JOIN and aggregation operations. At a minimum, RESEGMENT by a column with high cardinality so that loaded data is evenly distributed across your cluster.
    • SORT by the columns that are most frequently used for JOIN and aggregation operations. Those columns are typically mapped to attributes that are most frequently used for aggregations in insights.
    • Use column encoding. Specifically, use RLE encoding for low-cardinality columns (columns with few distinct values).
    • If you have to build analytics for multiple mutually exclusive use cases, define multiple projections on top of a table.
  • Utilize live aggregate projections.

  • Use hierarchical partitioning to avoid too many partitions (ROS containers) in a single projection.

  • Use Eon Mode: spin up subclusters based on user needs.

    • Users with similar needs populate the same Eon depots, so cached data is likely to be reused.
    • Isolate data transformation operations running in your database from the analytics queries generated by GoodData.CN.
  • Scale based on user needs. Automate adding and removing secondary subclusters.
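
The projection tips above can be sketched in DDL. This is a minimal example only, assuming a hypothetical sales_fact table; adapt the column list, sort order, and segmentation to your own schema:

```sql
CREATE PROJECTION sales_fact_optimized (
    order_date ENCODING RLE,  -- low-cardinality column: RLE compresses well
    customer_id,
    amount
)
AS SELECT order_date, customer_id, amount
FROM sales_fact
-- Sort by the columns most frequently used in JOINs and aggregations.
ORDER BY order_date, customer_id
-- Segment by a high-cardinality column for even data distribution.
SEGMENTED BY HASH(customer_id) ALL NODES;
```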
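
A live aggregate projection precomputes aggregations as data is loaded, so frequent rollups do not have to be recomputed at query time. A hedged sketch, again assuming the hypothetical sales_fact table:

```sql
-- Precompute daily totals at load time instead of at query time.
CREATE PROJECTION daily_sales_agg
AS SELECT order_date, SUM(amount) AS total_amount
FROM sales_fact
GROUP BY order_date;
```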
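
Hierarchical partitioning can be expressed with CALENDAR_HIERARCHY_DAY, which merges older partitions into monthly and yearly groups to limit the number of ROS containers. A sketch for the hypothetical sales_fact table, keeping 2 months of daily and 2 years of monthly partitions:

```sql
ALTER TABLE sales_fact
PARTITION BY order_date::DATE
GROUP BY CALENDAR_HIERARCHY_DAY(order_date::DATE, 2, 2);
```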