Upgrade Postgresql-ha Helm Chart for GoodData 2.1.0

If you’re using an external PostgreSQL database (that is, you deployed the gooddata-cn helm chart with the option deployPostgresHA: false), you can skip this step because it is not relevant for your deployment.

If you’re using the embedded PostgreSQL database (with deployPostgresHA: true set in values.yaml), you will need to perform the following process to upgrade the postgresql-ha helm chart from version 8.6.13 to 9.1.5. The upgrade includes a migration of the PostgreSQL database from version 11 to 14. Due to the nature of the upgrade, this action will cause a service disruption. Please schedule a maintenance window for this operation.

Steps:

  1. If you have not done so already, create a JSON dump of your GoodData.CN 2.0.x organization layout at /tmp/organization-layout.json. See Back Up the Organization.
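
    A minimal sketch, assuming the 2.0.x layout API endpoint and a valid API token in a hypothetical API_TOKEN variable (see Back Up the Organization for the authoritative procedure):

    curl --header "Authorization: Bearer $API_TOKEN" \
      "https://<your-organization-hostname>/api/layout/organization" \
      > /tmp/organization-layout.json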

  2. Set up your shell environment. Make sure you adjust the following values to match your deployment:

    # namespace GoodData.CN is deployed to
    export NAMESPACE=gooddata-cn
    # name of helm release used to deploy GoodData.CN 
    export HELM_RELEASE=release-name
    # path to helm values file used to deploy GoodData.CN
    export HELM_VALUES_FILE="values-gooddata-cn.yaml"
    # postgres name as specified in values.yaml postgresql-ha.nameOverride
    export PG_NAME=db
    # PG-HA admin user as defined in values.yaml postgresql-ha.postgresql.username
    export PGUSER=postgres
    # PG-HA admin user password as defined in values.yaml postgresql-ha.postgresql.password  
    export PGPASSWORD=$(cat pg_password.txt)
    export PGHOST=${HELM_RELEASE}-${PG_NAME}-postgresql-0
    # helm release name of temporary PG in destination version started to execute pg_dump
    export TMP_PGDUMP_RELEASE=tmp-pg-dump
    export TMP_PGDUMP_POD=${TMP_PGDUMP_RELEASE}-postgresql-0
    # location of dumps in temporary container
    export DUMP_LOCATION=/bitnami/postgresql/dumps
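
    Before continuing, it is worth a quick sanity check that these variables point at an existing pod:

    # should show the primary PostgreSQL pod in Running state
    kubectl get pod -n $NAMESPACE $PGHOST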
    
  3. Disable access to your GoodData.CN application:

    helm upgrade --namespace $NAMESPACE --version 2.0.1 \
      --wait --timeout 7m -f $HELM_VALUES_FILE \
      --set metadataApi.replicaCount=0 \
      --set sqlExecutor.replicaCount=0 \
      --set dex.replicaCount=0 \
      $HELM_RELEASE gooddata/gooddata-cn
    

    Note that once the command finishes, users will see an Internal Server Error message when trying to access any organization in the deployment.
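
    You can confirm that the deployments corresponding to metadataApi, sqlExecutor, and dex were scaled down by checking that their READY count shows 0/0:

    kubectl get deployments -n $NAMESPACE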

  4. Deploy a temporary container to dump your data into:

    cat << EOT > /tmp/values-${TMP_PGDUMP_RELEASE}.yaml
    auth:
      postgresPassword: dumpdata
    primary:
      persistence:
        enabled: false
    readReplicas:
      replicaCount: 0
    EOT
    
    helm upgrade --install --namespace $NAMESPACE --version 11.6.6 \
      --wait --timeout 2m --values /tmp/values-${TMP_PGDUMP_RELEASE}.yaml \
      ${TMP_PGDUMP_RELEASE} bitnami/postgresql     
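
    To confirm that the temporary pod is up before you continue, you can wait for its Ready condition (the pod name follows the chart's default naming, as assumed in TMP_PGDUMP_POD above):

    kubectl wait --for=condition=Ready pod -n $NAMESPACE $TMP_PGDUMP_POD --timeout=120s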
    
  5. Enable network access from the temporary tmp-pg-dump-postgresql pod to the production PostgreSQL pods:

    cat << EOT > /tmp/network-access-tmp-pg-2-prod-pg.yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      namespace: $NAMESPACE
      name: $TMP_PGDUMP_RELEASE-ingress
    spec:
      ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: $NAMESPACE
            podSelector:
              matchLabels:
                app.kubernetes.io/instance: $TMP_PGDUMP_RELEASE
                app.kubernetes.io/name: postgresql
          ports:
            - port: 5432
              protocol: TCP
      podSelector:
        matchLabels:
          app.kubernetes.io/component: postgresql
          app.kubernetes.io/instance: $HELM_RELEASE
          app.kubernetes.io/name: $PG_NAME
      policyTypes:
        - Ingress
    EOT
    
    kubectl apply -f /tmp/network-access-tmp-pg-2-prod-pg.yaml
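
    To verify that the policy works, you can check connectivity from the temporary pod to the production primary with pg_isready, which ships with the Bitnami PostgreSQL image:

    PGHOST_IP=$(kubectl get pod -n $NAMESPACE $PGHOST --template '{{.status.podIP}}')
    kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- pg_isready -h $PGHOST_IP -p 5432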
    
  6. List available databases in the postgres-ha deployment:

    kubectl -n $NAMESPACE exec $PGHOST -- env PGPASSWORD=$PGPASSWORD psql -U postgres -c "\l" 
    

    Pick the databases to be preserved. Use their names as input for the USER_DBS_TO_TRANSFER variable in the next step. Note that:

    • databases md, execution, and dex are always preserved
    • databases template0 and template1 must always be skipped

  7. Dump the databases you want to preserve; first edit the USER_DBS_TO_TRANSFER and USER_ROLES_BY_PG_HA values:

    # space-separated list of user DBs to be dumped; system DBs md, execution and dex are included automatically
    export USER_DBS_TO_TRANSFER="tigerdb"
    # space-separated list of user-defined PG roles delivered by the postgresql-ha helm chart;
    # these roles are excluded from the roles dump because they are created automatically during PG-HA ecosystem provisioning;
    # roles repmgr, postgres and executor are excluded automatically
    export USER_ROLES_BY_PG_HA=""
    
    cat << "EOT" > ./dump-pg-dbs.sh
    #!/bin/bash
    set -x
    set -e
    
    PGHOST_IP=$(kubectl get pod -n $NAMESPACE $PGHOST --template '{{.status.podIP}}')   
    
    kubectl -n $NAMESPACE exec $TMP_PGDUMP_POD -- mkdir -p $DUMP_LOCATION
    # exclude all the roles created automatically by:
    #   - PG-HA chart 
    #   - postgres installation
    #   - GD chart
    time kubectl exec -it -n $NAMESPACE $TMP_PGDUMP_POD -- bash -c "env PGPASSWORD=$PGPASSWORD \
      pg_dumpall -h $PGHOST_IP -U $PGUSER --roles-only > /tmp/0_dump_pg_roles_all.sql"
    ROLES_TO_EXCLUDE="repmgr postgres executor"
    if [[ "$USER_ROLES_BY_PG_HA" != "" ]]; then
      ROLES_TO_EXCLUDE="${ROLES_TO_EXCLUDE} $USER_ROLES_BY_PG_HA"
    fi
    ROLES_TO_EXCLUDE=$(echo $ROLES_TO_EXCLUDE | tr ' ' '|')
    kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- bash -c "grep -i -v \
      -E \"^(CREATE|ALTER) ROLE (${ROLES_TO_EXCLUDE})\" \
      /tmp/0_dump_pg_roles_all.sql | gzip > $DUMP_LOCATION/0_dump_pg_roles.sql.gz"
    
    # dump selected databases
    ALL_DBS_TO_TRANSFER="md execution dex ${USER_DBS_TO_TRANSFER}"
    ITER_ID=1
    for db in $ALL_DBS_TO_TRANSFER; do
      DUMP_DEST_FILE="$DUMP_LOCATION/${ITER_ID}_dump_pg_db_${db}.sql.gz"
      echo "Creating dump of DB ${db} to file ${DUMP_DEST_FILE}"
      time kubectl exec -it -n $NAMESPACE $TMP_PGDUMP_POD -- bash -c "env PGPASSWORD=$PGPASSWORD \
        pg_dump -h $PGHOST_IP -U $PGUSER --quote-all-identifiers --create ${db} | gzip > $DUMP_DEST_FILE"
      ITER_ID=$(($ITER_ID+1))
    done
    EOT
    
    chmod 754 dump-pg-dbs.sh
    ./dump-pg-dbs.sh
    

    The script first dumps only the roles, which are PostgreSQL instance-level definitions. After that, the databases are dumped one by one.

    Before moving forward, verify that the temporary pod contains all the requested dumps, that is:

    • 3 dumps (files) for the md, dex, and execution databases
    • 1 dump for roles
    • 1 dump for each database specified in the USER_DBS_TO_TRANSFER variable

    kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- bash -c "ls -la $DUMP_LOCATION"
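
    Optionally, you can also verify that each archive is a valid gzip file:

    kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- bash -c "for f in $DUMP_LOCATION/*.gz; do gzip -t \$f && echo \$f OK; done"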
    
  8. (Optional) Download the dumps to your local machine for backup:

    kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- bash -c "cd $DUMP_LOCATION/../; tar cf dumps.tar dumps/*"
    kubectl cp -n $NAMESPACE --retries=-1 $TMP_PGDUMP_POD:/bitnami/postgresql/dumps.tar ./dumps.tar
    
    # verify that the data was transferred completely - compare the MD5 hashes
    kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- md5sum $DUMP_LOCATION/../dumps.tar
    md5sum ./dumps.tar
    
  9. Remove postgresql-ha:

    helm upgrade --namespace $NAMESPACE --version 2.0.1 \
      --wait --timeout 2m -f $HELM_VALUES_FILE \
      --set metadataApi.replicaCount=0 \
      --set sqlExecutor.replicaCount=0 \
      --set dex.replicaCount=0 \
      --set deployPostgresHA=false \
      $HELM_RELEASE gooddata/gooddata-cn
    
  10. Remove the persistent volume claims and back up the related persistent volumes:

    cat << "EOT" > ./backup-pvs.sh
    #!/bin/bash
    set -x
    set -e
    
    PV_NAME_BACKUP_PREFIX=pv_name_backup
    PVC_BACKUP_PREFIX=pvc_bck
    PVC_PV_LINES=$(kubectl get pvc -n $NAMESPACE --sort-by=.metadata.name \
    --selector="app.kubernetes.io/component=postgresql,app.kubernetes.io/instance=${HELM_RELEASE},app.kubernetes.io/name=${PG_NAME}" \
    -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.volumeName}{"\n"}{end}')
    
    echo "$PVC_PV_LINES" | while read -r line; do
      PVC_INSTANCE=$(echo $line | cut -f 1 -d ' ')
      PV_INSTANCE=$(echo $line | cut -f 2 -d ' ')
    
      # make sure PV is not deleted by k8s after PVC is removed
      echo "Setting Retain policy for PV=${PV_INSTANCE}"
      kubectl patch pv "$PV_INSTANCE" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
    
      # remember PV name for rollback purposes
      PV_NAME_BACKUP_FILE="${PV_NAME_BACKUP_PREFIX}_${PV_INSTANCE}"
      echo "Creating PV name backup as $PV_NAME_BACKUP_FILE"
      touch $PV_NAME_BACKUP_FILE
    
      # Backup PVC definition for rollback purposes
      PVC_BCK_FILE="${PVC_BACKUP_PREFIX}_$PVC_INSTANCE.yaml"
      echo "Creating PVC backup file $PVC_BCK_FILE"
      kubectl get pvc -n $NAMESPACE $PVC_INSTANCE -o yaml > $PVC_BCK_FILE
    
      # delete PVC
      echo "Deleting PVC $PVC_INSTANCE"
      kubectl delete pvc -n $NAMESPACE $PVC_INSTANCE
    done
    EOT
    
    chmod 754 backup-pvs.sh
    ./backup-pvs.sh
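
    Before moving on, check that the volumes now have the Retain reclaim policy and that the script produced the backup files:

    kubectl get pv | grep Retain
    ls -la pvc_bck_* pv_name_backup_*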
    
  11. Upgrade the chart to 2.1.0; this includes an upgrade of the postgresql-ha chart to 9.x.x:

    helm upgrade --namespace $NAMESPACE --version 2.1.0 \
      --wait --timeout 7m -f $HELM_VALUES_FILE \
      --set metadataApi.replicaCount=0 \
      --set sqlExecutor.replicaCount=0 \
      --set dex.replicaCount=0 \
      --set deployPostgresHA=true \
      $HELM_RELEASE gooddata/gooddata-cn
    
  12. Verify the PostgreSQL version:

    kubectl exec -n $NAMESPACE $PGHOST -c postgresql -- psql --version
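
    The output should report a 14.x server, similar to the following (the exact minor version may differ):

    psql (PostgreSQL) 14.4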
    
  13. Restore DB data:

    cat << "EOT" > ./restore-data.sh
    #!/bin/bash
    set -x
    set -e
    
    PGHOST_IP=$(kubectl get pod -n $NAMESPACE $PGHOST --template '{{.status.podIP}}')   
    DUMP_FILES=$(kubectl exec -n $NAMESPACE $TMP_PGDUMP_POD -- ls $DUMP_LOCATION)
    
    for dump_file in $DUMP_FILES; do
       echo "Restoring dump ${dump_file}"
       time kubectl exec -it -n $NAMESPACE $TMP_PGDUMP_POD -c postgresql -- bash -c "\
         gzip -cd $DUMP_LOCATION/$dump_file | \
         env PGPASSWORD=$PGPASSWORD psql -h $PGHOST_IP -U $PGUSER"
    done
    EOT
    
    chmod 754 restore-data.sh
    ./restore-data.sh      
    
  14. Verify that all databases exist in the PostgreSQL instance:

    kubectl -n $NAMESPACE exec $PGHOST -- env PGPASSWORD=$PGPASSWORD psql -U postgres -c "\l" 
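
    You can also confirm that the restored roles are present:

    kubectl -n $NAMESPACE exec $PGHOST -- env PGPASSWORD=$PGPASSWORD psql -U postgres -c "\du"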
    
  15. Enable the application:

     helm upgrade --namespace $NAMESPACE --version 2.1.0 \
       --wait --timeout 7m -f $HELM_VALUES_FILE \
      $HELM_RELEASE gooddata/gooddata-cn
    
  16. Test the GoodData.CN deployment:

    • Is it possible to log in?
    • Do dashboard reports compute?

  17. Remove the network policy:

    kubectl delete -f /tmp/network-access-tmp-pg-2-prod-pg.yaml
    
  18. Remove the temporary PostgreSQL instance deployed to create the DB dumps:

    helm uninstall --namespace $NAMESPACE $TMP_PGDUMP_RELEASE
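
    If you no longer need them, you can also remove the temporary files created during the upgrade:

    rm /tmp/values-${TMP_PGDUMP_RELEASE}.yaml /tmp/network-access-tmp-pg-2-prod-pg.yaml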
    

Rollback

In case something goes wrong, you can use the persistent volumes backed up in step 10 to revert to version 2.0.1.

Steps:

  1. Disable access to the application and remove postgresql-ha:

     helm upgrade --namespace $NAMESPACE --version 2.1.0 \
       --wait --timeout 2m -f $HELM_VALUES_FILE \
       --set metadataApi.replicaCount=0 \
       --set sqlExecutor.replicaCount=0 \
       --set dex.replicaCount=0 \
       --set deployPostgresHA=false \
      $HELM_RELEASE gooddata/gooddata-cn
    
  2. Remove the new PVCs and back up the new PVs:

    cat << "EOT" > ./remove-new-pvc.sh
    #!/bin/bash
    set -x
    set -e
    
    PV_NAME_BACKUP_PREFIX=pv14_name_backup
    PVC_BACKUP_PREFIX=pvc14_bck
    PVC_PV_LINES=$(kubectl get pvc -n $NAMESPACE --sort-by=.metadata.name \
    --selector="app.kubernetes.io/component=postgresql,app.kubernetes.io/instance=${HELM_RELEASE},app.kubernetes.io/name=${PG_NAME}" \
    -o jsonpath='{range .items[*]}{@.metadata.name}{" "}{@.spec.volumeName}{"\n"}{end}')
    
    
    echo "$PVC_PV_LINES" | while read -r line; do
      PVC_INSTANCE=$(echo $line | cut -f 1 -d ' ')
      PV_INSTANCE=$(echo $line | cut -f 2 -d ' ')
    
      # make sure PV is not deleted by k8s after PVC is removed
      echo "Setting Retain policy for PV=${PV_INSTANCE}"
      kubectl patch pv "$PV_INSTANCE" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
      PV_NAME_BACKUP_FILE="${PV_NAME_BACKUP_PREFIX}_${PV_INSTANCE}"
    
      # remember PV name for possible investigation purposes
      echo "Creating PV name backup as $PV_NAME_BACKUP_FILE"
      touch $PV_NAME_BACKUP_FILE
    
      # Backup PVC definition for rollback purposes
      PVC_BCK_FILE="${PVC_BACKUP_PREFIX}_$PVC_INSTANCE.yaml"
      echo "Creating PVC backup file $PVC_BCK_FILE"
      kubectl get pvc -n $NAMESPACE $PVC_INSTANCE -o yaml > $PVC_BCK_FILE
    
      # delete PVC
      echo "Deleting PVC $PVC_INSTANCE"
      kubectl delete pvc -n $NAMESPACE $PVC_INSTANCE
    done
    EOT
    
    chmod 754 remove-new-pvc.sh
    ./remove-new-pvc.sh
    
  3. Restore the original PVCs and bind them to the backed-up PVs:

    cat << "EOT" > ./restore-orig-pvc.sh
    #!/bin/bash
    set -x
    set -e
    
    PV_NAME_BACKUP_PREFIX=pv_name_backup
    PVC_BACKUP_PREFIX=pvc_bck
    
    # prepare the original PVs to be re-bound by the restored PVCs
    for bck_pv in $(ls ${PV_NAME_BACKUP_PREFIX}*); do
      PV_INSTANCE=${bck_pv#"${PV_NAME_BACKUP_PREFIX}_"}
      echo "Making PV ${PV_INSTANCE} available"
      kubectl patch pv "$PV_INSTANCE" --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'
      echo "Setting Retain policy for PV=${PV_INSTANCE} to delete"
      kubectl patch pv "$PV_INSTANCE" -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
    done
    
    # restore original PVCs
    for bck_pvc in $(ls ${PVC_BACKUP_PREFIX}*); do
      PVC_INSTANCE=${bck_pvc#"${PVC_BACKUP_PREFIX}_"}
      echo "Creating PVC ${PVC_INSTANCE}"
      kubectl create -f $bck_pvc
    done
    EOT
    
    chmod 754 restore-orig-pvc.sh
    ./restore-orig-pvc.sh
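
    Verify that the restored PVCs are Bound to the original PVs before reinstalling the application:

    kubectl get pvc -n $NAMESPACE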
    
  4. Install the application in the original version:

     helm upgrade --namespace $NAMESPACE --version 2.0.1 \
       --wait --timeout 2m -f $HELM_VALUES_FILE \
       --set metadataApi.replicaCount=0 \
       --set sqlExecutor.replicaCount=0 \
       --set dex.replicaCount=0 \
       --set deployPostgresHA=true \
      $HELM_RELEASE gooddata/gooddata-cn
    
  5. Verify the PostgreSQL version; it should report 11.x again:

    kubectl exec -n $NAMESPACE $PGHOST -c postgresql -- psql --version
    
  6. Enable the application:

     helm upgrade --namespace $NAMESPACE --version 2.0.1 \
       --wait --timeout 7m -f $HELM_VALUES_FILE \
      $HELM_RELEASE gooddata/gooddata-cn
    
  7. Test the GoodData.CN deployment after rollback:

    • Is it possible to log in?
    • Do dashboard reports compute?