Version: main 🚧

Troubleshooting External Database

info
This feature is available in Platform v4.8.0 and later.

The commands on this page use example values (the cluster context ARN, region, VPC ID, CIDR block, target group ARN, and database hostname). Replace them with your own values before running the commands.

Platform pods are crashlooping

Check pod logs for errors:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
logs -n vcluster-platform -l app=loft --tail=50

Common causes:

  • Database connectivity errors: Verify that VPC peering is active and routes exist on the correct route tables. Include public route tables if nodes are in public subnets. Confirm the database security group allows port 3306 from the cluster VPC CIDR.
  • IAM authentication failures: Confirm the Pod Identity association exists, the Pod Identity Agent add-on is ACTIVE, and the RDSIAMAuthKine IAM policy contains the correct DbiResourceId.
  • Resource exhaustion: Check node resource usage with kubectl top nodes and pod resource usage with kubectl top pods -n vcluster-platform.
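The IAM authentication checks listed above can be scripted with the AWS CLI. The following is a minimal sketch rather than an official procedure. It assumes the EKS cluster is named platform-ha in us-east-1 (inferred from the example context ARN on this page) and that the AWS CLI is configured for the same account. The function names are illustrative.

```shell
# Assumed values, inferred from the example context ARN; override via the environment.
CLUSTER_NAME="${CLUSTER_NAME:-platform-ha}"
AWS_REGION="${AWS_REGION:-us-east-1}"
NAMESPACE="${NAMESPACE:-vcluster-platform}"

# Print the status of the EKS Pod Identity Agent add-on (for example ACTIVE).
pod_identity_agent_status() {
  aws eks describe-addon \
    --cluster-name "$CLUSTER_NAME" \
    --addon-name eks-pod-identity-agent \
    --region "$AWS_REGION" \
    --query 'addon.status' --output text
}

# Print the service accounts in the platform namespace that have a
# Pod Identity association.
pod_identity_service_accounts() {
  aws eks list-pod-identity-associations \
    --cluster-name "$CLUSTER_NAME" \
    --namespace "$NAMESPACE" \
    --region "$AWS_REGION" \
    --query 'associations[].serviceAccount' --output text
}

# Return success only if the add-on reports ACTIVE.
pod_identity_agent_is_active() {
  [ "$(pod_identity_agent_status)" = "ACTIVE" ]
}
```

After defining the functions, run pod_identity_agent_is_active || echo "agent not active", then pod_identity_service_accounts. If the second command prints nothing, the association is probably missing for this namespace.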

Leader election is stuck

If the platform is unresponsive but pods are running, the leader lease may be stuck. Check the lease status:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
get lease -n vcluster-platform -o wide

If the lease holder is a pod that no longer exists, restart the deployment to force a new leader election:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
rollout restart deployment/loft -n vcluster-platform
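Before restarting, the comparison between lease holders and running pods can be automated. The sketch below is illustrative only. It assumes the lease holder identity begins with the pod name (leader election libraries often append a suffix such as an underscore and a unique ID), and it uses the current kubectl context, so add --context as in the commands above if needed. The function names are hypothetical.

```shell
# Print "ok" if the holder identity starts with one of the given pod names,
# otherwise print "stale".
# Assumption: the holder identity is the pod name, optionally followed by a suffix.
lease_holder_status() {
  holder="$1"
  shift
  for pod in "$@"; do
    case "$holder" in
      "$pod"*) echo "ok"; return 0 ;;
    esac
  done
  echo "stale"
  return 1
}

# Print every lease in the namespace with its holder and an ok/stale verdict.
check_leases() {
  ns="${1:-vcluster-platform}"
  pods=$(kubectl get pods -n "$ns" -o jsonpath='{.items[*].metadata.name}')
  kubectl get lease -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.holderIdentity}{"\n"}{end}' |
  while IFS="$(printf '\t')" read -r lease holder; do
    # $pods is left unquoted on purpose so that it splits into one argument per pod.
    printf '%s\t%s\t%s\n' "$lease" "$holder" "$(lease_holder_status "$holder" $pods)"
  done
}
```

Running check_leases prints one line per lease. A "stale" verdict suggests the holder no longer exists, although a holder identity that does not follow the assumed naming also shows as stale, so confirm with kubectl get pods before restarting.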

ALB health checks failing

Check the ingress status:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
get ingress -n vcluster-platform loft

Verify the ALB target group health in the AWS console or with:

aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/k8s-vcluster-loft-xxxxxxxxxx/xxxxxxxxxx

Common causes:

  • The platform pods are not ready (check with kubectl get pods -n vcluster-platform).
  • The /healthz endpoint is not reachable from the ALB (check the security groups, and check that the ALB target type matches the service configuration).
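The target type check can be sketched as follows. This assumes the ingress is managed by the AWS Load Balancer Controller, which reads the alb.ingress.kubernetes.io/target-type annotation and, unless configured otherwise, defaults to instance when the annotation is absent. The Service name loft and the function names are assumptions for illustration.

```shell
# Print the ALB target type set on the ingress (empty if the annotation is absent).
ingress_target_type() {
  kubectl get ingress loft -n vcluster-platform \
    -o jsonpath='{.metadata.annotations.alb\.ingress\.kubernetes\.io/target-type}'
}

# Print the type of the backing Service. The default name "loft" is an assumption.
service_type() {
  kubectl get service "${1:-loft}" -n vcluster-platform -o jsonpath='{.spec.type}'
}

# Return success if the target type can route to a Service of the given type.
# "instance" targets node ports, so it needs NodePort or LoadBalancer;
# "ip" targets pod IPs directly and works with any Service type.
target_type_compatible() {
  target_type="${1:-instance}"  # assumed controller default when unset
  svc_type="$2"
  case "$target_type:$svc_type" in
    ip:*) return 0 ;;
    instance:NodePort | instance:LoadBalancer) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, target_type_compatible "$(ingress_target_type)" "$(service_type)" || echo "target type and Service type do not match" flags an instance target type paired with a ClusterIP Service.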

Database connection timeouts

If pods start but log database connection errors:

  1. Verify VPC peering is active:
aws ec2 describe-vpc-peering-connections \
--filters "Name=status-code,Values=active" \
--region us-east-1 \
--query 'VpcPeeringConnections[].{ID:VpcPeeringConnectionId,Requester:RequesterVpcInfo.VpcId,Accepter:AccepterVpcInfo.VpcId}'
  2. Verify routes from the EKS VPC to the database VPC exist:
aws ec2 describe-route-tables \
--filters "Name=vpc-id,Values=vpc-yyyyyyyyy" \
--region us-east-1 \
--query 'RouteTables[].Routes[?DestinationCidrBlock==`10.1.0.0/16`]'
  3. Test connectivity from within the cluster:
kubectl run dbtest --image=busybox --restart=Never --rm -it -- \
nc -zv mariadb-ha-platform.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com 3306

Kerberos authentication failures

When the platform authenticates to a PostgreSQL backend over Kerberos, the GSSAPI handshake output appears in the platform pod logs. Check the current logs with:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
logs -n vcluster-platform -l app=loft --tail=100

If the pod restarted, check the previous container logs as well:

kubectl --context arn:aws:eks:us-east-1:123456789012:cluster/platform-ha \
logs -n vcluster-platform -l app=loft --tail=100 --previous

Common causes:

  • KDC_ERR_C_PRINCIPAL_UNKNOWN: The KDC has no such principal. The principal in the datasource does not match the one exported into the mounted keytab, or it was never created in the KDC. Confirm both name and realm.
  • no GSSAPI provider registered: The platform build predates GSSAPI support. Upgrade to v4.10.0 or later.
  • Handshake fails despite a valid principal: The datasource host does not match the host in the PostgreSQL service principal (postgres/<host>@REALM) baked into the server keytab. Make the datasource host and the service principal host identical, or disable DNS canonicalization in krb5.conf (dns_canonicalize_hostname = false, rdns = false).
  • gss authentication failed from PostgreSQL: The pg_hba.conf rule did not map the principal to a login role, or PostgreSQL matched a broader host rule before the Kerberos rule. Verify the gss rule appears before broader rules, the role name matches the principal short name, and include_realm=0 and krb_realm are set correctly.
  • TLS or certificate errors: Kerberos authenticates the database connection, but TLS is configured separately. Verify the datasource sslmode and any database CA or client certificate values match your PostgreSQL server requirements.
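For the principal mismatch case, the keytab contents can be compared against the principal configured in the datasource. The sketch below assumes you have a local copy of the keytab that is mounted into the platform pod and the MIT Kerberos client tools (klist) installed on your workstation. The function name, keytab path, and principal in the usage example are hypothetical.

```shell
# Return success if the keytab contains an entry for the given principal.
# klist -kt lists one entry per line, with the principal as the last field.
keytab_has_principal() {
  keytab="$1"
  principal="$2"
  klist -kt "$keytab" |
    awk -v p="$principal" '$NF == p { found = 1 } END { exit !found }'
}
```

For example, keytab_has_principal ./platform.keytab platform@EXAMPLE.COM || echo "principal not in keytab". The match is exact, so it also catches differences in realm or letter case.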