
Aggregating Metrics

This guide explains how to configure OpenTelemetry with Prometheus to collect workload metrics across multiple virtual clusters and aggregate them by Project, vCluster, and Space. The approach uses shared OpenTelemetry DaemonSets on the host cluster, so no collector needs to be installed in each individual virtual cluster.

Prerequisites​

Before you begin, decide where to deploy Prometheus and Grafana. This guide uses the Platform's connected local-cluster with both services in an observability namespace. You can adapt this to external clusters or different namespaces based on your requirements.
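If the observability namespace does not exist yet on the host cluster, you can create it up front (assuming kubectl access to that cluster):

kubectl create namespace observability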

Don't use kube-prometheus-stack

Do not use the kube-prometheus-stack chart from the Platform Apps page for this setup. Instead, use the standalone prometheus and grafana charts as described below, which are optimized to work with the OpenTelemetry Collector.

Deploy Prometheus​

Deploy Prometheus with OTLP receiver support using the Platform Apps UI.

  1. Go to the Infra section using the menu on the left, and select the Clusters view.

  2. Click on the cluster where you want to deploy Prometheus (for example, local-cluster).

  3. Navigate to the Apps tab.

  4. Click and configure a Helm chart with the following settings.

Setting | Value
Chart Repository URL | https://prometheus-community.github.io/helm-charts
Chart Name | prometheus
Namespace | observability
Release Name | prometheus

Use the following chart values to enable the OTLP receiver and required features:

Prometheus Helm values
server:
  extraFlags:
    # Enable OTLP receiver for OpenTelemetry metrics ingestion
    - web.enable-otlp-receiver
    # Enable lifecycle API for configuration reloads
    - web.enable-lifecycle
  extraArgs:
    # Enable delta-to-cumulative conversion for OTLP metrics
    enable-feature: otlp-deltatocumulative

# Skip RBAC creation if a prometheus-server ClusterRole already exists
rbac:
  create: false

# Disable components not needed for this setup
prometheus-node-exporter:
  enabled: false
alertmanager:
  enabled: false
kube-state-metrics:
  enabled: false
prometheus-pushgateway:
  enabled: false
Existing Prometheus installations

The rbac.create: false setting skips ClusterRole creation, which prevents conflicts if a prometheus-server ClusterRole already exists in your cluster from vCluster Platform or other components. If you don't have an existing ClusterRole, either remove this setting or use server.clusterRoleNameOverride: "otel-prometheus-server" to create one with a unique name.
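For instance, a sketch of the alternative values when no ClusterRole exists yet (the override name is arbitrary):

# Create a ClusterRole with a unique name instead of skipping RBAC creation
rbac:
  create: true
server:
  clusterRoleNameOverride: "otel-prometheus-server"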

Click to deploy Prometheus.
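Once the deployment finishes, you can check that the Prometheus server pod is running; with the settings above you should see the prometheus-server pod, since the other components are disabled:

kubectl get pods -n observability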

Deploy OpenTelemetry Collector​

Deploy the built-in OpenTelemetry App on each connected host cluster. The App accepts the Prometheus connection information and deploys opentelemetry-collector as a DaemonSet via Helm.

The OpenTelemetry Collector Agent on each node pushes metrics about the workloads running on that node to the Prometheus instance. The metrics include vCluster, vCluster Platform, and Kubernetes metadata as labels.
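As an illustration, a single CPU series stored in Prometheus might look like the following (the label values are placeholders; additional Kubernetes metadata labels can also be present depending on the collector configuration):

k8s_pod_cpu_time_seconds_total{
  loft_cluster_name="local-cluster",
  loft_project_name="default",
  loft_space_name="team-a",
  loft_virtualcluster_name="team-a-vcluster"
}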

  1. Go to the Clusters dropdown using the menu on the left, and select the Clusters view.

  2. Click on the cluster where you are installing the OpenTelemetry Collector App.

  3. Navigate to the Apps tab.

  4. Click on the OpenTelemetry Collector App.

  5. Enter the Prometheus connection endpoint: http://prometheus-server.observability.svc.cluster.local:80

  6. Click on the button to finish.

Prometheus endpoint

If you deployed Prometheus in a different namespace or cluster, adjust the endpoint URL accordingly. The format is http://<release-name>-server.<namespace>.svc.cluster.local:80.
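For reference, the collector reaches Prometheus over OTLP/HTTP, and Prometheus exposes its OTLP receiver under the /api/v1/otlp path. A minimal sketch of what such an exporter configuration could look like (illustrative only; the values the App renders may differ):

exporters:
  otlphttp:
    # The exporter appends /v1/metrics, so data lands on Prometheus' OTLP ingestion path
    endpoint: http://prometheus-server.observability.svc.cluster.local:80/api/v1/otlp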

Deploy Grafana​

Deploy Grafana with a pre-configured Prometheus datasource and dashboard using the Platform Apps UI.

  1. Go to the Clusters dropdown using the menu on the left, and select the Clusters view.

  2. Click on the cluster where you deployed Prometheus.

  3. Navigate to the Apps tab.

  4. Click and configure a Helm chart with the following settings.

Setting | Value
Chart Repository URL | https://grafana.github.io/helm-charts
Chart Name | grafana
Namespace | observability
Release Name | grafana

Use the following chart values to configure the Prometheus datasource and include a pre-built dashboard:

Grafana Helm values
# Configure Prometheus as the default datasource
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server
        access: proxy
        isDefault: true

# Configure dashboard provisioning
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default

# Include a pre-built dashboard for vCluster Platform metrics
dashboards:
  default:
    vcluster-platform-metrics:
      json: |
{"title":"CPU and Memory usage by Project, Space, Virtual Cluster","panels":[{"gridPos":{"h":8,"w":12,"x":0,"y":0},"id":3,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom","showLegend":true}},"targets":[{"disableTextWrap":false,"editorMode":"code","expr":"sum by(loft_virtualcluster_name) (k8s_pod_cpu_time_seconds_total{loft_virtualcluster_name=~\".+\"})","fullMetaSearch":false,"includeNullMetadata":true,"instant":false,"legendFormat":"__auto","range":true,"refId":"A","useBackend":false}],"title":"CPU Usage by Virtual Cluster","type":"timeseries"},{"gridPos":{"h":8,"w":12,"x":12,"y":0},"id":4,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom","showLegend":true}},"targets":[{"disableTextWrap":false,"editorMode":"code","expr":"sum by (loft_virtualcluster_name) (k8s_pod_memory_usage_bytes{loft_virtualcluster_name=~\".+\"})\n/1024/1024","fullMetaSearch":false,"includeNullMetadata":true,"instant":false,"legendFormat":"__auto","range":true,"refId":"A","useBackend":false}],"title":"Memory Usage (MiB) by Virtual Cluster","type":"timeseries"},{"gridPos":{"h":8,"w":12,"x":0,"y":8},"id":2,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom","showLegend":true}},"targets":[{"disableTextWrap":false,"editorMode":"builder","expr":"sum by(loft_project_name) (k8s_pod_cpu_time_seconds_total{loft_project_name=~\".+\"})","fullMetaSearch":false,"includeNullMetadata":true,"instant":false,"interval":"","legendFormat":"__auto","range":true,"refId":"A","useBackend":false}],"title":"CPU Usage by Project","type":"timeseries"},{"gridPos":{"h":8,"w":12,"x":12,"y":8},"id":1,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom","showLegend":true}},"targets":[{"disableTextWrap":false,"editorMode":"code","expr":"sum by (loft_project_name) (k8s_pod_memory_usage_bytes{loft_project_name=~\".+\"})\n/1024/1024","fullMetaSearch":false,"includeNullMetadata":true,"instant":false,"legendFormat":"__auto","range":true,"refId":"A","useBackend":false}],"title":"Memory Usage (MiB) by Project","type":"timeseries"},{"gridPos":{"h":8,"w":12,"x":0,"y":16},"id":6,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom","showLegend":true}},"targets":[{"disableTextWrap":false,"editorMode":"code","expr":"sum by(loft_space_name) (k8s_pod_cpu_time_seconds_total{loft_space_name=~\".+\"})","fullMetaSearch":false,"includeNullMetadata":true,"instant":false,"legendFormat":"__auto","range":true,"refId":"A","useBackend":false}],"title":"CPU Usage by Space","type":"timeseries"},{"gridPos":{"h":8,"w":12,"x":12,"y":16},"id":5,"options":{"legend":{"calcs":[],"displayMode":"list","placement":"bottom","showLegend":true}},"targets":[{"disableTextWrap":false,"editorMode":"code","expr":"sum by (loft_space_name) (k8s_pod_memory_usage_bytes{loft_space_name=~\".+\"})\n/1024/1024","fullMetaSearch":false,"includeNullMetadata":true,"instant":false,"legendFormat":"__auto","range":true,"refId":"A","useBackend":false}],"title":"Memory Usage (MiB) by Space","type":"timeseries"}],"schemaVersion":38}

Click to deploy Grafana.
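After the chart is deployed, you can wait for the Grafana deployment to become ready (the deployment is named after the release):

kubectl rollout status deployment/grafana -n observability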

Access Grafana​

After deploying Grafana, retrieve the admin password and access the dashboard.

Get the Grafana password​

Run the following command to retrieve the Grafana admin password:

kubectl get secret --namespace observability grafana \
-o jsonpath="{.data.admin-password}" | base64 --decode; echo

Option 1: Port forward​

Use port forwarding to access Grafana locally:

kubectl port-forward -n observability service/grafana 8080:80

Then open your browser and navigate to http://localhost:8080. Log in with username admin and the password from the previous step.

Option 2: Ingress​

Create an Ingress resource to expose Grafana externally. The following example uses the nginx ingress controller:

grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    # Uncomment the following for TLS with cert-manager:
    # cert-manager.io/cluster-issuer: letsencrypt-prod
    # nginx.ingress.kubernetes.io/ssl-redirect: "true"
  name: grafana
  namespace: observability
spec:
  ingressClassName: nginx
  rules:
    - host: grafana.example.com # Replace with your hostname
      http:
        paths:
          - backend:
              service:
                name: grafana
                port:
                  number: 80
            path: /
            pathType: Prefix
  # Uncomment the following for TLS:
  # tls:
  #   - hosts:
  #       - grafana.example.com
  #     secretName: grafana-ingress-tls

Apply the Ingress:

kubectl apply -f grafana-ingress.yaml
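You can then confirm that the Ingress has been admitted and assigned an address:

kubectl get ingress -n observability grafana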

Navigate to your configured hostname and log in with the admin credentials.

Create Prometheus queries​

After collecting data with OpenTelemetry, you can aggregate the metrics using the associated labels like the vCluster name.

Here's an example Prometheus query showing the CPU usage aggregated by the vCluster name:

sum by(loft_virtualcluster_name) (k8s_pod_cpu_time_seconds_total{loft_virtualcluster_name=~".+"})

Available labels​

The OpenTelemetry Collector adds the following vCluster Platform labels to metrics:

Label | Description
loft_project_name | The vCluster Platform project name
loft_virtualcluster_name | The virtual cluster name
loft_space_name | The space name
loft_cluster_name | The connected cluster name

Example queries​

Memory usage by project:

sum by (loft_project_name) (k8s_pod_memory_usage_bytes{loft_project_name=~".+"}) / 1024 / 1024

CPU usage by space:

sum by(loft_space_name) (k8s_pod_cpu_time_seconds_total{loft_space_name=~".+"})
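Labels can also be combined to narrow a query, for example memory usage per virtual cluster within a single project (the project name my-project is a placeholder):

sum by (loft_virtualcluster_name) (
  k8s_pod_memory_usage_bytes{loft_project_name="my-project"}
) / 1024 / 1024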

Example dashboard​

The Grafana deployment includes a pre-built dashboard that visualizes CPU and Memory usage aggregated by vCluster, Project, and Space.

Grafana dashboard showing CPU and Memory usage by Project, Space, and Virtual Cluster

You can also import additional dashboards or create custom visualizations using the available labels.