Monitoring Kubernetes Workloads with Kuantifier

Workloads run on Kubernetes are not reported to Gratia accounting by default. To report contributions to OSG made via Kubernetes, install the Kuantifier Helm chart on your cluster.

Before Starting

Confirm access to a running Kubernetes cluster

All subsequent instructions assume you have administrative access to a running Kubernetes cluster, and can run kubectl against that cluster.
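
One way to confirm both conditions before proceeding is sketched below. The function name check_cluster_access is illustrative and not part of Kuantifier. The wildcard permission check is only a rough proxy for administrative access, since what counts as sufficient depends on your cluster's RBAC configuration.

```shell
# Hypothetical helper: returns 0 only if the cluster is reachable and the
# current user may perform any action on any resource in all namespaces.
check_cluster_access() {
  # Fails if kubectl cannot reach the API server of the current context.
  kubectl cluster-info >/dev/null || return 1
  # Prints "yes" or "no" and exits non-zero when the answer is "no".
  kubectl auth can-i '*' '*' --all-namespaces
}

# Example usage:
#   check_cluster_access && echo "ready to install"
```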

Install the Helm command line tools

Kuantifier itself, and several of its prerequisites, are installed via Helm charts. The Helm command line tools install Helm charts onto a running Kubernetes cluster, and can be installed as follows:

Download the latest release of Helm.
Unpack the release archive (e.g. tar -zxvf helm-v3.0.0-linux-amd64.tar.gz).
Move the helm binary from the archive into a location on your $PATH (e.g. mv linux-amd64/helm ~/.local/bin).
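
The unpack and move steps can be combined into a small helper, sketched here under the assumption that the archive has already been downloaded and that it unpacks into a linux-amd64/ directory, as in the example file name above. The function name install_helm_from_archive is hypothetical.

```shell
# Hypothetical helper: unpack a downloaded Helm release archive and place
# the helm binary in a directory on your $PATH.
#   $1 = path to the release archive (e.g. helm-v3.0.0-linux-amd64.tar.gz)
#   $2 = destination directory (e.g. ~/.local/bin)
install_helm_from_archive() {
  archive=$1
  dest=$2
  workdir=$(mktemp -d) || return 1
  # Unpack into a scratch directory so the current directory stays clean.
  tar -zxf "$archive" -C "$workdir" || return 1
  mkdir -p "$dest" || return 1
  # The linux-amd64/ prefix matches the archive named above; archives for
  # other platforms unpack into a differently named directory.
  mv "$workdir/linux-amd64/helm" "$dest/helm" || return 1
  rm -rf "$workdir"
}

# Example usage:
#   install_helm_from_archive helm-v3.0.0-linux-amd64.tar.gz ~/.local/bin
#   helm version
```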

Install Prometheus and kube-state-metrics in your Kubernetes cluster

Kuantifier relies on Prometheus with kube-state-metrics to account for pod resource usage. There are a number of ways to install both, such as:

Via the prometheus-community Helm charts.
Via OpenShift user-defined project monitoring.
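
For the first option, a minimal sketch of an installation with the prometheus-community charts follows. The release name prometheus, the default monitoring namespace, and the function name install_prometheus are assumptions made for illustration. The prometheus chart in that repository typically enables kube-state-metrics as a bundled subchart, but verify this against the chart's values for the version you install.

```shell
# Hypothetical helper: install Prometheus from the prometheus-community
# chart repository into the given namespace (default: monitoring).
install_prometheus() {
  ns=${1:-monitoring}
  helm repo add prometheus-community \
    https://prometheus-community.github.io/helm-charts || return 1
  helm repo update || return 1
  # The release name "prometheus" is an assumption; it influences the
  # names of the Services that the chart creates.
  helm install prometheus prometheus-community/prometheus \
    --namespace "$ns" --create-namespace
}

# Example usage:
#   install_prometheus monitoring
```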

Ensure that the namespace where your workload pods run is properly configured

Kuantifier relies on the spec.containers[].resources.requests.cpu field in workload pods to determine processor count for GRACC reporting. Ensure a CPU request is set for pods in your namespace.
Confirm the method by which workload pods are launched in your namespace. By default, Kuantifier relies on kube-state-metrics' kube_pod_completion_time to calculate job run times, which is only reliably reported for pods launched by Kubernetes Jobs. Kuantifier provides a custom metric exporter to approximate the runtimes of pods launched by other means (such as JupyterHub), which must be enabled via configuration.
(Known issue) Kuantifier currently doesn't support calculating usage metrics for workload pods running multiple containers. Ensure that workload pods in your namespace have only one container.
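
As an illustration of a workload that meets all three conditions, the snippet below writes a Job manifest whose pod has a single container with a CPU request. The Job name, namespace, image, and command are placeholders chosen for this sketch, not values required by Kuantifier.

```shell
# Write an illustrative Job manifest to workload-job.yaml.
cat > workload-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job                      # launched via a Kubernetes Job
metadata:
  name: example-workload
  namespace: workload-namespace
spec:
  template:
    spec:
      restartPolicy: Never
      containers:              # exactly one container per pod
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo working; sleep 60"]
          resources:
            requests:
              cpu: "1"         # used to determine processor count
EOF

# Apply it with:
#   kubectl apply -f workload-job.yaml
```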

Installation

Kuantifier itself is also installed via a Helm chart, hosted on OSG Harbor.

Configuring Kuantifier's Values File

Several instance-specific modifications to the default values file provided with the chart must be made prior to installation. For full documentation of the values in the values file, see the Helm chart README on GitHub.

Fetch the default values.yaml for Kuantifier via the Helm CLI. The command prints to standard output, so redirect it to a file if you want a copy to edit:
```
helm show values oci://hub.osg-htc.org/iris-hep/kuantifier
```
Update the top-level .outputFormat in values.yaml to output records to GRACC:
```
outputFormat: "gratia"
```
Update the .processor.config map with the details of your deployment.
- All of the following need to be set:
  - NAMESPACE: The namespace of the pods for which Kuantifier will collect and report metrics.
    
    Installation per Monitored Namespace
    
    Each installation of Kuantifier only reports on pods in a single namespace. You must install multiple instances of the chart to support reporting on multiple namespaces.
  - SITE_NAME: The name of the site being reported.
  - SUBMIT_HOST: A uniquely identifying name for the Kubernetes cluster where your workload pods run, in FQDN format.
  - VO_NAME: Virtual Organization (VO) of jobs.
- Additionally, the following may need to be set:
  - PROMETHEUS_SERVER: The DNS name of the Prometheus server installed in your Kubernetes cluster.
    - If Prometheus was installed in your cluster via the prometheus-community Helm chart in the monitoring namespace, the DNS name will be prometheus-server.monitoring.svc.cluster.local.
    - If Prometheus was installed via OpenShift, the DNS name for the cluster Prometheus instance can be discovered via the oc command line tool.
    - Otherwise, construct the DNS name based on the standard Kubernetes service discovery mechanism (i.e. <service name>.<namespace>.svc.cluster.local, assuming the default cluster.local cluster domain).
- A fully configured .processor.config might look like:
```
processor:
  config:
    NAMESPACE: workload-namespace
    SITE_NAME: CHTC
    VO_NAME: University of Wisconsin
    SUBMIT_HOST: tiger-cluster.chtc.wisc.edu
    PROMETHEUS_SERVER: prometheus-server.monitoring.svc.cluster.local
```
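
For the PROMETHEUS_SERVER value, the in-cluster DNS name of a Service can be assembled from its name and namespace. The helper below is a sketch that assumes the default cluster domain cluster.local; the function name service_dns_name is hypothetical.

```shell
# Hypothetical helper: build the in-cluster DNS name of a Service.
#   $1 = Service name, $2 = namespace
service_dns_name() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# List candidate Service names with:
#   kubectl -n monitoring get service
service_dns_name prometheus-server monitoring
# prints: prometheus-server.monitoring.svc.cluster.local
```
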
(Optional) If Prometheus in your cluster is configured to require authentication, point Kuantifier at an existing Service Account API token Secret in the namespace, and name the key within that Secret that holds the token:
```
processor:
  prometheus_auth:
    secret: <service account secret name>
    key: token
```
Authentication

API token-based authentication is required by default in OpenShift. Prometheus instances installed via the community Helm charts are unauthenticated by default.
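
Before referencing a Secret in prometheus_auth, it may help to confirm that the Secret exists and holds a value under the expected key. The following is a sketch only; the function name secret_has_key is hypothetical, and it handles only simple key names without dots.

```shell
# Hypothetical helper: succeed only if the named Secret holds a non-empty
# value under the given key.
#   $1 = namespace, $2 = Secret name, $3 = key (default: token)
secret_has_key() {
  value=$(kubectl -n "$1" get secret "$2" \
    -o "jsonpath={.data.${3:-token}}") || return 1
  [ -n "$value" ]
}

# Example usage:
#   secret_has_key <namespace> <service account secret name> token \
#     && echo "secret is usable"
```
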
(Optional) Update the frequency of the Kuantifier Reporting job. This may be useful for debugging.
```
cronJob:
  schedule: "@daily"
```
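
When debugging, a shorter interval can be kept in a separate override file rather than edited into the main values file. This sketch assumes the schedule field accepts standard cron syntax, as Kubernetes CronJobs do; the file name debug-values.yaml and the 15-minute interval are arbitrary choices.

```shell
# Write a small override file that runs the reporting job every 15 minutes.
cat > debug-values.yaml <<'EOF'
cronJob:
  schedule: "*/15 * * * *"
EOF

# When several -f files are passed to helm, values in later files take
# precedence over earlier ones:
#   helm install -f values.yaml -f debug-values.yaml -n <install namespace> \
#     kuantifier oci://hub.osg-htc.org/iris-hep/kuantifier
```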

(Optional) If reporting on Pods that are not launched via Kubernetes Jobs, enable the Kuantifier last-seen-time Prometheus metric exporter.
```
exporter:
  enabled: true
  config:
    POD_NAME_PREFIX: "jupyter-" # Prefix for identifying workload pods
```
Exporter Support

The Kuantifier pod endtime exporter was developed in support of accounting Pods launched by the JupyterHub operator. Usage with Pods launched via Deployments and StatefulSets is experimental.

Installing Kuantifier

After configuring an appropriate values file for your instance, install the chart via Helm:

```
helm install -f <values.yaml> -n <install namespace> kuantifier oci://hub.osg-htc.org/iris-hep/kuantifier
```
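
The install can be wrapped together with an immediate status check, as in this sketch. The function name install_kuantifier is hypothetical; the release name kuantifier and the chart location are taken from the command above.

```shell
# Hypothetical helper: install the chart, then show the release status.
#   $1 = path to the values file, $2 = install namespace
install_kuantifier() {
  values=$1
  ns=$2
  helm install -f "$values" -n "$ns" kuantifier \
    oci://hub.osg-htc.org/iris-hep/kuantifier || return 1
  # Prints the release's status and revision for a quick sanity check.
  helm status kuantifier -n "$ns"
}

# Example usage:
#   install_kuantifier values.yaml monitoring
```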

Validation

After running helm install, ensure that the expected Kubernetes objects have been created. The following commands assume that Kuantifier has been installed in the monitoring namespace.

Check that a CronJob was created for running the Kuantifier processor:
```
kubectl -n monitoring get cronjob kuantifier-cronjob
```
Check that a ConfigMap was created to configure processor jobs, and that the values in the ConfigMap align with the values set in .processor.config in the values file:
```
kubectl -n monitoring get configmap kuantifier-processor-config -o yaml
```

If the Helm chart artifacts are present as expected, run a test instance of the CronJob and inspect its output.

Create a new job from the CronJob, then find the Pod created by the job:

```
kubectl -n monitoring create job --from=cronjob/kuantifier-cronjob kuantifier-test-job
kubectl -n monitoring get pod | grep kuantifier-test-job
```

Inspect the logs from the processor initContainer, which queries Prometheus to generate output records:
```
kubectl -n monitoring logs <test-job-pod-name> -c processor
```
Inspect the logs from the gratia-output container, which sends the output records to GRACC:
```
kubectl -n monitoring logs <test-job-pod-name> -c gratia-output
```

If both the processor initContainer and gratia-output container run to completion without error, the next step is to confirm that contributions from your site appear on the GRACC dashboard.
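
The test-job steps above can be chained into one helper, sketched below. The function name run_kuantifier_test_job and the 10-minute timeout are assumptions; the sketch also relies on the job-name label that Kubernetes adds to pods created by a Job, instead of the grep used earlier.

```shell
# Hypothetical helper: create a test Job from the CronJob, wait for it,
# and print the logs of both containers.
#   $1 = namespace where Kuantifier is installed (default: monitoring)
run_kuantifier_test_job() {
  ns=${1:-monitoring}
  job=kuantifier-test-job
  kubectl -n "$ns" create job --from=cronjob/kuantifier-cronjob "$job" \
    || return 1
  # Wait up to 10 minutes (an arbitrary choice) for the Job to complete;
  # on timeout or failure, continue so the logs can still be inspected.
  kubectl -n "$ns" wait --for=condition=complete "job/$job" --timeout=600s \
    || echo "Job did not complete; showing available logs" >&2
  # Look up the Pod created by the Job via its job-name label.
  pod=$(kubectl -n "$ns" get pod -l "job-name=$job" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl -n "$ns" logs "$pod" -c processor
  kubectl -n "$ns" logs "$pod" -c gratia-output
}

# Example usage:
#   run_kuantifier_test_job monitoring
```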

Getting Help

If you need help with configuring monitoring for your Kubernetes site, follow the contact instructions.