A look at Retina on AKS
Hello all!
As you may have noticed, the Kubernetes landscape is steadily evolving (again) with the adoption of eBPF. We talked about eBPF in other articles, mainly about Cilium CNI and its features. However, since eBPF is this big, the Isovalent folks are definitely not the only ones working on it. There is a Microsoft OSS project named Retina that aims to provide Kubernetes monitoring and that also leverages this technology. In this article, we'll have a look at Retina in an AKS environment.
Our agenda will be as follows:
- About Retina
- Preparing the lab
- What can we do with Retina
Let’s get started!
1. About Retina
As mentioned in the intro, Retina is an open source project, proposed by Microsoft, that aims to improve network observability in the Kubernetes landscape.
There is a dedicated website for it, in addition to all the mentions already available in the Azure documentation. As can be read on this site, there are two features in Retina.
The first one is Metrics, which provides continuous observability on inbound & outbound traffic, dropped packets, API server latency, DNS, and node or interface statistics. We can leverage either the basic metrics, which are limited to metrics aggregated by node, or the advanced metrics, which add metrics related to source and destination pods. These metrics are collected through eBPF for Linux nodes. It's interesting to note that Retina also works for Windows nodes and, in that case, relies on other technologies. Specifically for the metrics part, the mentioned technology is VFP, which seems to refer to the Virtual Filtering Platform. There is not that much documentation on this except a few publications.
The second feature is Capture. As the name implies, it gives the capability to capture network traffic for further analysis. As for Metrics, it uses eBPF, and specifically the Inspektor Gadget trace plugin, for Linux nodes, and Pktmon, a Windows Server utility, for Windows nodes. Capture can be used either with the Retina CLI or through a CRD. The output can be stored on the host file system or in a storage blob.
Now about the architecture: as could be expected, Retina relies on pods that have to run on all observed nodes. Thus, following this logic, we get a daemonset to ensure that each node gets its Retina agent. Because the technology differs between Linux and Windows, we have two different daemonsets, one for each OS.
yumemaru@azure:~$ kubectl get daemonsets.apps -n kube-system
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
===========================================TRUNCATED===========================================
retina-agent       3         3         3       3            3           kubernetes.io/os=linux     40h
retina-agent-win   0         0         0       0            0           kubernetes.io/os=windows   40h
Let’s build a lab to test this.
2. Preparing the lab
To test this monitoring solution, we'll need the following:
- An AKS cluster, configured with Azure CNI in overlay mode and with the Cilium dataplane
- A virtual network in which the cluster will live
And in the cluster, we'll first deploy a Prometheus/Grafana stack, then Retina.
For those interested, the lab config is available on GitHub here.
There is nothing specific about the vnet or the AKS cluster. In this case, it is a cluster with Azure CNI powered by Cilium, in overlay mode. We'll note that the instance type for the node pool is D2s_v4. We'll come back to this in the next section.
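For illustration, a minimal sketch of the cluster creation with the Azure CLI could look like the following. Resource names are placeholders, and a real environment would obviously need additional parameters.
# Sketch only: resource names are placeholders and real deployments need more parameters.
az aks create \
  --resource-group <rg_name> \
  --name <aks_name> \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --node-vm-size Standard_D2s_v4 \
  --node-count 3 \
  --vnet-subnet-id <subnet_resource_id> \
  --generate-ssh-keys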
To install the Prometheus stack, we rely on the kube-prometheus-stack chart from the Helm repo https://prometheus-community.github.io/helm-charts.
The Retina doc provides a YAML values file with the configuration specific to the metrics to collect:
windowsMonitoring:
  enabled: true
prometheusOperator:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                  - linux
  admissionWebhooks:
    deployment:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
    patch:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
prometheus:
  prometheusSpec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/os
                  operator: In
                  values:
                    - linux
    additionalScrapeConfigs: |
      - job_name: "retina-pods"
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_container_name]
            action: keep
            regex: retina(.*)
          - source_labels:
              [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            separator: ":"
            regex: ([^:]+)(?::\d+)?
            target_label: __address__
            replacement: ${1}:${2}
            action: replace
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: instance
        metric_relabel_configs:
          - source_labels: [__name__]
            action: keep
            regex: (.*)
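With this values file saved locally (named prometheus-values.yaml here, an assumed file name), a minimal install sketch, consistent with the release name and namespace used later in this article, could look like this:
# prometheus-values.yaml is an assumed file name for the values shown above.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace kube-system \
  -f prometheus-values.yaml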
After those initial steps, it's time to install Retina. Taken from the documentation, we get the following Helm CLI command.
VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
--version $VERSION \
--namespace kube-system \
--set image.tag=$VERSION \
--set operator.tag=$VERSION \
--set image.pullPolicy=Always \
--set logLevel=info \
--set os.windows=true \
--set operator.enabled=true \
--set operator.enableRetinaEndpoint=true \
--skip-crds \
--set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\,packetparser\]" \
--set enablePodLevel=true \
--set enableAnnotations=true
The deployment should complete easily. However, when checking afterward that everything is all right, you may discover something like this:
yumemaru@azure:~$ kubectl get daemonsets.apps -n kube-system
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
===========================================TRUNCATED===========================================
retina-agent       3         3         2       3            3           kubernetes.io/os=linux     40h
retina-agent-win   0         0         0       0            0           kubernetes.io/os=windows   40h
We'll note, first, that there is no pod in the daemonset for Windows nodes, but that's because we don't have any Windows nodes 😱. Second, we can see that one of the pods for the Linux agent is not ready. Looking at the details, we can see something like below:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 108s default-scheduler 0/3 nodes are available: 1 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
The message is clear: we don't have enough resources remaining on the last node. If we look at the daemonset YAML config, we can find the following:
yumemaru@azure:~$ kubectl get ds -n kube-system retina-agent -o yaml
resources:
  limits:
    cpu: 500m
    memory: 300Mi
  requests:
    cpu: 500m
    memory: 300Mi
Remember, we have a node pool configured with the D2s_v4 size, which has only 2 vCPUs available. Granting a request and limit of 500m CPU to Retina is probably too much with those instances. Now, either we scale up the node pool instance size, or we can choose not to deploy Retina on this default node pool, which is mainly for system workloads. After all, do we care about the network traffic from the AKS-managed pods?
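If we did want to go the scale-up route, adding a bigger user node pool would be a sketch along these lines (pool name and VM size are arbitrary examples):
# Sketch only: pool name and VM size are arbitrary examples.
az aks nodepool add \
  --resource-group <rg_name> \
  --cluster-name <aks_name> \
  --name retinapool \
  --node-count 3 \
  --node-vm-size Standard_D4s_v4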
Anyway, in this specific context, I do not wish to deploy additional node pools, so I need to change some configuration to ensure that all my nodes get a Retina agent.
I'll cheat a little (don't do that in production, obviously) and edit the daemonset to set the priorityClassName to system-node-critical. This way, we ensure that the pod is scheduled on each node.
yumemaru@azure:~$ kubectl edit ds -n kube-system retina-agent
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "5"
    meta.helm.sh/release-name: retina
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-06-24T13:52:39Z"
  generation: 5
  labels:
    app.kubernetes.io/component: workload
    app.kubernetes.io/instance: retina
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: retina
    app.kubernetes.io/part-of: retina
    app.kubernetes.io/version: 0.0.1
    helm.sh/chart: retina-v0.0.12
    k8s-app: retina
  name: retina-agent
  namespace: kube-system
====================TRUNCATED====================
  template:
    metadata:
      annotations:
        checksum/config: a3c1c8676bf1ac68b21dbe42b5c3c10d0e177f1a322f419e27add9ee4c97eef9
        prometheus.io/port: "10093"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: workload
        app.kubernetes.io/instance: retina
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: retina
        app.kubernetes.io/part-of: retina
        app.kubernetes.io/version: 0.0.1
        helm.sh/chart: retina-v0.0.12
        k8s-app: retina
    spec:
      priorityClassName: system-node-critical
      containers:
====================TRUNCATED====================
yumemaru@azure:~$ kubectl get ds -n kube-system retina-agent -o yaml | grep priorityClassName
priorityClassName: system-node-critical
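For reference, the same change can be applied non-interactively with a patch; a minimal sketch, with the same caveat that a future helm upgrade will overwrite it:
# Same caveat as kubectl edit: a future helm upgrade will overwrite this change.
kubectl patch daemonset retina-agent -n kube-system --type merge \
  -p '{"spec":{"template":{"spec":{"priorityClassName":"system-node-critical"}}}}'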
Once this configuration is updated, we have a Retina agent on each node, and we can check that Prometheus sees the proper targets, as explained in the Retina doc.
yumemaru@azure:~$ kubectl port-forward -n kube-system services/prometheus-operated 9090
You may get an error related to the pod that did not start before the priorityClassName change. As long as the daemonset shows all replicas ready, we can ignore it.
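To double-check from the command line, with the port-forward above still running, we can query the Prometheus HTTP API for the retina-pods scrape job defined earlier (a quick sketch; jq is assumed to be installed):
# Requires jq; reuses the port-forward on localhost:9090 started above.
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.labels.job=="retina-pods") | {instance: .labels.instance, health: .health}'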
Next is the addition of a dashboard in Grafana. Again, we can follow the documentation to find the dashboard here, and connect to Grafana:
yumemaru@azure:~$ kubectl get secret -n kube-system prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
prom-operator
yumemaru@azure:~$ kubectl get secret -n kube-system prometheus-stack-grafana -o jsonpath="{.data.admin-user}" | base64 --decode ; echo
admin
yumemaru@azure:~$ kubectl port-forward -n kube-system services/prometheus-stack-grafana :80
To import the dashboard
One last thing: there is a Retina CLI available, installable through krew. We will need it in the next section, so let's install it.
yumemaru@azure:~$ kubectl krew install retina
yumemaru@azure:~$ kubectl retina version
v0.0.12
Ok, we have everything we need to look at what we can do with Retina now.
3. What can we do with Retina
3.1. Grafana dashboard
Following the previous part, we now have access to a dashboard focused on Network monitoring.
Browsing this dashboard, we can identify the nodes available
A reference to the Azure documentation
And metrics displayed in a visual way.
For example, we can see the remote IP addresses accessing the cluster
And specifically in this one the Azure DNS IP
There are also metrics for dropped packets, but currently they do not seem to report packets dropped because of network policies. That's something to dig into, I guess.
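To dig into it, we can query the drop metrics directly; a hedged sketch, reusing the Prometheus port-forward from earlier and assuming the metric and label names documented by Retina (networkobservability_drop_count and its reason label), which may differ between versions:
# Assumption: metric/label names (networkobservability_drop_count, reason) come from the Retina docs and may vary by version.
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum by (reason) (rate(networkobservability_drop_count[5m]))' | jq '.data.result'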
3.2. Retina capture
The other interesting feature is the network capture. This is the typical network capture that sysadmin/network people are used to and analyze with tools such as Wireshark.
Using capture in Retina is done either through the Retina CLI, which we installed previously, or through a CRD. The CLI is quite well covered in the documentation. In our case, we are Azure people (aren't we? 🤭), so we'll configure the capture to be recorded on a storage account. We need to specify a Shared Access Signature on the target blob container.
yumemaru@azure:~$ az storage account keys list --account-name <staname>
[
{
"creationTime": "2023-12-04T09:22:28.356128+00:00",
"keyName": "key1",
"permissions": "FULL",
"value": "<access_key_value>"
},
{
"creationTime": "2023-12-04T09:22:28.356128+00:00",
"keyName": "key2",
"permissions": "FULL",
"value": "<access_key_value>"
}
]
yumemaru@azure:~$ az storage container generate-sas --account-key <access_key_value> --account-name <sta_name> --name <container_name> --permissions dlrw --expiry <expiry_date>
"<sas_value>"
yumemaru@azure:~$ export retinaendpoint="https://<staname>.blob.core.windows.net/<container_name>?se=<expiry_date>&<sas_value>"
Once we have this, we can launch the capture through the CLI. It will generate a Kubernetes job and collect the data in the specified blob container.
yumemaru@azure:~$ k retina capture create --name capture --blob-upload $retinaendpoint --namespace-selectors " " --pod-selectors "org=retina" --duration=2m
ts=2024-06-27T15:17:13.452+0200 level=info caller=capture/create.go:243 msg="The capture duration is set to 2m0s"
ts=2024-06-27T15:17:13.452+0200 level=info caller=capture/create.go:289 msg="The capture file max size is set to 100MB"
ts=2024-06-27T15:17:13.904+0200 level=info caller=utils/capture_image.go:56 msg="Using capture workload image ghcr.io/microsoft/retina/retina-agent:v0.0.12 with version determined by CLI version"
ts=2024-06-27T15:17:13.906+0200 level=info caller=capture/crd_to_job.go:224 msg="BlobUpload is not empty"
ts=2024-06-27T15:17:14.576+0200 level=info caller=capture/crd_to_job.go:876 msg="The Parsed tcpdump filter is \"\""
ts=2024-06-27T15:17:14.692+0200 level=info caller=capture/create.go:369 msg="Packet capture job is created" namespace=default capture job=capture-tzpcz
ts=2024-06-27T15:17:14.692+0200 level=info caller=capture/create.go:125 msg="Please manually delete all capture jobs"
ts=2024-06-27T15:17:14.692+0200 level=info caller=capture/create.go:127 msg="Please manually delete capture secret" namespace=default secret name=capture-blob-upload-secretmjj9j
NAMESPACE CAPTURE NAME JOBS COMPLETIONS AGE
default capture capture-tzpcz 0/1 0s
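Once the job completes, the resulting archive can be retrieved from the blob container with the same SAS token; a quick sketch, with the same placeholder names as above and the blob name being the one generated by the capture job:
# <blob_name> is the archive generated by the capture job; other placeholders as above.
az storage blob list --account-name <staname> --container-name <container_name> \
  --sas-token "<sas_value>" --output table
az storage blob download --account-name <staname> --container-name <container_name> \
  --sas-token "<sas_value>" --name <blob_name> --file capture.tar.gz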
The main power of the capture tool in this case is the capability to scope per node, namespace, or even pod, as the available flags show:
Flags:
--blob-upload string Blob SAS URL with write permission to upload capture files
--debug When debug is true, a customized retina-agent image, determined by the environment variable RETINA_AGENT_IMAGE, is set
--duration duration Duration of capturing packets (default 1m0s)
--exclude-filter string A comma-separated list of IP:Port pairs that are excluded from capturing network packets. Supported formats are IP:Port, IP, Port, *:Port, IP:*
-h, --help help for create
--host-path string HostPath of the node to store the capture files
--include-filter string A comma-separated list of IP:Port pairs that are used to filter capture network packets. Supported formats are IP:Port, IP, Port, *:Port, IP:*
--include-metadata If true, collect static network metadata into capture file (default true)
--job-num-limit int The maximum number of jobs can be created for each capture. 0 means no limit
--max-size int Limit the capture file to MB in size which works only for Linux (default 100)
--namespace-selectors string A comma-separated list of namespace labels in which to apply the pod-selectors. By default, the pod namespace is specified by the flag namespace
--no-wait Do not wait for the long-running capture job to finish (default true)
--node-names string A comma-separated list of node names to select nodes on which the network capture will be performed
--node-selectors string A comma-separated list of node labels to select nodes on which the network capture will be performed
--packet-size int Limits the each packet to bytes in size which works only for Linux
--pod-selectors string A comma-separated list of pod labels to select pods on which the network capture will be performed
--pvc string PersistentVolumeClaim under the specified or default namespace to store capture files
--s3-access-key-id string S3 access key id to upload capture files
--s3-bucket string Bucket in which to store capture files
--s3-endpoint string Endpoint for an S3 compatible storage service. Use this if you are using a custom or private S3 service that requires a specific endpoint
--s3-path string Prefix path within the S3 bucket where captures will be stored (default "retina/captures")
--s3-region string Region where the S3 compatible bucket is located
--s3-secret-access-key string S3 access secret key to upload capture files
--tcpdump-filter string Raw tcpdump flags which works only for Linux
Global Flags:
--as string Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
--as-group stringArray Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
--as-uid string UID to impersonate for the operation.
--cache-dir string Default cache directory (default "/home/df/.kube/cache")
--certificate-authority string Path to a cert file for the certificate authority
--client-certificate string Path to a client certificate file for TLS
--client-key string Path to a client key file for TLS
--cluster string The name of the kubeconfig cluster to use
--context string The name of the kubeconfig context to use
--disable-compression If true, opt-out of response compression for all requests to the server
--insecure-skip-tls-verify If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
--kubeconfig string Path to the kubeconfig file to use for CLI requests.
--name string The name of the Retina Capture
-n, --namespace string If present, the namespace scope for this CLI request
--request-timeout string The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
-s, --server string The address and port of the Kubernetes API server
--tls-server-name string Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
--token string Bearer token for authentication to the API server
--user string The name of the kubeconfig user to use
After the capture, we get a tar.gz file which contains a .pcap file. This file is readable with Wireshark. In this sample, we can see some of the traffic that I generated during the capture.
yumemaru@azure:~$ k get pod nginxclient-5c5b9b57b8-4kdml -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxclient-5c5b9b57b8-4kdml 1/1 Running 0 7h29m 100.72.1.244 aks-aksnp0retina-29865950-vmss00000a <none> <none>
yumemaru@azure:~$ k get pod -n demo -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demodeploy-f67b46b7b-8zmtb 1/1 Running 0 7h30m 100.72.0.241 aks-aksnp0retina-29865950-vmss000009 <none> <none>
yumemaru@azure:~$ k exec deployments/nginxclient -- curl -i -X GET http://demodeploy.demo.svc.cluster.local
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 615 100 615 0 0 58992 0 --:--:-- --:--:-- --:--:-- 61500
HTTP/1.1 200 OK
Server: nginx/1.27.0
Date: Thu, 27 Jun 2024 13:17:53 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 28 May 2024 13:22:30 GMT
Connection: keep-alive
ETag: "6655da96-267"
Accept-Ranges: bytes
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
If we used the capture at the node-selector level, we could see traffic related to Azure DNS or to the Instance Metadata Service.
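As a side note, the extracted .pcap can also be inspected from the command line; a small sketch filtering on the Azure DNS/wireserver address (168.63.129.16) and the Instance Metadata Service address (169.254.169.254), with an assumed file name:
# <capture_file>.pcap is the file extracted from the tar.gz archive.
tcpdump -nn -r <capture_file>.pcap 'host 168.63.129.16 or host 169.254.169.254' | head -n 20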
We'll note, however, that we can also get this kind of information from a Network Watcher capture. When comparing both capture tools, the main advantages of Retina over Network Watcher are, on one hand, the Kubernetes-level filtering and, on the other hand, the scope of execution, which does not require access at the network level; such access is probably not available to a platform engineering team responsible for Kubernetes clusters.
Before concluding this article, we should have a look at the Retina capture through a CRD.
That can be done only if the installation included support for the capture. The CRD specification is available in the Retina documentation, as expected:
- API Group: retina.sh
- API Version: v1alpha1
- Kind: Capture
- Plural: captures
- Singular: capture
- Scope: Namespaced
To create a capture with a CRD, we use the following manifest.
apiVersion: retina.sh/v1alpha1
kind: Capture
metadata:
  name: samplecrdcapture
spec:
  captureConfiguration:
    captureOption:
      duration: "120s"
    captureTarget:
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: demo
      podSelector:
        matchLabels:
          org: retina
  outputConfiguration:
    blobUpload: blob-sas-url
With a corresponding secret holding the blob SAS URL:
apiVersion: v1
data:
  blob-sas-url: <base64encodedsecret>
kind: Secret
metadata:
  name: blob-sas-url
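Assuming both manifests are saved locally (the file names below are illustrative), applying them and listing the resulting capture looks like this:
# File names are illustrative; they contain the Secret and Capture manifests above.
kubectl apply -f blob-sas-url-secret.yaml
kubectl apply -f samplecrdcapture.yaml
kubectl get captures.retina.sh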
We can see the pod corresponding to the job:
yumemaru@azure:~$ k get pod
NAME READY STATUS RESTARTS AGE
nginxclient-5c5b9b57b8-94s7r 1/1 Running 0 2d5h
nginxclient-5c5b9b57b8-9jkxx 1/1 Running 0 2d5h
nginxclient-5c5b9b57b8-sg89l 1/1 Running 0 2d5h
samplecrdcapture-lsjdz-gzqk9 0/1 Completed 0 2m21s
yumemaru@azure:~$ k get jobs.batch
NAME COMPLETIONS DURATION AGE
samplecrdcapture-lsjdz 1/1 2m4s 5m18s
Ok, time to wrap up!
4. Summary
So we have this nice network monitoring tool, available for free, that leverages eBPF. Coupled with a Prometheus installation, we get visibility on the network that is otherwise not easily available. We can also create network captures for post-analysis. These captures, available through the CLI or a CRD, are comparable to a Network Watcher capture but with an access scoped at the Kubernetes plane level, which definitely makes sense for Kubernetes-native teams. Some additional samples are available on the Retina GitHub. And last but not least, Retina is included in the Microsoft-managed offer Advanced Network Observability, which is itself part of the Advanced Container Networking Services suite. We'll stop for now, but there is probably some more digging to be done on all of this ^^