Cilium is an open source, cloud native solution for providing, securing, and observing network connectivity between workloads, fueled by the revolutionary kernel technology eBPF.
eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules.
As it is always interesting to learn new technology, I thought it was about time to write a post about Cilium. At first look Cilium is kind of a Swiss Army knife with a lot of interesting features. I will go through this post beginning with a basic installation of Cilium on a new cluster (upstream Kubernetes on Ubuntu nodes). Then I will continue with some of the features I found interesting, and needed myself in my lab, and how to enable and configure them.
This post is divided into dedicated sections for the installation part and the different features respectively, starting with the installation of Cilium as the CNI in my Kubernetes cluster.
Cilium can be installed using Helm or using Cilium's nifty cilium-cli tool.
Both Helm and the cilium-cli tool can be used to install Cilium and configure its features. In this post I will mostly use Helm when adding features or changing certain settings, and cilium-cli for others, just to showcase how easy it is to use cilium-cli for certain features/tasks.
According to the official docs:
Install the latest version of the Cilium CLI. The Cilium CLI can be used to install Cilium, inspect the state of a Cilium installation, and enable/disable various features (e.g. clustermesh, Hubble).
The first Cilium feature in this post is how it can fully replace kube-proxy by providing distributed load balancing using eBPF. Naturally I would like to use this feature. This means I need to deploy my Kubernetes cluster without kube-proxy. That is easiest done during the initial bring-up of the Kubernetes cluster, but it can also be done afterwards, see more info here
This is the kubeadm parameter to skip kube-proxy: --skip-phases=addon/kube-proxy
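The full kubeadm command isn't captured in my notes here, but a minimal sketch of the init command (assuming the control-plane endpoint and certificate upload that show up in the output below) would look something like this:

sudo kubeadm init \
  --control-plane-endpoint "test-cluster-1.my-domain.net:6443" \
  --upload-certs \
  --skip-phases=addon/kube-proxy

The only part that matters for Cilium is the --skip-phases=addon/kube-proxy flag; the rest is a normal kubeadm bring-up.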
I1219 14:08:17.376790 13327 version.go:256] remote version is much newer: v1.29.0; falling back to: stable-1.28
[init] Using Kubernetes version: v1.28.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W1219 14:08:33.520592 13327 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.5" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master-01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local test-cluster-1.my-domain.net] and IPs [10.23.0.1 10.160.1.10]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master-01 localhost] and IPs [10.160.1.10 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master-01 localhost] and IPs [10.160.1.10 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 106.047476 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
3c9fa959a7538baaaf484e931ade45fbad07934dc40d456cae54839a7d888715
[mark-control-plane] Marking the node k8s-master-01 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master-01 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: q495cj.apdasczda14j87tc
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join test-cluster-1.my-domain.net:6443 --token q4da14j87tc \
--discovery-token-ca-cert-hash sha256: \
--control-plane --certificate-key
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join test-cluster-1.my-domain.net:6443 --token q4aczda14j87tc \
--discovery-token-ca-cert-hash sha256:
NB: in the worker join output below, notice that it "complains": No kubeproxy.config.k8s.io/v1alpha1 config is loaded. Continuing without it: configmaps "kube-proxy". That is expected, since the kube-proxy addon was skipped.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1219 16:25:57.203844 1279 configset.go:78] Warning: No kubeproxy.config.k8s.io/v1alpha1 config is loaded. Continuing without it: configmaps "kube-proxy" is forbidden: User "system:bootstrap:q495cj" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
When all worker nodes have been joined:
andreasm@linuxmgmt01:~/test-cluster-1$ k get nodes
NAME STATUS ROLES AGE VERSION
k8s-master-01 Ready control-plane 135m v1.28.2
k8s-worker-01 Ready <none> 12s v1.28.2
k8s-worker-02 Ready <none> 38s v1.28.2
k8s-worker-03 Ready <none> 4m28s v1.28.2
Notice that the cluster is not fully ready yet: CoreDNS is Pending because there is no CNI in place to handle pod networking, IPAM etc.
andreasm@linuxmgmt01:~/test-cluster-1$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5dd5756b68-c5xml 0/1 Pending 0 35m
kube-system coredns-5dd5756b68-fgdzj 0/1 Pending 0 35m
kube-system etcd-k8s-master-01 1/1 Running 0 35m
kube-system kube-apiserver-k8s-master-01 1/1 Running 0 35m
kube-system kube-controller-manager-k8s-master-01 1/1 Running 1 (19m ago) 35m
kube-system kube-scheduler-k8s-master-01 1/1 Running 1 (19m ago) 35m
Now it's time to jump over to my jumphost, where I will do all the remaining configuration of and interaction with my test-cluster-1.
From my jumphost I already have all the tools I need to deploy Cilium. To install the Cilium CNI I will just use the cilium-cli tool, as it is so easy. With a very short command it will automatically install Cilium on all my worker and control-plane nodes. The cilium-cli acts on the kube context you are currently in, so make sure you are in the correct context (the cluster that needs Cilium installed):
andreasm@linuxmgmt01:~/test-cluster-1$ k config current-context
test-cluster-1-admin@kubernetes
andreasm@linuxmgmt01:~/test-cluster-1$ cilium install --version 1.14.5
ℹ️ Using Cilium version 1.14.5
🔮 Auto-detected cluster name: test-cluster-1
🔮 Auto-detected kube-proxy has not been installed
ℹ️ Cilium will fully replace all functionalities of kube-proxy
That's it… 😄
Version 1.14.5 is the latest stable release at the time of writing this post.
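The command these placeholders refer to did not make it into my notes; based on the official kube-proxy-free guide, the Helm form would look roughly like this (the kubeProxyReplacement value matches the ConfigMap shown further down):

helm install cilium cilium/cilium --version 1.14.5 \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=${API_SERVER_IP} \
  --set k8sServicePort=${API_SERVER_PORT} \
  -f cilium-values-feature-by-feature.yaml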
Where the API_SERVER_IP is the IP of one of my Kubernetes control plane nodes (I did try to use the load-balanced IP for the Kubernetes API endpoint, as I have 3 control plane nodes, but that did not work out, so I went with the IP of my first control plane node). The value file is the file I am using to set all the Cilium settings, more on that later.
Did you notice above that the Cilium installer discovered there was no kube-proxy and told me it will replace all the functionality of kube-proxy? Well, it did. Let's check the config of Cilium and see if that is also reflected there, by looking for the kube-proxy-replacement key-value:
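A quick way to check (just a convenience command, not from my original notes) is to grep the cilium-config ConfigMap directly:

kubectl -n kube-system get cm cilium-config -o yaml | grep kube-proxy-replacement

It should return kube-proxy-replacement: strict, which you will also find in the full ConfigMap dump further down.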
When I installed Cilium using the cilium-cli tool, it actually deployed using Helm in the background. Let's see if there is a Helm release in kube-system:
andreasm@linuxmgmt01:~/test-cluster-1$ helm list -n kube-system
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
cilium kube-system 1 2023-12-19 13:50:40.121866679 +0000 UTC deployed cilium-1.14.5 1.14.5
Well there it is..
That makes it all the more interesting. As I will use Helm to update certain parameters going forward in this post, I will take a "snapshot" of the current values in the release above and alter them in the next sections when I enable additional features. What do the values look like now?
From the configMap cilium-config:

apiVersion: v1
data:
  agent-not-ready-taint-key: node.cilium.io/agent-not-ready
  arping-refresh-period: 30s
  auto-direct-node-routes: "false"
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-lb-sock: "false"
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  bpf-root: /sys/fs/bpf
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: "0"
  cluster-name: test-cluster-1
  cluster-pool-ipv4-cidr: 10.0.0.0/8
  cluster-pool-ipv4-mask-size: "24"
  cni-exclusive: "true"
  cni-log-file: /var/run/cilium/cilium-cni.log
  cnp-node-status-gc-interval: 0s
  custom-cni-conf: "false"
  debug: "false"
  debug-verbose: ""
  disable-cnp-status-updates: "true"
  egress-gateway-reconciliation-trigger-interval: 1s
  enable-auto-protect-node-port-range: "true"
  enable-bgp-control-plane: "false"
  enable-bpf-clock-probe: "false"
  enable-endpoint-health-checking: "true"
  enable-health-check-nodeport: "true"
  enable-health-checking: "true"
  enable-hubble: "true"
  enable-ipv4: "true"
  enable-ipv4-big-tcp: "false"
  enable-ipv4-masquerade: "true"
  enable-ipv6: "false"
  enable-ipv6-big-tcp: "false"
  enable-ipv6-masquerade: "true"
  enable-k8s-networkpolicy: "true"
  enable-k8s-terminating-endpoint: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "true"
  enable-local-redirect-policy: "false"
  enable-policy: default
  enable-remote-node-identity: "true"
  enable-sctp: "false"
  enable-svc-source-range-check: "true"
  enable-vtep: "false"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  external-envoy-proxy: "false"
  hubble-disable-tls: "false"
  hubble-listen-address: :4244
  hubble-socket-path: /var/run/cilium/hubble.sock
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  identity-allocation-mode: crd
  identity-gc-interval: 15m0s
  identity-heartbeat-timeout: 30m0s
  install-no-conntrack-iptables-rules: "false"
  ipam: cluster-pool
  ipam-cilium-node-update-rate: 15s
  k8s-client-burst: "10"
  k8s-client-qps: "5"
  kube-proxy-replacement: strict
  kube-proxy-replacement-healthz-bind-address: ""
  mesh-auth-enabled: "true"
  mesh-auth-gc-interval: 5m0s
  mesh-auth-queue-size: "1024"
  mesh-auth-rotated-identities-queue-size: "1024"
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  node-port-bind-protection: "true"
  nodes-gc-interval: 5m0s
  operator-api-serve-addr: 127.0.0.1:9234
  preallocate-bpf-maps: "false"
  procfs: /host/proc
  proxy-connect-timeout: "2"
  proxy-max-connection-duration-seconds: "0"
  proxy-max-requests-per-connection: "0"
  proxy-prometheus-port: "9964"
  remove-cilium-node-taints: "true"
  routing-mode: tunnel
  set-cilium-is-up-condition: "true"
  set-cilium-node-taints: "true"
  sidecar-istio-proxy-image: cilium/istio_proxy
  skip-cnp-status-startup-clean: "false"
  synchronize-k8s-nodes: "true"
  tofqdns-dns-reject-response-code: refused
  tofqdns-enable-dns-compression: "true"
  tofqdns-endpoint-max-ip-per-hostname: "50"
  tofqdns-idle-connection-grace-period: 0s
  tofqdns-max-deferred-connection-deletes: "10000"
  tofqdns-proxy-response-max-delay: 100ms
  tunnel-protocol: vxlan
  unmanaged-pod-watcher-interval: "15"
  vtep-cidr: ""
  vtep-endpoint: ""
  vtep-mac: ""
  vtep-mask: ""
  write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: cilium
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2023-12-19T13:50:42Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: cilium-config
  namespace: kube-system
  resourceVersion: "4589"
  uid: f501a3d0-8b33-43af-9fae-63625dcd6df1
The two settings that have been changed from the defaults by the installer are cluster-name: test-cluster-1 and kube-proxy-replacement: strict.
I prefer making the changes in a dedicated value.yaml file and running helm upgrade -f value.yaml each time I want to change something, so going forward I will be adding/changing certain settings in this value.yaml file to update the settings in Cilium.
I grabbed the default values.yaml from the Helm repo and use that as the basis for altering the settings in the next sections.
The cilium-cli can also be used to enable/disable certain features like Hubble and clustermesh. An example of how to enable Hubble with cilium-cli is shown in the next chapter, but I can also use Helm to achieve the same. I enable Hubble using cilium-cli just to show how easy it is.
But as I mentioned above, I prefer the Helm method, as I can keep better track of the settings and have them consistent each time I make a change, always referring back to my value.yaml file.
Observability and flow-monitoring - Hubble Observability
Cilium comes with a very neat monitoring tool out of the box called Hubble. It is enabled by default, but I need to enable the Hubble Relay and Hubble UI features to get the information from my nodes, pods etc. available in a nice dashboard (Hubble UI). So this is something I certainly want to enable as one of the first features to test out.
Again, using cilium-cli it's a very quick and simple operation:
andreasm@linuxmgmt01:~$ cilium hubble enable --ui
Let's check the services in my cluster:
andreasm@linuxmgmt01:~$ k get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hubble-peer ClusterIP 10.23.182.223 <none> 443/TCP 6h19m
hubble-relay ClusterIP 10.23.182.76 <none> 80/TCP 4h34m
hubble-ui ClusterIP 10.23.31.4 <none> 80/TCP 42s
kube-dns ClusterIP 10.23.0.10 <none> 53/UDP,53/TCP,9153/TCP 6h59m
The Hubble Relay and Hubble UI services are enabled. The issue though is that they are only exposed as ClusterIP; I need to reach them from outside the cluster. Let's continue with the next feature to test: LB-IPAM.
Using Helm to enable Hubble Relay and Hubble-UI
Instead of using cilium-cli, I could have enabled the Relay and UI in my value.yaml file and run the following command:
andreasm@linuxmgmt01:~/test-cluster-1$ helm upgrade -n kube-system cilium cilium/cilium --version 1.14.5 -f cilium-values-feature-by-feature.yaml
Release "cilium" has been upgraded. Happy Helming!
NAME: cilium
LAST DEPLOYED: Tue Dec 19 20:32:12 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 13
TEST SUITE: None
NOTES:
You have successfully installed Cilium with Hubble Relay and Hubble UI.
Your release version is 1.14.5.
For any further help, visit https://docs.cilium.io/en/v1.14/gettinghelp
Where I have changed these settings in the value.yaml:
relay:
  # -- Enable Hubble Relay (requires hubble.enabled=true)
  enabled: true
...
ui:
  # -- Whether to enable the Hubble UI.
  enabled: true
...
hubble:
  # -- Enable Hubble (true by default).
  enabled: true
...
  # -- Buffer size of the channel Hubble uses to receive monitor events. If this
  # value is not set, the queue size is set to the default monitor queue size.
  # eventQueueSize: ""
  # -- Number of recent flows for Hubble to cache. Defaults to 4095.
  # Possible values are:
  #   1, 3, 7, 15, 31, 63, 127, 255, 511, 1023,
  #   2047, 4095, 8191, 16383, 32767, 65535
  # eventBufferCapacity: "4095"
  # -- Hubble metrics configuration.
  # See https://docs.cilium.io/en/stable/observability/metrics/#hubble-metrics
  # for more comprehensive documentation about Hubble metrics.
  metrics:
    # -- Configures the list of metrics to collect. If empty or null, metrics
    # are disabled.
    # Example:
    #
    #   enabled:
    #   - dns:query;ignoreAAAA
    #   - drop
    #   - tcp
    #   - flow
    #   - icmp
    #   - http
    #
    # You can specify the list of metrics from the helm CLI:
    #
    #   --set metrics.enabled="{dns:query;ignoreAAAA,drop,tcp,flow,icmp,http}"
    #
    enabled:
    - dns:query;ignoreAAAA  ### added these
    - drop                  ### added these
    - tcp                   ### added these
    - flow                  ### added these
    - icmp                  ### added these
    - http                  ### added these
...
Exposing a service from Kubernetes so it is accessible from outside the cluster can be done in a couple of ways:
Exposing the service by binding it to a node using NodePort (not very scalable or manageable).
Exposing the service using a serviceType of LoadBalancer. Layer 4 only, but scalable. Usually requires an external load balancer or some additional component installed and configured to support your Kubernetes platform.
Exposing it using Ingress (Layer 7), which requires a load balancer to provide the external IP address.
Exposing it using Gateway API (the Ingress successor), which also requires a load balancer to provide the external IP address.
Cilium has really made it simple here; it comes with a built-in LoadBalancer IPAM (LB-IPAM). More info here.
This is already enabled, there is no feature to install or turn on. The only thing I need to do is configure an IP pool that will hand out IP addresses from a defined subnet whenever I request a serviceType LoadBalancer, an Ingress or a Gateway. We can configure multiple pools with different subnets, and configure a serviceSelector matching on labels or expressions.
In my lab I have already configured a couple of IP pools, using different subnets and different serviceSelectors so I can control which service gets IP addresses from which pool.
A couple of example pools from my lab:
apiVersion:"cilium.io/v2alpha1"kind:CiliumLoadBalancerIPPoolmetadata:name:"gateway-api-pool-10.150.14.x"spec:cidrs:- cidr:"10.150.14.0/24"serviceSelector:matchExpressions:- {key: io.kubernetes.service.namespace, operator: In, values:[harbor, booking]}---apiVersion:"cilium.io/v2alpha1"kind:CiliumLoadBalancerIPPoolmetadata:name:"lb-pool-prod.10.150.11.x"spec:cidrs:- cidr:"10.150.11.0/24"serviceSelector:matchExpressions:- {key: env, operator: In, values:[prod]}
The first pool will only provide IP addresses to services being deployed in one of the two namespaces "harbor" or "booking". This is an "OR" selection, not an "AND": the service can live in either of the namespaces, it does not need to exist in both. The second pool uses labels and matches on the key-value pair env=prod.
Bear in mind that these IP pools only react to Service objects (serviceType LoadBalancer), not Ingress or Gateway objects per se. Each time you create an Ingress or a Gateway, a serviceType LoadBalancer service is auto-created as a reaction to the Ingress/Gateway creation. So if you put labels on the Ingress/Gateway object itself, they will not be noticed by the LB-IPAM pool. Instead you can adjust the selection based on the namespace you know the service will be created in, or use the label that is auto-created on the svc: io.cilium.gateway/owning-gateway=name-of-gateway.
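As a quick illustration of the latter approach (a sketch only, the gateway name and CIDR are just examples), a pool could select on that auto-created label like this:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "pool-for-one-gateway"
spec:
  cidrs:
  - cidr: "10.150.14.0/24"
  serviceSelector:
    matchLabels:
      io.cilium.gateway/owning-gateway: my-gateway   # name of the Gateway object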
As soon as you have created and applied an IP pool, it will immediately start serving requests by handing out IP addresses. This is very nice.
There is a small catch though. If I create IP pools, as above, with subnets outside of my nodes' subnet, how does my network know how to reach them? Creating static routes pointing at the nodes that potentially hold these IP addresses? Nah, that is neither scalable nor manageable. Some kind of dynamic routing protocol would be best here, like BGP or OSPF.
Did I mention that Cilium also includes support for BGP out of the box?
Yes, you guessed it, Cilium includes BGP. A brilliant way of advertising all my IP pools. Creating many IP pools with a bunch of subnets has never been more fun. This is the same concept as I write about here; the biggest difference is that with Cilium this only needs to be enabled as a feature, and then you define a yaml to configure the BGP settings. Nothing additional to install, just plug'n'play.
For more info on the BGP control plane, read here.
First out, enable the BGP control plane feature. To enable it I will alter my Helm value.yaml file with this setting:
# -- This feature set enables virtual BGP routers to be created via
# CiliumBGPPeeringPolicy CRDs.
bgpControlPlane:
  # -- Enables the BGP control plane.
  enabled: true
Now I need to create a yaml that contains the BGP peering info I need for my workers to peer to my upstream router. For reference I will paste my lab topology here again:
When I apply my BGPPeeringPolicy yaml, my nodes will establish a BGP peering session with the switch they are connected to in the diagram above (their upstream BGP neighbor). This switch has also been configured to accept them as BGP neighbors. Please take into consideration creating some ip-prefix lists/route-maps so we don't accidentally advertise routes that conflict, or that should not be advertised into the network, to prevent BGP blackholes etc.
Here we can also configure a serviceSelector to control which services get advertised. I used the example from the official docs, which allows everything. If I also have a good BGP route-map config on my switch side or upstream BGP neighbour, subnets that are not allowed will never be advertised anyway.
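The policy manifest itself did not make it into these notes, but a sketch of what it would look like, based on the official docs example and the ASNs/peer address visible in the peering output below (the node selector label and exportPodCIDR setting are assumptions), is:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: bgp-peering-policy
spec:
  nodeSelector:
    matchLabels:
      bgp-policy: worker          # assumed label on my nodes
  virtualRouters:
  - localASN: 64520
    exportPodCIDR: false          # assumption: only advertise service IPs
    neighbors:
    - peerAddress: "10.160.1.1/32"
      peerASN: 64512
    serviceSelector:              # match everything, from the official docs example
      matchExpressions:
        - {key: somekey, operator: NotIn, values: ["never-used-value"]}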
Now that I have applied it, I can check the BGP peering status using the Cilium CLI:
andreasm@linuxmgmt01:~/prod-cluster-1/cilium$ cilium bgp peers
Node               Local AS   Peer AS   Peer Address   Session State   Uptime     Family         Received   Advertised
k8s-prod-node-01   64520      64512     10.160.1.1     established     13h2m53s   ipv4/unicast   47         6
                                                                                  ipv6/unicast   0          0
k8s-prod-node-02   64520      64512     10.160.1.1     established     13h2m25s   ipv4/unicast   45         6
                                                                                  ipv6/unicast   0          0
k8s-prod-node-03   64520      64512     10.160.1.1     established     13h2m27s   ipv4/unicast   43         6
                                                                                  ipv6/unicast   0          0
I can see some prefixes being Advertised and some being Received and the Session State is Established. I can also confirm that on my switch, and the routes they advertise:
GUZ-SW-01# show ip bgp summary
Peer Information
 Remote Address  Remote-AS  Local-AS  State          Admin Status
 --------------- ---------  --------  -------------  ------------
 10.160.1.114    64520      64512     Established    Start
 10.160.1.115    64520      64512     Established    Start
 10.160.1.116    64520      64512     Established    Start
 172.18.1.1      64500      64512     Established    Start
GUZ-SW-01# show ip bgp
Local AS : 64512 Local Router-id : 172.18.1.2
BGP Table Version : 1706 Status codes: * - valid, > - best, i - internal, e - external, s - stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Nexthop Metric LocalPref Weight AsPath
------------------ --------------- ---------- ---------- ------ ---------
* e   10.150.11.10/32    10.160.1.114    0                      0      64520 i
*>e   10.150.11.10/32    10.160.1.115    0                      0      64520 i
* e   10.150.11.10/32    10.160.1.116    0                      0      64520 i
* e   10.150.11.199/32   10.160.1.114    0                      0      64520 i
* e   10.150.11.199/32   10.160.1.115    0                      0      64520 i
*>e   10.150.11.199/32   10.160.1.116    0                      0      64520 i
* e   10.150.12.4/32     10.160.1.114    0                      0      64520 i
* e   10.150.12.4/32     10.160.1.115    0                      0      64520 i
*>e   10.150.12.4/32     10.160.1.116    0                      0      64520 i
* e   10.150.14.32/32    10.160.1.114    0                      0      64520 i
* e   10.150.14.32/32    10.160.1.115    0                      0      64520 i
*>e   10.150.14.32/32    10.160.1.116    0                      0      64520 i
* e   10.150.14.150/32   10.160.1.114    0                      0      64520 i
*>e   10.150.14.150/32   10.160.1.115    0                      0      64520 i
* e   10.150.14.150/32   10.160.1.116    0                      0      64520 i
* e   10.150.15.100/32   10.160.1.114    0                      0      64520 i
* e   10.150.15.100/32   10.160.1.115    0                      0      64520 i
*>e   10.150.15.100/32   10.160.1.116    0                      0      64520 i
Now I can just create my IP pools and some services, and they should be immediately advertised and reachable in my network (unless they are stopped by some route-maps, of course).
Note that Cilium only advertises the IP addresses actually in use by a service, not the whole subnet defined in the IP pools. That means I will only see host routes (/32) advertised, as seen above.
It says LoadBalancer IPAM, but does it actually load balance? Let me quickly put that to the test.
I have exposed a web service using serviceType loadBalancer consisting of three simple nginx web pods.
Here is the yaml I am using (I think I grabbed it from the official Cilium docs):
apiVersion: v1
kind: Service
metadata:
  name: test-lb
  namespace: example
  labels:
    env: prod   #### added this label to match with my ip pool
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  selector:
    svc: test-lb
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: example
spec:
  selector:
    matchLabels:
      svc: test-lb
  template:
    metadata:
      labels:
        svc: test-lb
    spec:
      containers:
      - name: web
        image: nginx
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /
            port: 80
Initially it deploys one pod; I will scale it up to three.
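Scaling up is a single command (plain kubectl, nothing Cilium-specific):

kubectl scale deployment nginx -n example --replicas=3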
They are running here, perfectly distributed across all my worker nodes:
andreasm@linuxmgmt01:~/prod-cluster-1/cilium$ k get pods -n example -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-698447f456-5xczj 1/1 Running 0 18s 10.0.0.239 k8s-prod-node-01 <none> <none>
nginx-698447f456-plknk 1/1 Running 0 117s 10.0.4.167 k8s-prod-node-02 <none> <none>
nginx-698447f456-xs4jq 1/1 Running 0 18s 10.0.5.226 k8s-prod-node-03 <none> <none>
And here is the LB service:
example test-lb LoadBalancer 10.21.69.190 10.150.11.48 80:31745/TCP
Now let me do a curl against the LoadBalancer IP and see if something changes:
Every 0.5s: curl http://10.150.11.48          linuxmgmt01: Wed Dec 20 07:58:14 2023

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   567  100   567    0     0   184k       0 --:--:-- --:--:-- --:--:--  276k
<!DOCTYPE html>
<html>
<head>
Pod 2    ##### Notice this
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
Pod 2
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Every 0.5s: curl http://10.150.11.48          linuxmgmt01: Wed Dec 20 07:59:15 2023

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   567  100   567    0     0   110k       0 --:--:-- --:--:-- --:--:--  138k
<!DOCTYPE html>
<html>
<head>
Pod 1    ##### Notice this
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
Pod 1
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Every 0.5s: curl http://10.150.11.48          linuxmgmt01: Wed Dec 20 08:01:02 2023

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   567  100   567    0     0   553k       0 --:--:-- --:--:-- --:--:--  553k
<!DOCTYPE html>
<html>
<head>
Pod 3    ##### Notice this
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
Pod 3
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Well, it is actually load-balancing the requests to the three different pods, running on three different nodes.
And it took me about 5 seconds to apply the ip-pool yaml and the bgppeeringpolicy yaml and I had a fully functioning load-balancer.
A bit more info on this feature from the official Cilium docs:
LB IPAM works in conjunction with features like the Cilium BGP Control Plane (Beta). Where LB IPAM is responsible for allocation and assigning of IPs to Service objects and other features are responsible for load balancing and/or advertisement of these IPs.
So I assume the spreading of traffic across my nodes is done by BGP here (ECMP on the upstream switch), while the eBPF-based kube-proxy replacement load balances to the pods from whichever node receives the request.
As I have covered serviceType LoadBalancer above, let me quickly cover how to enable the Cilium Ingress controller.
More info can be found here
I will head into my Helm value.yaml and edit the following:
ingressController:
  # -- Enable cilium ingress controller
  # This will automatically set enable-envoy-config as well.
  enabled: true
  # -- Set cilium ingress controller to be the default ingress controller
  # This will let cilium ingress controller route entries without ingress class set
  default: false
  # -- Default ingress load balancer mode
  # Supported values: shared, dedicated
  # For granular control, use the following annotations on the ingress resource
  # ingress.cilium.io/loadbalancer-mode: shared|dedicated
  loadbalancerMode: dedicated
The Cilium Ingress controller can be dedicated or shared, meaning it can use a shared IP for multiple Ingress objects. Nice if we are limited on IP addresses etc. Additionally, we can edit the shared Ingress service to be configured with a specific IP like this:
  # -- Load-balancer service in shared mode.
  # This is a single load-balancer service for all Ingress resources.
  service:
    # -- Service name
    name: cilium-ingress
    # -- Labels to be added for the shared LB service
    labels: {}
    # -- Annotations to be added for the shared LB service
    annotations: {}
    # -- Service type for the shared LB service
    type: LoadBalancer
    # -- Configure a specific nodePort for insecure HTTP traffic on the shared LB service
    insecureNodePort: ~
    # -- Configure a specific nodePort for secure HTTPS traffic on the shared LB service
    secureNodePort: ~
    # -- Configure a specific loadBalancerClass on the shared LB service (requires Kubernetes 1.24+)
    loadBalancerClass: ~
    # -- Configure a specific loadBalancerIP on the shared LB service
    loadBalancerIP: 10.150.11.100   ### Set your preferred IP here
    # -- Configure if node port allocation is required for LB service
    # ref: https://kubernetes.io/docs/concepts/services-networking/service/#load-balancer-nodeport-allocation
    allocateLoadBalancerNodePorts: ~
This will dictate that the shared Ingress object will get this IP address.
Now save changes and run the helm upgrade command:
andreasm@linuxmgmt01:~/test-cluster-1$ helm upgrade -n kube-system cilium cilium/cilium --version 1.14.5 -f cilium-values-feature-by-feature.yaml
Release "cilium" has been upgraded. Happy Helming!
NAME: cilium
LAST DEPLOYED: Wed Dec 20 08:18:58 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 15
TEST SUITE: None
NOTES:
You have successfully installed Cilium with Hubble Relay and Hubble UI.
Your release version is 1.14.5.
For any further help, visit https://docs.cilium.io/en/v1.14/gettinghelp
Now is also a good time to restart the Cilium operator and Cilium agents so they re-read the new configMap.
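That is the same rollout restart used again later in the Gateway API section:

kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium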
As soon as I enable the Ingress controller it will create this object for me, and provide an IngressClass in my cluster.
andreasm@linuxmgmt01:~/test-cluster-1$ k get ingressclasses.networking.k8s.io
NAME CONTROLLER PARAMETERS AGE
cilium cilium.io/ingress-controller <none> 86s
Now I suddenly have an IngressController also. Let me deploy a test app to test this.
First I deploy two pods with their corresponding clusterIP services:
kind: Pod
apiVersion: v1
metadata:
  name: apple-app
  labels:
    app: apple
  namespace: fruit
spec:
  containers:
  - name: apple-app
    image: hashicorp/http-echo
    args:
    - "-text=apple"
---
kind: Service
apiVersion: v1
metadata:
  name: apple-service
  namespace: fruit
spec:
  selector:
    app: apple
  ports:
  - port: 5678   # Default port for image
kind: Pod
apiVersion: v1
metadata:
  name: banana-app
  labels:
    app: banana
  namespace: fruit
spec:
  containers:
  - name: banana-app
    image: hashicorp/http-echo
    args:
    - "-text=banana"
---
kind: Service
apiVersion: v1
metadata:
  name: banana-service
  namespace: fruit
spec:
  selector:
    app: banana
  ports:
  - port: 5678   # Default port for image
And then the Ingress pointing to the two services apple and banana:
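The Ingress manifest itself is not included in these notes; a sketch of what it would look like (the name and hostname are taken from the outputs further down, the path layout is an assumption):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  namespace: fruit
  annotations:
    ingress.cilium.io/loadbalancer-mode: dedicated
spec:
  ingressClassName: cilium
  rules:
  - host: fruit.my-domain.net
    http:
      paths:
      - path: /apple
        pathType: Prefix
        backend:
          service:
            name: apple-service
            port:
              number: 5678
      - path: /banana
        pathType: Prefix
        backend:
          service:
            name: banana-service
            port:
              number: 5678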
Notice that the only annotation I have used is loadbalancer-mode: dedicated. The other accepted value is shared. With this annotation I can choose per Ingress whether it should use the IP from the shared Ingress service or get a dedicated one with its own IP address. If I don't want an Ingress to consume its own IP address I use shared; if I do want a dedicated IP for the Ingress I use dedicated. The shared service for Ingress objects is automatically created when enabling the Ingress controller. You can see it here:
andreasm@linuxmgmt01:~$ k get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cilium-shared-ingress LoadBalancer 10.21.104.15 10.150.15.100 80:30810/TCP,443:31104/TCP 46h
I have configured this shared-ingress to use a specific ip-address.
When using dedicated, it will create a cilium-ingress-name-of-ingress service with its own new IP address (as can be seen below).
As soon as this is applied, Cilium automatically takes care of the serviceType LoadBalancer object by getting an IP address from one of the IP pools that matches my serviceSelectors (depending on shared or dedicated, of course). Then BGP automatically advertises the host route to my BGP router, and the Ingress object should now be listening for HTTP requests on this IP.
Here is the services/objects created:
andreasm@linuxmgmt01:~/prod-cluster-1/cilium$ k get ingress -n fruit
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-example cilium fruit.my-domain.net 10.150.12.4 80 44h
andreasm@linuxmgmt01:~/prod-cluster-1/cilium$ k get svc -n fruit
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apple-service ClusterIP 10.21.243.103 <none> 5678/TCP 4d9h
banana-service ClusterIP 10.21.124.111 <none> 5678/TCP 4d9h
cilium-ingress-ingress-example LoadBalancer 10.21.50.107 10.150.12.4 80:30792/TCP,443:31553/TCP 43h
Let me see if the Ingress responds to my http requests (I have registered the IP above with a DNS record so I can resolve it):
andreasm@linuxmgmt01:~$ curl http://fruit.my-domain.net/apple
apple
andreasm@linuxmgmt01:~$ curl http://fruit.my-domain.net/banana
banana
The Ingress works.
Again for more information on Cilium IngressController (like supported annotations etc) head over here
Another ingress solution to use is Gateway API, read more about that here
Gateway API is an "evolution" of the regular Ingress, so it is natural to take it into consideration going forward. Again, Cilium supports Gateway API out of the box. I will use Helm to enable it, and it just needs a couple of CRDs to be installed first.
Read more on Cilium API support here.
To enable Cilium Gateway API I did the following:
Edit my Helm value.yaml with the following setting:
gatewayAPI:
  # -- Enable support for Gateway API in cilium
  # This will automatically set enable-envoy-config as well.
  enabled: true
Then I installed these CRDs before I ran the Helm upgrade command.
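The CRD list did not make it into these notes; for Cilium 1.14 the required Gateway API CRDs are at v0.7.0 and are installed roughly like this (verify the exact versions/paths against the official Cilium docs):

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_gatewayclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_gateways.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_httproutes.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/standard/gateway.networking.k8s.io_referencegrants.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/v0.7.0/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml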
andreasm@linuxmgmt01:~/test-cluster-1$ helm upgrade -n kube-system cilium cilium/cilium --version 1.14.5 -f cilium-values-feature-by-feature.yaml
Release "cilium" has been upgraded. Happy Helming!
NAME: cilium
LAST DEPLOYED: Wed Dec 20 11:12:55 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 16
TEST SUITE: None
NOTES:
You have successfully installed Cilium with Hubble Relay and Hubble UI.
Your release version is 1.14.5.
For any further help, visit https://docs.cilium.io/en/v1.14/gettinghelp
andreasm@linuxmgmt01:~/test-cluster-1$ kubectl -n kube-system rollout restart deployment/cilium-operator
deployment.apps/cilium-operator restarted
andreasm@linuxmgmt01:~/test-cluster-1$ kubectl -n kube-system rollout restart ds/cilium
daemonset.apps/cilium restarted
It is very important to install the above CRDs before attempting to enable the Gateway API in Cilium. Otherwise it will not create any gatewayclass, i.e. no Gateway API realized.
Now I should have a gatewayClass:
andreasm@linuxmgmt01:~/test-cluster-1$ k get gatewayclasses.gateway.networking.k8s.io
NAME CONTROLLER ACCEPTED AGE
cilium io.cilium/gateway-controller True 96s
Now I can just go ahead and create a gateway and some httproutes. When it comes to providing an external IP address for my gateway, this is provided by my ip-pools the same way as for the IngressController.
Let's go ahead and create a gateway, and for this exercise I will be creating a gateway with corresponding httproutes to support my Harbor registry installation.
Below is the config I have used; it has also been configured to do an HTTPS redirect (from HTTP to HTTPS):
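The actual manifest is not included in these notes, but based on the object names and hostname shown in the outputs below, it would look roughly like this (the TLS secret name and the Harbor backend services/ports are assumptions):

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: harbor-tls-gateway
  namespace: harbor
spec:
  gatewayClassName: cilium
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    hostname: "registry.my-domain.net"
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "registry.my-domain.net"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: harbor-tls-cert          # the pre-created certificate secret (name assumed)
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: harbor-tls-redirect
  namespace: harbor
spec:
  parentRefs:
  - name: harbor-tls-gateway
    sectionName: http
  hostnames:
  - "registry.my-domain.net"
  rules:
  - filters:
    - type: RequestRedirect
      requestRedirect:
        scheme: https
        port: 443
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: harbor-api-route
  namespace: harbor
spec:
  parentRefs:
  - name: harbor-tls-gateway
    sectionName: https
  hostnames:
  - "registry.my-domain.net"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    - path:
        type: PathPrefix
        value: /service
    - path:
        type: PathPrefix
        value: /v2
    - path:
        type: PathPrefix
        value: /c
    backendRefs:
    - name: harbor-core              # Harbor service names/ports are assumptions
      port: 80
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: harbor-portal
      port: 80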
I have already created the certificate as the secret I refer to in the yaml above.
Let's have a look at the gateway, the httproutes and the svc that provides the external IP address, as well as the Harbor services the httproutes refer to:
#### gateway created #####
andreasm@linuxmgmt01:~/prod-cluster-1/harbor$ k get gateway -n harbor
NAME CLASS ADDRESS PROGRAMMED AGE
harbor-tls-gateway cilium 10.150.14.32 True 28h
#### HTTPROUTES ####
andreasm@linuxmgmt01:~/prod-cluster-1/harbor$ k get httproutes.gateway.networking.k8s.io -n harbor
NAME HOSTNAMES AGE
harbor-api-route ["registry.my-domain.net"] 28h
harbor-tls-redirect ["registry.my-domain.net"] 28h
andreasm@linuxmgmt01:~/prod-cluster-1/harbor$ k get svc -n harbor
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cilium-gateway-harbor-tls-gateway LoadBalancer 10.21.27.25 10.150.14.32 80:32393/TCP,443:31932/TCP 28h
As you may recall, I did enable the two features Hubble Relay and Hubble UI, as we can see below:
andreasm@linuxmgmt01:~$ k get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hubble-metrics ClusterIP None <none> 9965/TCP 35h
hubble-peer ClusterIP 10.23.182.223 <none> 443/TCP 42h
hubble-relay ClusterIP 10.23.182.76 <none> 80/TCP 40h
hubble-ui ClusterIP 10.23.31.4 <none> 80/TCP 36h
They are not exposed in a way that lets me reach them from outside the Kubernetes cluster. So let me start by creating a serviceType LoadBalancer service to expose the Hubble UI ClusterIP service. Below is the yaml I use for that:
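The yaml itself is not included here; a sketch matching the service shown below could look like this (the selector label and target port are based on how the Cilium chart deploys hubble-ui, and the env: prod label matches my prod IP pool; treat the details as assumptions):

apiVersion: v1
kind: Service
metadata:
  name: hubble-ui-lb
  namespace: kube-system
  labels:
    env: prod              # to get an IP from the lb-pool-prod pool
spec:
  type: LoadBalancer
  ports:
  - port: 8081
    targetPort: 8081
    protocol: TCP
  selector:
    k8s-app: hubble-ui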
andreasm@linuxmgmt01:~/prod-cluster-1/cilium/services$ k get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hubble-ui-lb LoadBalancer 10.21.47.47 10.150.11.10 8081:32328/TCP 3d11h
There it is. Now I can open my browser and point it to this IP:port.
Now let me go to a test application I have deployed in the yelb namespace, by clicking on it in the list or in the dropdown in the top left corner:
Soo much empty…
I can see the pods are running:
andreasm@linuxmgmt01:~/prod-cluster-1/cilium/services$ k get pods -n yelb
NAME READY STATUS RESTARTS AGE
redis-server-84f4bf49b5-fq26l 1/1 Running 0 5d18h
yelb-appserver-6dc7cd98-s6kt7 1/1 Running 0 5d18h
yelb-db-84d6f6fc6c-m7xvd 1/1 Running 0 5d18h
They are probably not that interested in talking to each other unless they have to. Let me deploy the frontend service and create some interactions.
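The yelb-lb-frontend.yaml is not included in these notes; a sketch of what it contains (the standard yelb-ui frontend plus a LoadBalancer service; the image tag and labels are assumptions) could be:

apiVersion: v1
kind: Service
metadata:
  name: yelb-ui
  namespace: yelb
  labels:
    env: prod              # match the prod LB-IPAM pool
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: yelb-ui
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yelb-ui
  namespace: yelb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: yelb-ui
  template:
    metadata:
      labels:
        app: yelb-ui
    spec:
      containers:
      - name: yelb-ui
        image: mreferre/yelb-ui:0.7   # standard yelb sample image (tag assumed)
        ports:
        - containerPort: 80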
andreasm@linuxmgmt01:~/prod-cluster-1/cilium$ k apply -f yelb-lb-frontend.yaml
service/yelb-ui created
deployment.apps/yelb-ui created
andreasm@linuxmgmt01:~/prod-cluster-1/cilium$ k get svc -n yelb
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-server ClusterIP 10.21.67.23 <none> 6379/TCP 5d18h
yelb-appserver ClusterIP 10.21.81.95 <none> 4567/TCP 5d18h
yelb-db ClusterIP 10.21.188.43 <none> 5432/TCP 5d18h
yelb-ui LoadBalancer 10.21.207.114 10.150.11.221 80:32335/TCP 49s
I will now open the Yelb UI and do some quick “votes”
Instantly, even by just opening the yelb webpage, I get a lot of flow information in Hubble. And not only that, it automatically creates a "service map" so I can see the services involved in the Yelb app.
This only shows me L4 information though. What about Layer 7? Let's test that also by heading over to Harbor.
In Hubble I will switch to the namespace harbor
A nice diagram with all involved services, but no L7 information yet. Well, there is, but I have had no recent interactions with Harbor through the Gateway API. As soon as I use docker or the web UI against Harbor, what happens then?
What's this, an ingress object?
Now when I click on the ingress object:
Look at the L7 info coming there.
I logged out from Harbor:
Logged back in:
Browsing the Harbor Projects/repositories:
A very rich set of information, presented in a very snappy and responsive dashboard. It is instantly updated as soon as a request comes in.
For now, this concludes this post.
It has been a nice experience getting a bit more under the hood of Cilium, and so far I must say it looks very good.
Things I have not covered yet, which I will cover at a later stage
I will update this post with some other features at a later stage. Some of the features I am interested in looking at are:
Security policies with Cilium - just have a quick look here, there are many interesting topics. Host Firewall?