Managing Antrea in vSphere with Tanzu

Overview

Antrea in vSphere with Tanzu

Antrea is the default CNI used in TKG 2.0 clusters. TKG 2.0 clusters are the workload clusters you deploy with the Supervisor in vSphere 8. Antrea comes in two flavours: the open-source edition of Antrea, which can be found here, and the Antrea Advanced ("downstream") version, which is used in vSphere with Tanzu. The Advanced version is also needed when we want to integrate Antrea with NSX-T for policy management. Antrea Advanced can be found in your VMware Customer Connect portal here. Both versions of Antrea support a very broad range of platforms: Windows worker nodes, Photon, Ubuntu, ARM, x86, VMware TKG, OpenShift, Rancher, AKS and EKS. The list is long; see more info here. This post will focus on the Antrea Advanced edition and its features, such as (read more here):

  • Central management of Antrea Security Policies with NSX
  • Central troubleshooting with TraceFlow with NSX
  • FQDN/L7 Security policies
  • RBAC
  • Tiered policies
  • Flow Exporter
  • Egress (source NAT IP selection for egressing Pods)

Managing Antrea settings and Feature Gates in TKG 2 clusters

When you deploy a TKG 2 cluster on vSphere with Tanzu and don't specify a CNI, Antrea will be the default CNI. Depending on the TKG version you are on, a set of default Antrea features is enabled or disabled. If you already know before deploying a cluster that a specific feature should be enabled or disabled, this can be handled during bring-up so the cluster comes up with the settings you want (more on that later). After a cluster has been provisioned, you can easily check which features are enabled by issuing the command below:

linux-vm:~/from_ubuntu_vm/tkgs/tkgs-stc-cpod$ k get configmaps -n kube-system antrea-config -oyaml
apiVersion: v1
data:
  antrea-agent.conf: |
    featureGates:
      AntreaProxy: true
      EndpointSlice: true
      Traceflow: true
      NodePortLocal: true
      AntreaPolicy: true
      FlowExporter: false
      NetworkPolicyStats: false
      Egress: true
      AntreaIPAM: false
      Multicast: false
      Multicluster: false
      SecondaryNetwork: false
      ServiceExternalIP: false
      TrafficControl: false
    trafficEncapMode: encap
    noSNAT: false
    tunnelType: geneve
    trafficEncryptionMode: none
    enableBridgingMode: false
    disableTXChecksumOffload: false
    wireGuard:
      port: 51820
    egress:
      exceptCIDRs: []
    serviceCIDR: 20.10.0.0/16
    nodePortLocal:
      enable: true
      portRange: 61000-62000
    tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384
    multicast: {}
    antreaProxy:
      proxyAll: false
      nodePortAddresses: []
      skipServices: []
      proxyLoadBalancerIPs: false
    multicluster: {}
  antrea-cni.conflist: |
    {
        "cniVersion":"0.3.0",
        "name": "antrea",
        "plugins": [
            {
                "type": "antrea",
                "ipam": {
                    "type": "host-local"
                }
            }
            ,
            {
                "type": "portmap",
                "capabilities": {"portMappings": true}
            }
            ,
            {
                "type": "bandwidth",
                "capabilities": {"bandwidth": true}
            }
        ]
    }
  antrea-controller.conf: |
    featureGates:
      Traceflow: true
      AntreaPolicy: true
      NetworkPolicyStats: false
      Multicast: false
      Egress: true
      AntreaIPAM: false
      ServiceExternalIP: false
    tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384
    nodeIPAM: null
kind: ConfigMap
metadata:
  annotations:
    kapp.k14s.io/identity: v1;kube-system//ConfigMap/antrea-config;v1
    kapp.k14s.io/original: '{"apiVersion":"v1","data":{"antrea-agent.conf":"featureGates:\n  AntreaProxy:
      true\n  EndpointSlice: true\n  Traceflow: true\n  NodePortLocal: true\n  AntreaPolicy:
      true\n  FlowExporter: false\n  NetworkPolicyStats: false\n  Egress: true\n  AntreaIPAM:
      false\n  Multicast: false\n  Multicluster: false\n  SecondaryNetwork: false\n  ServiceExternalIP:
      false\n  TrafficControl: false\ntrafficEncapMode: encap\nnoSNAT: false\ntunnelType:
      geneve\ntrafficEncryptionMode: none\nenableBridgingMode: false\ndisableTXChecksumOffload:
      false\nwireGuard:\n  port: 51820\negress:\n  exceptCIDRs: []\nserviceCIDR: 20.10.0.0/16\nnodePortLocal:\n  enable:
      true\n  portRange: 61000-62000\ntlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384\nmulticast:
      {}\nantreaProxy:\n  proxyAll: false\n  nodePortAddresses: []\n  skipServices:
      []\n  proxyLoadBalancerIPs: false\nmulticluster: {}\n","antrea-cni.conflist":"{\n    \"cniVersion\":\"0.3.0\",\n    \"name\":
      \"antrea\",\n    \"plugins\": [\n        {\n            \"type\": \"antrea\",\n            \"ipam\":
      {\n                \"type\": \"host-local\"\n            }\n        }\n        ,\n        {\n            \"type\":
      \"portmap\",\n            \"capabilities\": {\"portMappings\": true}\n        }\n        ,\n        {\n            \"type\":
      \"bandwidth\",\n            \"capabilities\": {\"bandwidth\": true}\n        }\n    ]\n}\n","antrea-controller.conf":"featureGates:\n  Traceflow:
      true\n  AntreaPolicy: true\n  NetworkPolicyStats: false\n  Multicast: false\n  Egress:
      true\n  AntreaIPAM: false\n  ServiceExternalIP: false\ntlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384\nnodeIPAM:
      null\n"},"kind":"ConfigMap","metadata":{"labels":{"app":"antrea","kapp.k14s.io/app":"1685607245932804320","kapp.k14s.io/association":"v1.c39c4aca919097e50452c3432329dd40"},"name":"antrea-config","namespace":"kube-system"}}'
    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
  creationTimestamp: "2023-06-01T08:14:14Z"
  labels:
    app: antrea
    kapp.k14s.io/app: "1685607245932804320"
    kapp.k14s.io/association: v1.c39c4aca919097e50452c3432329dd40
  name: antrea-config
  namespace: kube-system
  resourceVersion: "948"
  uid: fd18fd20-a82b-4df5-bb1a-686463b86f27
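Most of that output is kapp bookkeeping. If you only care about the agent's feature gates, a narrower query helps. The sketch below is one way to get it, assuming kubectl is pointed at the workload cluster and the ConfigMap is kube-system/antrea-config as shown above. Both function names are hypothetical helpers, not part of Antrea or TKG.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Antrea or TKG.

# Reads an antrea-agent.conf document on stdin and prints the entries of its
# featureGates block, with the indentation removed.
extract_feature_gates() {
  awk '
    /^featureGates:/ { in_gates = 1; next }   # start of the block
    in_gates && /^[^ ]/ { in_gates = 0 }      # next top-level key ends it
    in_gates && NF { sub(/^ +/, ""); print }  # strip indentation, skip blank lines
  '
}

# Fetches antrea-agent.conf from kube-system/antrea-config and filters it.
# The dot in the data key has to be escaped in kubectl JSONPath.
show_agent_feature_gates() {
  kubectl get configmap antrea-config -n kube-system \
    -o jsonpath='{.data.antrea-agent\.conf}' | extract_feature_gates
}
```

Running show_agent_feature_gates against the cluster above should list lines such as "AntreaProxy: true" and "FlowExporter: false", one per feature gate.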

If you want to enable or disable any of these features, it's a matter of applying an AntreaConfig using the AntreaConfig CRD included in TKG 2.0.

You can apply this AntreaConfig to an already provisioned TKG 2.0 cluster, or apply it before the cluster is provisioned so the features are enabled or disabled at creation. Below is an example of an AntreaConfig:

apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  name: three-zone-cluster-2-antrea-package
  namespace: ns-three-zone-1
spec:
  antrea:
    config:
      featureGates:
        AntreaProxy: true
        EndpointSlice: false
        AntreaPolicy: true
        FlowExporter: true
        Egress: true #This needs to be enabled (an example)
        NodePortLocal: true
        AntreaTraceflow: true
        NetworkPolicyStats: true

This example can be applied either before or after the TKG 2.0 cluster is provisioned. Just make sure the config is applied to the correct vSphere Namespace (the same Namespace the cluster is deployed in) and that its name follows the pattern CLUSTER-NAME-antrea-package. In other words, the name must start with the name of the TKG 2.0 cluster and end with -antrea-package.
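Because the name has to follow that pattern, it can be handy to render the manifest from the cluster name instead of typing it by hand. A minimal sketch, assuming it is run from the Supervisor context; render_antrea_config is a hypothetical helper, and the feature gate values are simply copied from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an AntreaConfig named <cluster>-antrea-package
# for the given vSphere Namespace. Feature gates copied from the example above.
render_antrea_config() {
  local cluster_name="$1" namespace="$2"
  cat <<EOF
apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  name: ${cluster_name}-antrea-package
  namespace: ${namespace}
spec:
  antrea:
    config:
      featureGates:
        AntreaProxy: true
        EndpointSlice: false
        AntreaPolicy: true
        FlowExporter: true
        Egress: true
        NodePortLocal: true
        AntreaTraceflow: true
        NetworkPolicyStats: true
EOF
}

# Usage (Supervisor context, in the same Namespace as the cluster):
#   render_antrea_config three-zone-cluster-2 ns-three-zone-1 | kubectl apply -f -
```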

If this is done after the cluster has been provisioned, we need to make sure the already running Antrea pods (agents and controller) are restarted so they read the new ConfigMap.
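One way to do that restart, assuming the upstream Antrea object names (DaemonSet antrea-agent and Deployment antrea-controller in kube-system) and a kubectl context pointing at the workload cluster rather than the Supervisor. restart_antrea is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the Antrea agents and controller so they re-read
# kube-system/antrea-config, then wait for both rollouts to finish.
# Assumes the upstream object names antrea-agent and antrea-controller.
restart_antrea() {
  kubectl -n kube-system rollout restart daemonset/antrea-agent &&
    kubectl -n kube-system rollout restart deployment/antrea-controller &&
    kubectl -n kube-system rollout status daemonset/antrea-agent --timeout=5m &&
    kubectl -n kube-system rollout status deployment/antrea-controller --timeout=5m
}
```

A rolling restart replaces the pods one by one, so it avoids deleting all agents at the same time.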

If you need to check which version of Antrea (and of the other components, for that matter) is included in your TKR version, just run the following command:

linuxvm01:~/three-zones$ k get tkr v1.24.9---vmware.1-tkg.4 -o yaml
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesRelease
metadata:
  creationTimestamp: "2023-06-01T07:35:28Z"
  finalizers:
  - tanzukubernetesrelease.run.tanzu.vmware.com
  generation: 2
  labels:
    os-arch: amd64
    os-name: photon
    os-type: linux
    os-version: "3.0"
    v1: ""
    v1.24: ""
    v1.24.9: ""
    v1.24.9---vmware: ""
    v1.24.9---vmware.1: ""
    v1.24.9---vmware.1-tkg: ""
    v1.24.9---vmware.1-tkg.4: ""
  name: v1.24.9---vmware.1-tkg.4
  ownerReferences:
  - apiVersion: vmoperator.vmware.com/v1alpha1
    kind: VirtualMachineImage
    name: ob-21552850-ubuntu-2004-amd64-vmi-k8s-v1.24.9---vmware.1-tkg.4
    uid: 92d3d6af-53f8-4f9a-b262-f70dd33ad19b
  - apiVersion: vmoperator.vmware.com/v1alpha1
    kind: VirtualMachineImage
    name: ob-21554409-photon-3-amd64-vmi-k8s-v1.24.9---vmware.1-tkg.4
    uid: 6a0aa87a-63e3-475d-a52d-e63589f454e9
  resourceVersion: "12111"
  uid: 54db049e-fdf0-45a2-b4d1-46fa90a22b44
spec:
  bootstrapPackages:
  - name: antrea.tanzu.vmware.com.1.7.2+vmware.1-tkg.1-advanced
  - name: vsphere-pv-csi.tanzu.vmware.com.2.6.1+vmware.1-tkg.1
  - name: vsphere-cpi.tanzu.vmware.com.1.24.3+vmware.1-tkg.1
  - name: kapp-controller.tanzu.vmware.com.0.41.5+vmware.1-tkg.1
  - name: guest-cluster-auth-service.tanzu.vmware.com.1.1.0+tkg.1
  - name: metrics-server.tanzu.vmware.com.0.6.2+vmware.1-tkg.1
  - name: secretgen-controller.tanzu.vmware.com.0.11.2+vmware.1-tkg.1
  - name: pinniped.tanzu.vmware.com.0.12.1+vmware.3-tkg.3
  - name: capabilities.tanzu.vmware.com.0.28.0+vmware.2
  - name: calico.tanzu.vmware.com.3.24.1+vmware.1-tkg.1
  kubernetes:
    coredns:
      imageTag: v1.8.6_vmware.15
    etcd:
      imageTag: v3.5.6_vmware.3
    imageRepository: localhost:5000/vmware.io
    pause:
      imageTag: "3.7"
    version: v1.24.9+vmware.1
  osImages:
  - name: ob-21552850-ubuntu-2004-amd64-vmi-k8s-v1.24.9---vmware.1-tkg.4
  - name: ob-21554409-photon-3-amd64-vmi-k8s-v1.24.9---vmware.1-tkg.4
  version: v1.24.9+vmware.1-tkg.4
status:
  conditions:
  - lastTransitionTime: "2023-06-01T07:35:28Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-06-01T07:35:28Z"
    status: "True"
    type: Compatible
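To pull out just the Antrea package (the -advanced suffix above shows that this TKR ships Antrea Advanced), a JSONPath query over spec.bootstrapPackages is enough. A sketch, run from the Supervisor context like the command above; antrea_package_for_tkr is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Antrea bootstrap package name of a TKR.
# Lists every bootstrap package, one per line, and keeps the antrea.* entry.
antrea_package_for_tkr() {
  local tkr="$1"
  kubectl get tkr "$tkr" \
    -o jsonpath='{range .spec.bootstrapPackages[*]}{.name}{"\n"}{end}' |
    grep '^antrea\.'
}

# Usage: antrea_package_for_tkr v1.24.9---vmware.1-tkg.4
```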

So enabling and disabling Antrea feature gates is quite simple. To summarize, these are the settings, including the feature gates, that can be adjusted (as of TKR 1.24.9):

spec:
  antrea:
    config:
      defaultMTU: ""
      disableUdpTunnelOffload: false
      featureGates:
        AntreaPolicy: true
        AntreaProxy: true
        AntreaTraceflow: true
        Egress: true
        EndpointSlice: true
        FlowExporter: false
        NetworkPolicyStats: false
        NodePortLocal: true
      noSNAT: false
      tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384
      trafficEncapMode: encap

Getting the Antrea config "templates" for a specific TKR version

A new TKR version usually ships a new version of Antrea, and a new Antrea version most likely contains new and exciting features. So if you want to see which feature gates are available in your latest and greatest TKR, run these commands from the Supervisor context:

# to get all the Antrea configs
andreasm@ubuntu02:~/avi_nsxt_wcp$ k get antreaconfigs.cni.tanzu.vmware.com -A
NAMESPACE           NAME                                           TRAFFICENCAPMODE   DEFAULTMTU   ANTREAPROXY   ANTREAPOLICY   SECRETREF
ns-stc-1            cluster-1-antrea-package                       encap                           true          true           cluster-1-antrea-data-values
vmware-system-tkg   v1.23.15---vmware.1-tkg.4                      encap                           true          true
vmware-system-tkg   v1.23.15---vmware.1-tkg.4-routable             noEncap                         true          true
vmware-system-tkg   v1.23.8---vmware.2-tkg.2-zshippable            encap                           true          true
vmware-system-tkg   v1.23.8---vmware.2-tkg.2-zshippable-routable   noEncap                         true          true
vmware-system-tkg   v1.24.9---vmware.1-tkg.4                       encap                           true          true
vmware-system-tkg   v1.24.9---vmware.1-tkg.4-routable              noEncap                         true          true
vmware-system-tkg   v1.25.7---vmware.3-fips.1-tkg.1                encap                           true          true
vmware-system-tkg   v1.25.7---vmware.3-fips.1-tkg.1-routable       noEncap                         true          true
vmware-system-tkg   v1.26.5---vmware.2-fips.1-tkg.1                encap                           true          true
vmware-system-tkg   v1.26.5---vmware.2-fips.1-tkg.1-routable       noEncap                         true          true

# Get the content of a specific Antrea config
andreasm@ubuntu02:~/avi_nsxt_wcp$ k get antreaconfigs.cni.tanzu.vmware.com -n vmware-system-tkg v1.26.5---vmware.2-fips.1-tkg.1 -oyaml
apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  annotations:
    tkg.tanzu.vmware.com/template-config: "true"
  creationTimestamp: "2023-09-24T17:49:37Z"
  generation: 1
  name: v1.26.5---vmware.2-fips.1-tkg.1
  namespace: vmware-system-tkg
  resourceVersion: "19483"
  uid: 8cdaa6ec-4059-4d35-a0d4-63711831edc8
spec:
  antrea:
    config:
      antreaProxy:
        proxyLoadBalancerIPs: true
      defaultMTU: ""
      disableTXChecksumOffload: false
      disableUdpTunnelOffload: false
      dnsServerOverride: ""
      enableBridgingMode: false
      enableUsageReporting: false
      featureGates:
        AntreaIPAM: false
        AntreaPolicy: true
        AntreaProxy: true
        AntreaTraceflow: true
        Egress: true
        EndpointSlice: true
        FlowExporter: false
        Multicast: false
        Multicluster: false
        NetworkPolicyStats: true
        NodePortLocal: true
        SecondaryNetwork: false
        ServiceExternalIP: false
        TopologyAwareHints: false
        TrafficControl: false
      flowExporter:
        activeFlowTimeout: 60s
        collectorAddress: flow-aggregator/flow-aggregator:4739:tls
      noSNAT: false
      tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384
      trafficEncapMode: encap
      tunnelCsum: false
      tunnelPort: 0

With the above you can always get the latest config coming with the specific TKR release and use it as a template for your TKC cluster.
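A copy of a template still carries the template annotation, uid, resourceVersion and timestamps, and it has the wrong name and namespace. Below is a minimal sketch of turning it into a cluster-specific AntreaConfig. It assumes the metadata block is formatted like the output above, with top-level keys unindented. antrea_config_from_template and the cluster and namespace names in the usage line are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a template AntreaConfig (YAML) on stdin and
# replaces its whole metadata block with <cluster>-antrea-package and the
# given namespace. This drops the template annotation, uid, resourceVersion,
# generation and creationTimestamp. spec is passed through unchanged.
antrea_config_from_template() {
  local cluster_name="$1" namespace="$2"
  awk -v name="${cluster_name}-antrea-package" -v ns="$namespace" '
    /^metadata:/ { print; print "  name: " name; print "  namespace: " ns; in_meta = 1; next }
    in_meta && /^[^ ]/ { in_meta = 0 }   # next top-level key ends metadata
    !in_meta { print }
  '
}

# Usage (Supervisor context):
#   kubectl get antreaconfigs.cni.tanzu.vmware.com -n vmware-system-tkg \
#     v1.26.5---vmware.2-fips.1-tkg.1 -o yaml |
#     antrea_config_from_template cluster-1 ns-stc-1 > cluster-1-antrea-config.yaml
```

Edit the feature gates in the resulting file, then apply it before or after creating the cluster, as described earlier.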

Integrating Antrea with NSX-T

To enable the NSX-T Antrea integration, there are a couple of steps that need to be prepared. All the steps can be followed here. I have decided to create a script that automates all of them. So if you don't want to go through the steps manually by following the link above, you can use this script instead. Just have the prerequisites in place before executing it and enter the necessary information when prompted. Copy and paste the script below into a .sh file on your Linux jumphost and make it executable with chmod +x.

#!/bin/bash

# Echo information
echo "This script has some dependencies... make sure they are met before continuing. Otherwise press Ctrl+C now
1. This script is adjusted for vSphere with Tanzu TKG clusters using Tanzu CLI
2. Have downloaded the antrea-interworking*.zip
3. This script is located in the root of where you have downloaded the zip file above
4. curl, openssl and unzip are installed
5. Need connectivity to the NSX manager
6. kubectl is installed
7. vsphere with tanzu cli (kubectl vsphere plugin) is installed
8. That you are in the correct context of the cluster you want to integrate to NSX
9. If not in the correct context the script will put you in the correct context anyway
10. A big smile and good mood"

# Prompt the user to press a key to continue
echo "Press any key to continue..."
read -n 1 -s

# Continue with the script
echo "Continuing..."

# Prompt for name
read -p "Enter the name of the tkg cluster - will be used for certificates and name in NSX: " name

# Prompt for NSX_MGR
read -p "Enter NSX Manager ip or FQDN: " nsx_mgr

# Prompt for NSX_ADMIN
read -p "Enter NSX admin username: " nsx_admin

# Prompt for NSX_PASS (-s keeps the password from being echoed to the terminal)
read -s -p "Enter NSX Password: " nsx_pass
echo

# Prompt for Supervisor Endpoint IP or FQDN
read -p "Enter Supervisor API IP or FQDN: " svc_api_ip

# Prompt for vSphere Username
read -p "Enter vSphere Username: " vsphere_username

# Prompt for Tanzu Kubernetes Cluster Namespace
read -p "Enter Tanzu Kubernetes Cluster Namespace: " tanzu_cluster_namespace

# Prompt for Tanzu Kubernetes Cluster Name
read -p "Enter Tanzu Kubernetes Cluster Name: " tanzu_cluster_name

# Login to vSphere using kubectl
kubectl vsphere login --server="$svc_api_ip" --insecure-skip-tls-verify --vsphere-username="$vsphere_username" --tanzu-kubernetes-cluster-namespace="$tanzu_cluster_namespace" --tanzu-kubernetes-cluster-name="$tanzu_cluster_name"

key_name="${name}-private.key"
csr_output="${name}.csr"
crt_output="${name}.crt"

# Create a private key and a self-signed certificate for the NSX principal identity
openssl genrsa -out "$key_name" 2048
openssl req -new -key "$key_name" -out "$csr_output" -subj "/C=US/ST=CA/L=Palo Alto/O=VMware/OU=Antrea Cluster/CN=$name"
openssl x509 -req -days 3650 -sha256 -in "$csr_output" -signkey "$key_name" -out "$crt_output"

# Convert the certificate file to a one-liner with line breaks
crt_contents=$(awk '{printf "%s\\n", $0}' "$crt_output")

# Replace the certificate and name in the curl body
curl_body='{
    "name": "'"$name"'",
    "node_id": "'"$name"'",
    "roles_for_paths": [
        {
            "path": "/",
            "roles": [
                {
                    "role": "enterprise_admin"
                }
            ]
        }
    ],
    "role": "enterprise_admin",
    "is_protected": "true",
    "certificate_pem" : "'"$crt_contents"'"
}'

# Make the curl request with the updated body to register the principal identity in NSX
curl -ku "$nsx_admin":"$nsx_pass" -X POST https://"$nsx_mgr"/api/v1/trust-management/principal-identities/with-certificate -H "Content-Type: application/json" -d "$curl_body"

# Check if a subfolder starting with "antrea-interworking" exists
# (the trailing slash matches directories only, so the zip file itself is not counted)
if ls -d antrea-interworking*/ &>/dev/null; then
    echo "Subfolder starting with 'antrea-interworking' exists. Skipping extraction."
else
    # Extract the zip file starting with "antrea-interworking"
    unzip "antrea-interworking"*.zip
fi

# Create a new folder with the name antrea-interworking-"from-name"
new_folder="antrea-interworking-$name"
mkdir "$new_folder"

# Copy all YAML files from the antrea-interworking subfolder to the new folder
cp antrea-interworking*/{*.yaml,*.yml} "$new_folder/"

# Replace the field after "image: vmware.io/antrea/interworking" with "image: projects.registry.vmware.com/antreainterworking/interworking-debian" in interworking.yaml
sed -i 's|image: vmware.io/antrea/interworking|image: projects.registry.vmware.com/antreainterworking/interworking-debian|' "$new_folder/interworking.yaml"

# Replace the field after "image: vmware.io/antrea/interworking" with "image: projects.registry.vmware.com/antreainterworking/interworking-debian" in deregisterjob.yaml
sed -i 's|image: vmware.io/antrea/interworking|image: projects.registry.vmware.com/antreainterworking/interworking-debian|' "$new_folder/deregisterjob.yaml"

# Edit the bootstrap-config.yaml file in the new folder
sed -i 's|clusterName:.*|clusterName: '"$name"'|' "$new_folder/bootstrap-config.yaml"
sed -i 's|NSXManagers:.*|NSXManagers: ["'"$nsx_mgr"'"]|' "$new_folder/bootstrap-config.yaml"
tls_crt_base64=$(base64 -w 0 "$crt_output")
sed -i 's|tls.crt:.*|tls.crt: '"$tls_crt_base64"'|' "$new_folder/bootstrap-config.yaml"
tls_key_base64=$(base64 -w 0 "$key_name")
sed -i 's|tls.key:.*|tls.key: '"$tls_key_base64"'|' "$new_folder/bootstrap-config.yaml"

# Interactive prompt to select Kubernetes context
kubectl config get-contexts
read -p "Enter the name of the Kubernetes context: " kubectl_context
kubectl config use-context "$kubectl_context"

# Apply the bootstrap-config.yaml and interworking.yaml files from the new folder
kubectl apply -f "$new_folder/bootstrap-config.yaml" -f "$new_folder/interworking.yaml"

# Run the last command to verify that something is happening
kubectl get pods -o wide -n vmware-system-antrea

echo "As it was written each time we ssh'ed into a Suse Linux back in the good old days - Have a lot of fun"
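If you would rather block until the interworking components are up than re-run kubectl get pods by hand, kubectl wait can do that. This sketch assumes that the interworking manifests create Deployments in vmware-system-antrea, as used by the script above. wait_for_interworking and its 300s default timeout are hypothetical choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until every Deployment in vmware-system-antrea
# reports the Available condition, or give up after the timeout (default 300s).
wait_for_interworking() {
  local timeout="${1:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    -n vmware-system-antrea --timeout="$timeout"
}

# Usage, in the workload cluster context: wait_for_interworking 600s
```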

Once the script has finished, it should not take long before your TKG cluster shows up in NSX Manager:

[Image: container-cluster-nsx]

That's it for the NSX-T integration. Once that is done, it's time to look at what we can do with this integration in the following chapters.

Antrea Security Policies

Antrea has two kinds of security policies: Antrea Network Policies (ANP) and Antrea Cluster Network Policies (ACNP). The difference between them is that an ANP is scoped to a Kubernetes Namespace, while an ACNP is cluster-wide. Both belong to the Antrea Native Policies, and both ANP and ACNP can work together with Kubernetes Network Policies.

There are many benefits to using Antrea Native Policies, whether or not they are combined with Kubernetes Network Policies.

Some of the benefits of using Antrea Native Policies:

  • Can be tiered
  • Select both ingress and egress
  • Support the following actions: allow, drop, reject and pass
  • Support FQDN filtering in egress (to) with actions allow, drop and reject

Tiered policies

Tiered policies are very useful when, for example, different parts of the organization are responsible for security at different levels or scopes of the platform. Antrea can place policies in different tiers, and the tiers are evaluated in a given order. If we want some rules to be evaluated very early and enforced as soon as possible, we can place them in a tier that is evaluated first. Within the same tier, policies are enforced in order of their priority, which is a number: the policy with the lowest number (highest priority) is evaluated first. When all policies in a tier have been processed, evaluation moves on to the next tier. Antrea comes with a set of static tiers already defined. These tiers can be listed by running the command:

linuxvm01:~$ k get tiers
NAME                          PRIORITY   AGE
application                   250        3h11m
baseline                      253        3h11m
emergency                     50         3h11m
networkops                    150        3h11m
platform                      200        3h11m
securityops                   100        3h11m
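The list above is sorted by name, not by evaluation order. kubectl can sort by the priority field directly (kubectl get tiers --sort-by=.spec.priority). The hypothetical helper below does the same for saved output of the command above and keeps the header line on top:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "kubectl get tiers" output on stdin and prints it
# in evaluation order (lowest PRIORITY first), keeping the header line on top.
sort_tiers_by_priority() {
  local header
  # Print the header unchanged, then sort the remaining rows numerically
  # on the second column (PRIORITY).
  if IFS= read -r header; then
    printf '%s\n' "$header"
  fi
  sort -k2,2n
}

# Usage: kubectl get tiers | sort_tiers_by_priority
```

For the built-in tiers above, this puts emergency (50) first and baseline (253) last.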

The diagram below shows how they are ordered. Notice also where the Kubernetes network policies are placed:

[Image: antrea-tiers]

There is also the option to add custom tiers using the following CRD (taken from the official Antrea docs here):

apiVersion: crd.antrea.io/v1alpha1
kind: Tier
metadata:
  name: mytier
spec:
  priority: 10
  description: "my custom tier"

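When doing the Antrea NSX integration, some additional tiers are added automatically (their names start with nsx-). Their priorities (0 to 4) are lower numbers than that of the emergency tier (50), so they are evaluated before all of the built-in tiers: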

linuxvm01:~$ k get tiers
NAME                          PRIORITY   AGE
application                   250        3h11m
baseline                      253        3h11m
emergency                     50         3h11m
networkops                    150        3h11m
nsx-category-application      4          87m
nsx-category-emergency        1          87m
nsx-category-environment      3          87m
nsx-category-ethernet         0          87m
nsx-category-infrastructure   2          87m
platform                      200        3h11m
securityops                   100        3h11m

I can quickly show two examples. In the first, I create a rule as a "security admin" who has to follow the company's compliance policy and block access to a certain FQDN. This must be enforced everywhere, so I create the policy in the securityops tier. I could also have put it in the emergency tier. However, that tier is better suited for rules that stay disabled (not enforced) until an emergency occurs, so they can be enabled quickly and override rules further down the hierarchy. So securityops it is.

Let's apply this one:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: acnp-drop-yelb
spec:
  priority: 1
  tier: securityops
  appliedTo:
  - podSelector:
      matchLabels:
        app: ubuntu-20-04
  egress:
  - action: Drop
    to:
      - fqdn: "yelb-ui.yelb.carefor.some-dns.net"
    ports:
      - protocol: TCP
        port: 80
  - action: Allow  #Allow the rest

To check that it is applied and in use (look at the DESIRED NODES and CURRENT NODES columns):

linuxvm01:~/antrea/policies$ k get acnp
NAME             TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-drop-yelb   securityops   1          1               1               5m33s

Now, from a test pod, I will try to curl the blocked FQDN and another one that is not in any block rule:

root@ubuntu-20-04-548545fc87-kkzbh:/# curl yelb-ui.yelb.cloudburst.somecooldomain.net
curl: (6) Could not resolve host: yelb-ui.yelb.cloudburst.somecooldomain.net

# Curling a FQDN that is allowed:
root@ubuntu-20-04-548545fc87-kkzbh:/# curl allowed-yelb.yelb-2.carefor.some-dns.net
<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>Yelb</title>
    <base href="/">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="icon" type="image/x-icon" href="favicon.ico?v=2">
</head>
<body>
<yelb>Loading...</yelb>
<script type="text/javascript" src="inline.bundle.js"></script><script type="text/javascript" src="styles.bundle.js"></script><script type="text/javascript" src="scripts.bundle.js"></script><script type="text/javascript" src="vendor.bundle.js"></script><script type="text/javascript" src="main.bundle.js"></script></body>
</html>

That works as expected. Now, what happens if another user with access to the Kubernetes cluster decides to create a rule further down in the hierarchy (let's go with the application tier) that allows this FQDN, which is currently being dropped? Let's see:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: acnp-allow-yelb
spec:
  priority: 1
  tier: application
  appliedTo:
  - podSelector:
      matchLabels:
        app: ubuntu-20-04
  egress:
  - action: Allow
    to:
      - fqdn: "yelb-ui.yelb.carefor.some-dns.net"
    ports:
      - protocol: TCP
        port: 80
  - action: Allow  #Allow the rest

I will apply the rule above and then try to curl the same FQDN, which is supposed to be dropped.

linuxvm01:~/antrea/policies$ k get acnp
NAME              TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-allow-yelb   application   1          1               1               4s
acnp-drop-yelb    securityops   1          1               1               5h1m
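Both policies use priority 1, but that priority only orders policies within the same tier. Across tiers, the tier with the lower spec.priority is evaluated first. Below is a sketch of a helper that reports which of two tiers is evaluated first, assuming kubectl read access to the Tier objects. tier_evaluated_first is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whichever of two tiers Antrea evaluates first,
# i.e. the one with the lower spec.priority.
tier_evaluated_first() {
  local a="$1" b="$2" pa pb
  pa=$(kubectl get tier "$a" -o jsonpath='{.spec.priority}') || return 1
  pb=$(kubectl get tier "$b" -o jsonpath='{.spec.priority}') || return 1
  if [ "$pa" -le "$pb" ]; then echo "$a"; else echo "$b"; fi
}

# Usage: tier_evaluated_first application securityops
```

With the tiers listed earlier (securityops 100, application 250), this prints securityops.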

From my test pod again:

kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@ubuntu-20-04-548545fc87-kkzbh:/# curl yelb-ui.yelb.cloudburst.somecooldomain.net
curl: (6) Could not resolve host: yelb-ui.yelb.cloudburst.somecooldomain.net
root@ubuntu-20-04-548545fc87-kkzbh:/# curl allowed-yelb.yelb-2.carefor.some-dns.net
<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>Yelb</title>
    <base href="/">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="icon" type="image/x-icon" href="favicon.ico?v=2">
</head>
<body>
<yelb>Loading...</yelb>
<script type="text/javascript" src="inline.bundle.js"></script><script type="text/javascript" src="styles.bundle.js"></script><script type="text/javascript" src="scripts.bundle.js"></script><script type="text/javascript" src="vendor.bundle.js"></script><script type="text/javascript" src="main.bundle.js"></script></body>
</html>
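The deprecation warning at the top of that session comes from calling kubectl exec without the -- separator. A small helper that uses the separator and a timeout makes these checks repeatable. It assumes curl is available in the test pod image, and probe_from_pod is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe a URL from inside a pod. "--" separates kubectl's
# arguments from the command, and --max-time stops a dropped connection from
# hanging. curl -w prints 000 when no HTTP response was received.
probe_from_pod() {
  local pod="$1" url="$2" code
  code=$(kubectl exec "$pod" -- curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
  if [ -n "$code" ] && [ "$code" != "000" ]; then
    echo "reachable (HTTP $code)"
  else
    echo "blocked or unreachable"
  fi
}

# Usage:
#   probe_from_pod ubuntu-20-04-548545fc87-kkzbh http://allowed-yelb.yelb-2.carefor.some-dns.net
```

Note that a DNS failure or a missing pod is also reported as "blocked or unreachable", so check the curl error in the pod if the result is unexpected.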

That was expected: the FQDN is still being dropped by the first rule, placed in the securityops tier. So far so good. But what if this user also has access to the tier where the first rule is applied? Then they can override it, which brings us to the next chapter.

Antrea RBAC

Antrea's policy objects, such as Tiers, ClusterNetworkPolicies and NetworkPolicies, are Kubernetes CRDs, so we can configure granular user permissions on them with standard Kubernetes RBAC. To restrict "normal" users from applying and/or deleting security policies created in the higher-priority tiers, we need to apply some role bindings, or to be exact, ClusterRoleBindings. Let us see how we can achieve that.

In my lab environment I have defined two users: my own admin user (andreasm), which is bound to the ClusterRole cluster-admin, and a second user (User1), which is bound to the ClusterRole view. The view ClusterRole grants read-only access to many, but not all, objects in the cluster. To see which ones, run the following command:
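Before changing any bindings, it helps to see what a given user can currently do with the Antrea policy objects. The sketch below uses kubectl auth can-i with impersonation (--as), which requires the caller to be allowed to impersonate users. The helper name and the user name in the usage line are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a yes/no table of what a user may do with Antrea
# policy objects. networkpolicies.crd.antrea.io is namespaced, so that check
# applies to the current namespace of the kubectl context.
check_antrea_policy_access() {
  local user="$1" verb resource
  for resource in tiers.crd.antrea.io clusternetworkpolicies.crd.antrea.io networkpolicies.crd.antrea.io; do
    for verb in create update delete; do
      printf '%-40s %-7s %s\n' "$resource" "$verb" \
        "$(kubectl auth can-i "$verb" "$resource" --as="$user" 2>/dev/null)"
    done
  done
}

# Usage: check_antrea_policy_access user1
```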

linuxvm01:~/antrea/policies$ k get clusterrole view -oyaml
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac.authorization.k8s.io/aggregate-to-view: "true"
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  creationTimestamp: "2023-06-04T09:37:44Z"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
  name: view
  resourceVersion: "1052"
  uid: c4784a81-4451-42af-9134-e141ccf8bc50
rules:
- apiGroups:
  - crd.antrea.io
  resources:
  - clustergroups
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - crd.antrea.io
  resources:
  - clusternetworkpolicies
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - crd.antrea.io
  resources:
  - traceflows
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - endpoints
  - persistentvolumeclaims
  - persistentvolumeclaims/status
  - pods
  - replicationcontrollers
  - replicationcontrollers/scale
  - serviceaccounts
  - services
  - services/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - bindings
  - events
  - limitranges
  - namespaces/status
  - pods/log
  - pods/status
  - replicationcontrollers/status
  - resourcequotas
  - resourcequotas/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - controllerrevisions
  - daemonsets
  - daemonsets/status
  - deployments
  - deployments/scale
  - deployments/status
  - replicasets
  - replicasets/scale
  - replicasets/status
  - statefulsets
  - statefulsets/scale
  - statefulsets/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  - horizontalpodautoscalers/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - cronjobs/status
  - jobs
  - jobs/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - daemonsets/status
  - deployments
  - deployments/scale
  - deployments/status
  - ingresses
  - ingresses/status
  - networkpolicies
  - replicasets
  - replicasets/scale
  - replicasets/status
  - replicationcontrollers/scale
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  - poddisruptionbudgets/status
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  - ingresses/status
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - policy
  resourceNames:
  - vmware-system-privileged
  resources:
  - podsecuritypolicies
  verbs:
  - use

On the other hand, my own admin user has access to everything: get, list, create, patch, update, delete, the whole shebang. What I would like to demonstrate now is that User1 is a regular user and should only be allowed to create security policies in the application Tier, while all other Tiers are restricted to the admins who are responsible for creating policies there. User1 should also not be allowed to create any custom Tiers.
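For reference, a custom Tier is itself a small cluster-scoped object, and creating one is exactly what User1 should not be able to do. Below is a minimal sketch in the style of the upstream Antrea documentation; the name, priority and description are placeholders, and the priority must not collide with an existing Tier (lower number means higher precedence).

```yaml
# Hypothetical custom Tier (name, priority and description are placeholders).
# Tiers are cluster-scoped; a lower priority value is evaluated earlier.
apiVersion: crd.antrea.io/v1alpha1
kind: Tier
metadata:
  name: mytier
spec:
  priority: 10
  description: "my custom tier"
```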

So the first thing I need to create is an Antrea TierEntitlement and TierEntitlementBinding like this:

apiVersion: crd.antrea.tanzu.vmware.com/v1alpha1
kind: TierEntitlement
metadata:
    name: secops-edit
spec:
    tiers:       # Accept list of Tier names. Tier may or may not exist yet.
    - emergency
    - securityops
    - networkops
    - platform
    - baseline
    permission: edit
---
apiVersion: crd.antrea.tanzu.vmware.com/v1alpha1
kind: TierEntitlementBinding
metadata:
    name: secops-bind
spec:
  subjects:                                       # List of users to grant this entitlement to
  -   kind: User
      name: sso:andreasm@cpod-nsxam-stc.az-stc.cloud-garage.net
      apiGroup: rbac.authorization.k8s.io
#  -   kind: Group
#      name: security-admins
#      apiGroup: rbac.authorization.k8s.io
#  -   kind: ServiceAccount
#      name: network-admins
#      namespace: kube-system
  tierEntitlement: secops-edit              # Reference to the TierEntitlement

Notice that I am listing the Tiers that should only be available to the users, groups or ServiceAccounts in the TierEntitlementBinding (I am only using kind: User in this example). This means that all unlisted Tiers remain available for other users to place security policies in.

Now apply it:

linuxvm01:~/antrea/policies$ k apply -f tierentitlement.yaml
tierentitlement.crd.antrea.tanzu.vmware.com/secops-edit created
tierentitlementbinding.crd.antrea.tanzu.vmware.com/secops-bind created

Next up is giving User1 permission to get and list the Antrea "tiers" CRD:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tier-placement
rules:
- apiGroups: ["crd.antrea.io"]
  resources: ["tiers"]
  verbs: ["get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tier-bind
subjects:
- kind: User
  name: sso:user1@cpod-nsxam-stc.az-stc.cloud-garage.net # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: tier-placement
  apiGroup: rbac.authorization.k8s.io

If you also want a user to be able to create, modify or delete custom Tiers, add "create","patch","update","delete" to the verbs list.

Now apply the above yaml:

linuxvm01:~/antrea/policies$ k apply -f antrea-crd-tier-list.yaml
clusterrole.rbac.authorization.k8s.io/tier-placement created
clusterrolebinding.rbac.authorization.k8s.io/tier-bind created
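For the variant mentioned above, where a user may also manage custom Tiers, the ClusterRole would look roughly like this. It is a sketch only: the name tier-admin is hypothetical, and the role would be bound to the user with a ClusterRoleBinding in the same way as tier-bind.

```yaml
# Hypothetical ClusterRole for a user who may also create, modify and delete custom Tiers.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tier-admin   # hypothetical name
rules:
- apiGroups: ["crd.antrea.io"]
  resources: ["tiers"]
  verbs: ["get","list","create","patch","update","delete"]
```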

I will now log in as User1 and try to apply this network policy:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: override-rule-allow-yelb
spec:
  priority: 1
  tier: securityops
  appliedTo:
  - podSelector:
      matchLabels:
        app: ubuntu-20-04
  egress:
  - action: Allow
    to:
      - fqdn: "yelb-ui.yelb.carefor.some-dns.net"
    ports:
      - protocol: TCP
        port: 80
  - action: Allow

As User1:

linuxvm01:~/antrea/policies$ k apply -f fqdn-rule-secops-tier.test.yaml
Error from server (Forbidden): error when creating "fqdn-rule-secops-tier.test.yaml": clusternetworkpolicies.crd.antrea.io is forbidden: User "sso:user1@cpod-nsxam-stc.az-stc.cloud-garage.net" cannot create resource "clusternetworkpolicies" in API group "crd.antrea.io" at the cluster scope

First bump in the road: this user is not allowed to create any security policies at all.

So, as my admin user, I apply this ClusterRole and ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: clusternetworkpolicies-edit
rules:
- apiGroups: ["crd.antrea.io"]
  resources: ["clusternetworkpolicies"]
  verbs: ["get","list","create","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: clusternetworkpolicies-bind
subjects:
- kind: User
  name: sso:user1@cpod-nsxam-stc.az-stc.cloud-garage.net # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: clusternetworkpolicies-edit
  apiGroup: rbac.authorization.k8s.io

Now User1 is allowed to create ClusterNetworkPolicies. Let's try again:

linuxvm01:~/antrea/policies$ k apply -f fqdn-rule-secops-tier.test.yaml
Error from server: error when creating "fqdn-rule-secops-tier.test.yaml": admission webhook "acnpvalidator.antrea.io" denied the request: user not authorized to access Tier securityops

There it is: I am not allowed to place any security policies in the securityops Tier. That is exactly what I wanted to achieve. What if User1 tries to apply a policy in the application Tier? Let's see:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: override-attempt-failed-allow-yelb
spec:
  priority: 1
  tier: application
  appliedTo:
  - podSelector:
      matchLabels:
        app: ubuntu-20-04
  egress:
  - action: Allow
    to:
      - fqdn: "yelb-ui.yelb.carefor.some-dns.net"
    ports:
      - protocol: TCP
        port: 80
  - action: Allow

linuxvm01:~/antrea/policies$ k apply -f fqdn-rule-baseline-tier.test.yaml
clusternetworkpolicy.crd.antrea.io/override-attempt-failed-allow-yelb created
linuxvm01:~/antrea/policies$ k get acnp
NAME                                 TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-allow-yelb                      application   1          1               1               147m
acnp-drop-yelb                       securityops   1          1               1               18h
override-attempt-failed-allow-yelb   application   1          1               1               11s

That worked. Even though the rule above tries to allow access to Yelb, the traffic will still be blocked by the Drop rule in the securityops Tier, which is evaluated before the application Tier. So no matter how hard User1 tries to get this access, it will be blocked.

These users....

What if User1 tries to apply the same policy without specifying any Tier in the policy? Let's see:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: override-attempt-failed-allow-yelb
spec:
  priority: 1
  appliedTo:
  - podSelector:
      matchLabels:
        app: ubuntu-20-04
  egress:
  - action: Allow
    to:
      - fqdn: "yelb-ui.yelb.carefor.some-dns.net"
    ports:
      - protocol: TCP
        port: 80
  - action: Allow

linuxvm01:~/antrea/policies$ k apply -f fqdn-rule-no-tier.yaml
clusternetworkpolicy.crd.antrea.io/override-attempt-failed-allow-yelb created
linuxvm01:~/antrea/policies$ k get acnp
NAME                                 TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-allow-yelb                      application   1          1               1               151m
acnp-drop-yelb                       securityops   1          1               1               18h
override-attempt-failed-allow-yelb   application   1          1               1               10s

When no Tier is specified, the policy is placed in the default application Tier, so having permission to create clusternetworkpolicies does not let User1 bypass the Tier restrictions.

With this, the network or security admins have full control of the network policies in the Tiers evaluated before and after the application Tier (see the Tier diagram above).

This example has only shown how to do this at cluster level; more granular permissions can also be added at namespace level.

So far I have gone over how to manage the Antrea FeatureGates in TKG, how to configure the Antrea-NSX integration, Antrea Policies in general and how to manage RBAC. In the next two chapters I will cover two different ways to apply Antrea Policies. Let's get into it.
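As a sketch of the namespace-level variant, assuming the yelb namespace and the same User1 (the Role and RoleBinding names are hypothetical), User1 could be allowed to manage the namespaced Antrea NetworkPolicies only in that namespace. This covers only the Kubernetes RBAC part; the Tier restrictions are checked separately by the Antrea admission webhook, so test how they behave for namespaced policies in your own setup.

```yaml
# Hypothetical namespace-scoped permissions: user1 may manage Antrea NetworkPolicies
# (the namespaced "networkpolicies" resource in crd.antrea.io) only in the yelb namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: antrea-np-edit          # hypothetical name
  namespace: yelb
rules:
- apiGroups: ["crd.antrea.io"]
  resources: ["networkpolicies"]
  verbs: ["get","list","create","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: antrea-np-edit-bind     # hypothetical name
  namespace: yelb
subjects:
- kind: User
  name: sso:user1@cpod-nsxam-stc.az-stc.cloud-garage.net # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: antrea-np-edit
  apiGroup: rbac.authorization.k8s.io
```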

How to manage the Antrea Native Policies

As mentioned previously, Antrea Native Policies can be applied from inside the Kubernetes cluster using yaml manifests, but they can also be managed from the NSX manager. What I have not mentioned yet is that this opens up a whole new way of managing security policies: centrally managed across multiple clusters wherever they are located, with an easier split of roles and responsibilities. If NSX is already in place, chances are that NSX security policies are already being managed by the network or security admins. Now they can continue doing that, but also take pod network security across the different TKG/Kubernetes clusters into consideration.

Antrea Security policies from the NSX manager

After you have connected your TKG clusters to the NSX manager (as shown earlier in this post) you will see the status of these connections in the NSX manager under System -> Fabric -> Nodes:

nsx-container-clusters

The status indicator is another benefit of this integration, as it shows the status of the Antrea Controller and of the components responsible for the Antrea-NSX integration.

Under Inventory we get all the relevant information from the TKG clusters:

inventory

In the screenshot above, stc-tkg-cluster 1 and 2 are my TKG Antrea clusters. I can get all kinds of information like namespaces, pods, labels, IP addresses, names and services. This information is useful when creating policies, and it also tells me whether pods and services are up.

pods

labels

services

Antrea Cluster Network Policies - Applied from the NSX manager

With the NSX manager we can create and manage the Antrea Native Policies from the NSX graphical user interface instead of the CLI. Using NSX security groups and labels also makes it more fun, and it is easy to maintain and keep track of what we are doing, as we can see all the policies.

Let's create some policies from the NSX manager to microsegment my demo application Yelb. It consists of four pods and a service called yelb-ui where the web page is exposed.

yelb

I know the different parts of the application (i.e. the pods) use labels, so I will use them. First let us list them from the CLI and then find them in the NSX manager.

linuxvm01:~/antrea/policies$ k get pods -n yelb --show-labels
NAME                              READY   STATUS    RESTARTS   AGE   LABELS
redis-server-69846b4888-5m757     1/1     Running   0          22h   app=redis-server,pod-template-hash=69846b4888,tier=cache
yelb-appserver-857c5c76d5-4cgbq   1/1     Running   0          22h   app=yelb-appserver,pod-template-hash=857c5c76d5,tier=middletier
yelb-db-6bd4fc5d9b-92rkf          1/1     Running   0          22h   app=yelb-db,pod-template-hash=6bd4fc5d9b,tier=backenddb
yelb-ui-6df49457d6-4bktw          1/1     Running   0          20h   app=yelb-ui,pod-template-hash=6df49457d6,tier=frontend

OK, there are the labels. Just for the sake of it, I will find the same labels in the NSX manager as well:

nsx-pod-labels

Now I need to create some security groups in NSX using these labels.

The first group is called acnp-yelb-frontend-ui and uses these membership criteria (I am also adding a namespace criterion to exclude any other applications using the same labels in other namespaces):

membership-criteria

acnp-yelb-ui

Now hurry back to the security group and check whether there are any members... Disappointment. It is empty:

membership

Fear not, let us quickly create a policy with this group:

Create a new policy and set Antrea Container Clusters in the applied to field:

policy

applied-to

The actual rule:

rule

The rule above allows my Avi Service Engines to reach the web port of my yelb-ui pod on port 80 (HTTP), as they are the load balancer for my application.

Any members in the group now?

members...

Yes 😃

Now go ahead and create similar groups and rules for the other pods using their respective labels, with the appropriate ports for each component.

End result:

microseg-yelb

Do they work? Let us find that out a bit later, as I need something to put in my TraceFlow chapter 😄

The rules I added above were just for the application in the yelb namespace. If I wanted to extend this ruleset to also include the same application in other clusters, it is just a matter of adding those Kubernetes clusters in the Applied To field like this:

add-clusters

NSX Distributed Firewall - Kubernetes objects Policies

In addition to managing the Antrea Native Policies from the NSX manager as above, recent NSX releases have added features that let security policies enforced in the NSX Distributed Firewall also cover these components:

nsx-dfw-kubernetes

With this we can create security policies in the NSX Distributed Firewall that cover the above components by using security groups. With this feature it is no longer necessary to hunt down information about these components, as they are already reported into the NSX manager. Let us do an example of how such a rule can be created and how it works.

I will create a security policy based on this feature where I will use Kubernetes Service in my example. I will create a security group as above, but this time I will do some different selections. First grab the labels from the service, I will use the yelb-ui service in my example:

linuxvm01:~/antrea/policies$ k get svc -n yelb --show-labels
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE   LABELS
redis-server     ClusterIP      20.10.102.8     <none>         6379/TCP       23h   app=redis-server,tier=cache
yelb-appserver   ClusterIP      20.10.247.130   <none>         4567/TCP       23h   app=yelb-appserver,tier=middletier
yelb-db          ClusterIP      20.10.44.17     <none>         5432/TCP       23h   app=yelb-db,tier=backenddb
yelb-ui          LoadBalancer   20.10.194.179   10.13.210.10   80:30912/TCP   21h   app=yelb-ui,tier=frontend

I can either decide to use app=yelb-ui or tier=frontend. Now that I have my labels I will create my security group like this:

group-k8s-service

membership-criteria

I used the name of the service itself and the name of the namespace. This gives me this member:

members

Which is right...

Now create a security policy using this group, where the source is another group containing a VM running in the same NSX environment. I have also created an "any" group that contains just 0.0.0.0/0. Remember that this policy is enforced in the DFW, so the traffic must pass through something running on NSX for this to work. In my environment that is not only the TKG cluster, but also the Avi Service Engines, which act as LoadBalancer and Ingress for my exposed services. This is important to keep in mind, as the Avi Service Engines communicate with the TKG cluster nodes using NodePortLocal in the default port range 61000-62000 (if not changed in the Antrea configmap).

Let's see if the rule below works then:
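To check which NodePortLocal range a cluster actually uses, look at antrea-agent.conf in the antrea-config ConfigMap (the same `k get configmaps -n kube-system antrea-config -oyaml` command used earlier in this post). The excerpt below is a sketch assuming a recent Antrea release, where the setting lives in a nodePortLocal section; older releases used a top-level nplPortRange key instead, so verify against your own configmap.

```yaml
# Excerpt of antrea-agent.conf (antrea-config ConfigMap, kube-system).
# Assumes a recent Antrea release; older releases used "nplPortRange" instead.
nodePortLocal:
  enable: true
  # Ports the Avi SEs will target on the worker nodes; DFW rules must allow this range.
  portRange: "61000-62000"
```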

kubernetes-service-rule

I will adjust it to Action Drop:

drop

Test Yelb ui access from my linux vm via curl and my physical laptop's browser, and the results are in:

ubuntu02:~$ curl yelb-ui.yelb.carefor.some-dns.net
curl: (28) Failed to connect to yelb-ui.yelb.carefor.some-dns.net port 80: Connection timed out

From my physical laptop browser:

laptop-yelb-ui

The traffic is dropped even though I still have these Antrea rules in place from earlier (remember?):

antrea-policies

Now, what about the Avi Service Engines?

Looking at the rules above, the NSX Kubernetes Service rule and the Antrea Policies, we are enforcing the firewall at two different levels. Antrea Native Policies, like the ones just above, are applied and enforced inside the Kubernetes cluster, while the NSX Kubernetes Service rule is applied and enforced in the DFW layer. So the Avi Service Engines first need a policy that allows them to communicate with the TKG worker nodes on specific ports/protocols; in my Yelb example above, that is TCP port 61002. We can see that in the Avi UI:

avi-se-connectivity

Even though the Avi SEs use the same DFW as the worker nodes, we need to create this policy to allow the SEs to reach the worker nodes. These policies can either be very "lazy", allowing the SEs everything on TCP 61000-62000 to the worker nodes, or be made very granular per service. If you use Avi with an NSX Cloud, the Avi SEs are automatically placed in NSX security groups, so explore that.

If we do not allow the SEs this traffic, we will see this in the Avi UI:
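The NodePortLocal port can also be looked up from inside the cluster, since Antrea records the allocated port mappings in an annotation on each backend pod (visible with `k get pod -n yelb <yelb-ui-pod> -oyaml`). The excerpt below is illustrative only: the node IP is a placeholder, 61002 is the port seen in the Avi UI, and the exact fields in the annotation vary between Antrea versions.

```yaml
# Illustrative Pod excerpt (not a complete manifest; values are placeholders).
# NodePortLocal is requested per Service with the annotation
#   nodeportlocal.antrea.io/enabled: "true"
# (whether you or AKO sets it depends on the setup); Antrea then writes the
# allocated node port(s) back onto each backend Pod:
apiVersion: v1
kind: Pod
metadata:
  name: yelb-ui-6df49457d6-4bktw
  namespace: yelb
  annotations:
    nodeportlocal.antrea.io: '[{"podPort":80,"nodeIP":"<worker-node-ip>","nodePort":61002,"protocol":"tcp"}]'
```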

avi-se-blocked

Why is that? I don't have a default block-all rule in my NSX environment... This is because of a set of default rules created by NCP for the TKG clusters. Have a look at this rule:

ncp-rules

What is the membership in the group used in this Drop rule?

members

That is all my TKG nodes including the Supervisor Control Plane nodes (the workload interface).

Now, in the Antrea Policies we need to allow the IP addresses the SEs use to reach yelb-ui, as it is not the actual client IP that is being used, but the SEs' data-plane network.

traffic-walk

The above diagram tries to explain the traffic flow and where each rule is enforced. First, the user wants to access the VIP of the Yelb UI service. This is allowed by the NSX firewall: port 80 on IP 10.13.210.10 is OK to pass. As this VIP is realized by the Avi SEs, which sit on NSX, this rule is enforced by the NSX firewall. Then the Avi SEs forward (load balance) the traffic to the worker node(s) using NodePortLocal ports in the range 61000-62000 (default). The worker nodes are on the same NSX DFW, so we need to allow the SEs to forward this traffic. When all of the above is allowed, we get "into" the actual TKG (Kubernetes) cluster, where the traffic must pass the Antrea Native Policies that have been applied. These rules, remember, allow the SE data-plane IPs to reach the yelb-ui pod on port 80. And that's it.

Before we end this chapter and head over to the next, let us quickly see what the policies created from the NSX manager look like inside the TKG cluster:
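For comparison, the in-cluster part of this flow could also be expressed directly with kubectl. This is a hypothetical sketch, not the policy NSX generates: the policy name and priority are made up, and the two /32 addresses are assumed to be the SE data-plane IPs (the same ones that show up in the NSX-generated policy further down); replace them with your own.

```yaml
# Hypothetical kubectl equivalent: allow the Avi SE data-plane IPs to reach yelb-ui on TCP 80.
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: acnp-allow-avi-se-to-yelb-ui   # hypothetical name
spec:
  priority: 5
  tier: application
  appliedTo:
  - podSelector:
      matchLabels:
        app: yelb-ui
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: yelb
  ingress:
  - action: Allow
    from:
    - ipBlock:
        cidr: 10.13.11.100/32   # assumed SE data-plane IP
    - ipBlock:
        cidr: 10.13.11.101/32   # assumed SE data-plane IP
    ports:
    - protocol: TCP
      port: 80
```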

linuxvm01:~/antrea/policies$ k get acnp
NAME                                   TIER                          PRIORITY             DESIRED NODES   CURRENT NODES   AGE
823fca6f-88ee-4032-8150-ac8cf22f1c93   nsx-category-infrastructure   1.000000017763571    3               3               23h
9ae2599a-3bd3-4413-849e-06f53f467559   nsx-category-application      1.0000000532916369   2               2               24h

The policies are placed in the Tiers that correspond to the NSX DFW categories in the UI:

nsx-dfw-categories

If I output one of the policies as yaml, I get the actual manifest:

linuxvm01:~/antrea/policies$ k get acnp 9ae2599a-3bd3-4413-849e-06f53f467559 -oyaml
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/display-name: Yelb-Zero-Trust
  creationTimestamp: "2023-06-05T12:12:14Z"
  generation: 6
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 9ae2599a-3bd3-4413-849e-06f53f467559
  resourceVersion: "404591"
  uid: 6477e785-fde4-46ba-b0a1-5ff5f784db8c
spec:
  ingress:
  - action: Allow
    appliedTo:
    - group: 6f39fadf-04e8-4f49-be77-da0d4005ff37
    enableLogging: false
    from:
    - ipBlock:
        cidr: 10.13.11.101/32
    - ipBlock:
        cidr: 10.13.11.100/32
    name: "4084"
    ports:
    - port: 80
      protocol: TCP
  - action: Allow
    appliedTo:
    - group: 31cf5eab-8bcd-4305-b72d-f1a44843fd8e
    enableLogging: false
    from:
    - group: 6f39fadf-04e8-4f49-be77-da0d4005ff37
    name: "4085"
    ports:
    - port: 4567
      protocol: TCP
  - action: Allow
    appliedTo:
    - group: 672f4d75-c83b-4fa1-b0ab-ae414c2e8e8c
    enableLogging: false
    from:
    - group: 31cf5eab-8bcd-4305-b72d-f1a44843fd8e
    name: "4087"
    ports:
    - port: 5432
      protocol: TCP
  - action: Allow
    appliedTo:
    - group: 52c3548b-4758-427f-bcde-b25d36613de6
    enableLogging: false
    from:
    - group: 31cf5eab-8bcd-4305-b72d-f1a44843fd8e
    name: "4088"
    ports:
    - port: 6379
      protocol: TCP
  - action: Drop
    appliedTo:
    - group: d250b7d7-3041-4f7f-8fdf-c7360eee9615
    enableLogging: false
    from:
    - group: d250b7d7-3041-4f7f-8fdf-c7360eee9615
    name: "4089"
  priority: 1.0000000532916369
  tier: nsx-category-application
status:
  currentNodesRealized: 2
  desiredNodesRealized: 2
  observedGeneration: 6
  phase: Realized

Antrea Security policies from kubernetes api

I have already covered this topic in another post here. Head over and have a look. It is also worth reading the official Antrea documentation page here, as it contains examples and is updated with new features.

One thing I would like to use this chapter for, though, is trying to apply a policy in the Tiers that NSX added during the integration (explained above). Remember the Tiers?

linuxvm01:~/antrea/policies$ k get tiers
NAME                          PRIORITY   AGE
application                   250        2d2h
baseline                      253        2d2h
emergency                     50         2d2h
networkops                    150        2d2h
nsx-category-application      4          2d
nsx-category-emergency        1          2d
nsx-category-environment      3          2d
nsx-category-ethernet         0          2d
nsx-category-infrastructure   2          2d
platform                      200        2d2h
securityops                   100        2d2h

These nsx-category-* Tiers come from the NSX manager, but can I, as a cluster owner/editor, place rules in them by default? If you look at their PRIORITY values (0-4), they are very low, and since a lower number means higher precedence, they are evaluated before all the built-in Antrea Tiers.

Let us apply the same rule as used earlier in this post, changing only the tier placement:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: acnp-nsx-tier-from-kubectl
spec:
  priority: 1
  tier: nsx-category-environment
  appliedTo:
  - podSelector:
      matchLabels:
        app: ubuntu-20-04
  egress:
  - action: Allow
    to:
      - fqdn: "yelb-ui.yelb.carefor.some-dns.net"
    ports:
      - protocol: TCP
        port: 80
  - action: Allow

linuxvm01:~/antrea/policies$ k apply -f fqdn-rule-nsx-tier.yaml
Error from server: error when creating "fqdn-rule-nsx-tier.yaml": admission webhook "acnpvalidator.antrea.io" denied the request: user not authorized to access Tier nsx-category-environment

Even though I am the cluster owner/admin/superuser, I am not allowed to place any rules in these NSX Tiers. This gives us further control and mechanisms to support both NSX-created Antrea policies and Antrea policies applied with kubectl, allowing good control of security enforcement by roles in the organization.

Antrea Dashboard

As the Octant dashboard is no more, Antrea now has its own dashboard. It is very easy to deploy, so let me quickly go through it. Read more about it here.

# Add the helm charts
helm repo add antrea https://charts.antrea.io
helm repo update

Install it:

helm install antrea-ui antrea/antrea-ui --namespace kube-system

linuxvm01:~/antrea/policies$ helm repo add antrea https://charts.antrea.io
"antrea" has been added to your repositories
linuxvm01:~/antrea/policies$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ako" chart repository
...Successfully got an update from the "antrea" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
linuxvm01:~/antrea/policies$ helm install antrea-ui antrea/antrea-ui --namespace kube-system
NAME: antrea-ui
LAST DEPLOYED: Tue Jun  6 12:56:21 2023
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Antrea UI has been successfully installed

You are using version 0.1.1

To access the UI, forward a local port to the antrea-ui service, and connect to
that port locally with your browser:

  $ kubectl -n kube-system port-forward service/antrea-ui 3000:3000

After running the command above, access "http://localhost:3000" in your browser.For the Antrea documentation, please visit https://antrea.io

This spins up a new pod and a ClusterIP service.

linuxvm01:~/antrea/policies$ k get pods -n kube-system
NAME                                                                    READY   STATUS    RESTARTS   AGE
antrea-agent-9rvqc                                                      2/2     Running   0          2d16h
antrea-agent-m7rg7                                                      2/2     Running   0          2d16h
antrea-agent-wvpp8                                                      2/2     Running   0          2d16h
antrea-controller-6d56b6d664-vlmh2                                      1/1     Running   0          2d16h
antrea-ui-9c89486f4-msw6m                                               2/2     Running   0          62s

linuxvm01:~/antrea/policies$ k get svc -n kube-system
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
antrea           ClusterIP   20.10.96.45     <none>        443/TCP                  2d16h
antrea-ui        ClusterIP   20.10.228.144   <none>        3000/TCP                 95s

Instead of exposing the service as a NodePort, I am creating an additional Service of type LoadBalancer (using the Avi load balancer class) like this:

apiVersion: v1
kind: Service
metadata:
  name: antrea-dashboard-ui
  labels:
    app: antrea-ui
  namespace: kube-system
spec:
  loadBalancerClass: ako.vmware.com/avi-lb
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app: antrea-ui

Apply it:

linuxvm01:~/antrea$ k apply -f antrea-dashboard-lb-yaml
service/antrea-dashboard-ui created
linuxvm01:~/antrea$ k get svc -n kube-system
NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                  AGE
antrea                ClusterIP      20.10.96.45     <none>         443/TCP                  2d16h
antrea-dashboard-ui   LoadBalancer   20.10.76.243    10.13.210.12   80:31334/TCP             7s
antrea-ui             ClusterIP      20.10.228.144   <none>         3000/TCP                 8m47s

Now access it through my browser:

antrea-dashboard

The default password is admin.

Good overview:

overview

The option to do Traceflow:

traceflow

Oops, dropped by a NetworkPolicy... Where does that come from 🤔 ... More on this later.

Antrea Network Monitoring

Being able to know what's going on is crucial when planning security policies, but also to know whether the policies are working and being enforced. With that information available we can tell whether we are compliant with the policies applied. Without any network flow information we are flying blind. Luckily, Antrea is fully capable of reporting full flow information and exporting it. To be able to export the flow information, we need to enable the FlowExporter FeatureGate:

apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  name: stc-tkg-cluster-1-antrea-package
  namespace: ns-stc-1
spec:
  antrea:
    config:
      featureGates:
        AntreaProxy: true
        EndpointSlice: false
        AntreaPolicy: true
        FlowExporter: true #This needs to be enabled
        Egress: true
        NodePortLocal: true
        AntreaTraceflow: true
        NetworkPolicyStats: true

Flow-Exporter - IPFIX

From the official Antrea documentation:

Antrea is a Kubernetes network plugin that provides network connectivity and security features for Pod workloads. Considering the scale and dynamism of Kubernetes workloads in a cluster, Network Flow Visibility helps in the management and configuration of Kubernetes resources such as Network Policy, Services, Pods etc., and thereby provides opportunities to enhance the performance and security aspects of Pod workloads.

For visualizing the network flows, Antrea monitors the flows in Linux conntrack module. These flows are converted to flow records, and then flow records are post-processed before they are sent to the configured external flow collector. High-level design is given below:

flow-aggregator

From the Antrea official documentation again:

Flow Exporter

In Antrea, the basic building block for the Network Flow Visibility is the Flow Exporter. Flow Exporter operates within Antrea Agent; it builds and maintains a connection store by polling and dumping flows from conntrack module periodically. Connections from the connection store are exported to the Flow Aggregator Service using the IPFIX protocol, and for this purpose we use the IPFIX exporter process from the go-ipfix library.

Read more about Network Flow Visibility in Antrea here.
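Besides the FeatureGate, the Flow Exporter has a few settings in antrea-agent.conf (antrea-config ConfigMap) that map directly to the design described above: where to send the IPFIX records and how often conntrack is polled. The excerpt below is a sketch using the key names of Antrea 1.x releases with example values; newer Antrea releases group these settings under a flowExporter section, so check the configmap of your own TKG version before changing anything.

```yaml
# Excerpt of antrea-agent.conf (Antrea 1.x key names; example values).
featureGates:
  FlowExporter: true
# IPFIX collector address in the form <HOST>:[<PORT>][:<PROTO>];
# here the Flow Aggregator Service in the flow-aggregator namespace.
flowCollectorAddr: "flow-aggregator/flow-aggregator:4739:tls"
# How often connections are dumped from the conntrack module.
flowPollInterval: "5s"
# How often records for active and idle flows are exported.
activeFlowExportTimeout: "5s"
idleFlowExportTimeout: "15s"
```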

Traceflow

When troubleshooting network issues, or even firewall rules (is my traffic being blocked or allowed?), it is very handy to have Traceflow available, and Antrea supports it. To use Traceflow, the AntreaTraceflow FeatureGate needs to be enabled if it is not already.

apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  name: stc-tkg-cluster-1-antrea-package
  namespace: ns-stc-1
spec:
  antrea:
    config:
      featureGates:
        AntreaProxy: true
        EndpointSlice: false
        AntreaPolicy: true
        FlowExporter: true
        Egress: true
        NodePortLocal: true
        AntreaTraceflow: true #This needs to be enabled
        NetworkPolicyStats: true

Now that it is enabled, how can we perform Traceflow?

We can run a Traceflow using kubectl, the Antrea UI, or even the NSX Manager when using the NSX/Antrea integration.

Traceflow in Antrea supports the following:

  • Source: pod, protocol (TCP/UDP/ICMP) and port numbers
  • Destination: pod, service, ip, protocol (TCP/UDP/ICMP) and port numbers
  • One-time or live Traceflow

Now, to get back to the Antrea policies I created earlier, I want to test whether they are actually in use and enforced. So let me run a Traceflow from my famous yelb-ui pod and see if it can reach the appserver pod on its allowed port. Remember that the UI pod needs to communicate with the appserver pod on TCP 4567, and that I created a rule that only allows this; everything else is blocked.

To run the Traceflow from kubectl, this is an example that tests whether port 4567 is allowed from the UI pod to the appserver pod:

apiVersion: crd.antrea.io/v1alpha1
kind: Traceflow
metadata:
  name: tf-test
spec:
  source:
    namespace: yelb
    pod: yelb-ui-6df49457d6-m5clv
  destination:
    namespace: yelb
    pod: yelb-appserver-857c5c76d5-4cd86
    # destination can also be an IP address ('ip' field) or a Service name ('service' field); the 3 choices are mutually exclusive.
  packet:
    ipHeader: # If ipHeader/ipv6Header is not set, the default value is IPv4+ICMP.
      protocol: 6 # Protocol here can be 6 (TCP), 17 (UDP) or 1 (ICMP), default value is 1 (ICMP)
    transportHeader:
      tcp:
        srcPort: 0 # Source port needs to be set when Protocol is TCP/UDP.
        dstPort: 4567 # Destination port needs to be set when Protocol is TCP/UDP.
        flags: 2 # Construct a SYN packet: 2 is also the default value when the flags field is omitted.
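Because a Traceflow is just a Kubernetes object, it can also be generated from a script, for example to test several ports in one go. The sketch below is a hypothetical Python helper (traceflow_manifest is not part of any Antrea tool) that mirrors the fields of the YAML above: it maps the protocol name to the IP protocol number (1, 6 or 17), requires a destination port for TCP/UDP, and sets the SYN flag (2) for TCP. It assumes the crd.antrea.io/v1alpha1 API version used in this post; other Antrea releases may serve a different version.

```python
import json

# IP protocol numbers used by the Traceflow packet.ipHeader.protocol field.
PROTOCOLS = {"ICMP": 1, "TCP": 6, "UDP": 17}


def traceflow_manifest(name, namespace, src_pod, dst_pod, protocol="TCP", dst_port=None):
    """Build a pod-to-pod Traceflow as a dict (hypothetical helper, not an Antrea API)."""
    proto = PROTOCOLS[protocol.upper()]
    packet = {"ipHeader": {"protocol": proto}}
    if proto in (6, 17):
        if dst_port is None:
            raise ValueError("dst_port is required for TCP and UDP")
        l4 = {"srcPort": 0, "dstPort": dst_port}  # srcPort 0 as in the YAML example
        if proto == 6:
            l4["flags"] = 2  # SYN
        packet["transportHeader"] = {protocol.lower(): l4}
    return {
        "apiVersion": "crd.antrea.io/v1alpha1",
        "kind": "Traceflow",
        "metadata": {"name": name},
        "spec": {
            "source": {"namespace": namespace, "pod": src_pod},
            "destination": {"namespace": namespace, "pod": dst_pod},
            "packet": packet,
        },
    }


if __name__ == "__main__":
    # One Traceflow per port, wrapped in a v1 List; names must be unique in the cluster.
    items = [
        traceflow_manifest(f"tf-test-{port}", "yelb",
                           "yelb-ui-6df49457d6-m5clv", "yelb-appserver-857c5c76d5-4cd86",
                           dst_port=port)
        for port in (4567, 4568)
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

kubectl accepts JSON manifests as well as YAML, so the output of a script like this (saved under a hypothetical name such as tf.py) could be applied with python3 tf.py | kubectl apply -f -.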

Now apply it and get the output:

linuxvm01:~/antrea/policies$ k apply -f traceflow.yaml
traceflow.crd.antrea.io/tf-test created

linuxvm01:~/antrea/policies$ k get traceflows.crd.antrea.io -n yelb tf-test -oyaml
apiVersion: crd.antrea.io/v1alpha1
kind: Traceflow
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"crd.antrea.io/v1alpha1","kind":"Traceflow","metadata":{"annotations":{},"name":"tf-test"},"spec":{"destination":{"namespace":"yelb","pod":"yelb-appserver-857c5c76d5-4cd86"},"packet":{"ipHeader":{"protocol":6},"transportHeader":{"tcp":{"dstPort":4567,"flags":2,"srcPort":0}}},"source":{"namespace":"yelb","pod":"yelb-ui-6df49457d6-m5clv"}}}
  creationTimestamp: "2023-06-07T12:47:14Z"
  generation: 1
  name: tf-test
  resourceVersion: "904386"
  uid: c550596b-ed43-4bab-a6f1-d23e90d35f84
spec:
  destination:
    namespace: yelb
    pod: yelb-appserver-857c5c76d5-4cd86
  packet:
    ipHeader:
      protocol: 6
    transportHeader:
      tcp:
        dstPort: 4567
        flags: 2
        srcPort: 0
  source:
    namespace: yelb
    pod: yelb-ui-6df49457d6-m5clv
status:
  phase: Succeeded
  results:
  - node: stc-tkg-cluster-1-node-pool-01-p6nms-84c55d4574-5r8gj
    observations:
    - action: Received
      component: Forwarding
    - action: Forwarded
      component: NetworkPolicy
      componentInfo: IngressRule
      networkPolicy: AntreaClusterNetworkPolicy:9ae2599a-3bd3-4413-849e-06f53f467559
    - action: Delivered
      component: Forwarding
      componentInfo: Output
    timestamp: 1686142036
  - node: stc-tkg-cluster-1-node-pool-01-p6nms-84c55d4574-bpx7s
    observations:
    - action: Forwarded
      component: SpoofGuard
    - action: Forwarded
      component: Forwarding
      componentInfo: Output
      tunnelDstIP: 10.13.82.39
    timestamp: 1686142036
  startTime: "2023-06-07T12:47:14Z"

That was a success: the NetworkPolicy component reports action: Forwarded (IngressRule), and the packet is Delivered to the appserver pod.

Now I want to run it again, but with another port. I change the above YAML to use port 4568 (which should not be allowed):

linuxvm01:~/antrea/policies$ k get traceflows.crd.antrea.io -n yelb tf-test -oyaml
apiVersion: crd.antrea.io/v1alpha1
kind: Traceflow
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"crd.antrea.io/v1alpha1","kind":"Traceflow","metadata":{"annotations":{},"name":"tf-test"},"spec":{"destination":{"namespace":"yelb","pod":"yelb-appserver-857c5c76d5-4cd86"},"packet":{"ipHeader":{"protocol":6},"transportHeader":{"tcp":{"dstPort":4568,"flags":2,"srcPort":0}}},"source":{"namespace":"yelb","pod":"yelb-ui-6df49457d6-m5clv"}}}
  creationTimestamp: "2023-06-07T12:53:59Z"
  generation: 1
  name: tf-test
  resourceVersion: "905571"
  uid: d76ec419-3272-4595-98a5-72a49adce9d3
spec:
  destination:
    namespace: yelb
    pod: yelb-appserver-857c5c76d5-4cd86
  packet:
    ipHeader:
      protocol: 6
    transportHeader:
      tcp:
        dstPort: 4568
        flags: 2
        srcPort: 0
  source:
    namespace: yelb
    pod: yelb-ui-6df49457d6-m5clv
status:
  phase: Succeeded
  results:
  - node: stc-tkg-cluster-1-node-pool-01-p6nms-84c55d4574-bpx7s
    observations:
    - action: Forwarded
      component: SpoofGuard
    - action: Forwarded
      component: Forwarding
      componentInfo: Output
      tunnelDstIP: 10.13.82.39
    timestamp: 1686142441
  - node: stc-tkg-cluster-1-node-pool-01-p6nms-84c55d4574-5r8gj
    observations:
    - action: Received
      component: Forwarding
    - action: Dropped
      component: NetworkPolicy
      componentInfo: IngressMetric
      networkPolicy: AntreaClusterNetworkPolicy:9ae2599a-3bd3-4413-849e-06f53f467559
    timestamp: 1686142441
  startTime: "2023-06-07T12:53:59Z"

That was also a success, as the packet was dropped by design: the NetworkPolicy component reports action: Dropped.
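When running several Traceflows, it can be handy to reduce each result to a single verdict. The Python sketch below uses hypothetical helper names and takes the Traceflow object as a dict, for example json.load() of the output of k get traceflows.crd.antrea.io tf-test -o json. It checks the observed actions: Dropped means blocked, as in the second run above; Rejected is also treated as blocked, which assumes that is the action reported for Reject rules; Delivered means the packet reached the destination pod. It also converts the epoch-seconds timestamp of each node result into a readable UTC time.

```python
from datetime import datetime, timezone


def traceflow_verdict(traceflow: dict) -> str:
    """Summarize a Traceflow object (parsed from kubectl -o json) as one word."""
    status = traceflow.get("status", {})
    if status.get("phase") != "Succeeded":
        return f"incomplete (phase={status.get('phase')})"
    actions = [
        obs.get("action")
        for result in status.get("results", [])
        for obs in result.get("observations", [])
    ]
    # "Rejected" is an assumption for Reject rules; "Dropped" is seen in the output above.
    if "Dropped" in actions or "Rejected" in actions:
        return "blocked"
    if "Delivered" in actions:
        return "delivered"
    return "inconclusive"  # e.g. the packet left the cluster without a Delivered observation


def observation_time(result: dict) -> str:
    """Convert a node result's epoch-seconds timestamp to an ISO-8601 UTC string."""
    ts = datetime.fromtimestamp(result["timestamp"], tz=timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For the first run above, the timestamp 1686142036 corresponds to 2023-06-07T12:47:16Z, a couple of seconds after the startTime of the Traceflow.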

It's great being able to do this from kubectl, whether you need a quick check before looking elsewhere and creating a support ticket 😃, or you don't have access to other tools like the Antrea UI or the NSX Manager. Speaking of the NSX Manager, let us do the exact same trace from the NSX Manager GUI:

Head over to Plan & Troubleshoot -> Traffic Analysis:

nsx-analysis

Results:

nsx-result-allowed

Now I change it to another port and test again:

nsx-result-dropped

Dropped again.

The same procedure can also be done from the Antrea UI, here with a port that is allowed:

image-20230607150201116

To read more on Traceflow in Antrea, head over here.

Theia

Now that we know it's possible to export all flows using IPFIX, I thought it would be interesting to showcase how the flow information can be presented with a solution called Theia. From the official docs:

Theia is a network observability and analytics platform for Kubernetes. It is built on top of Antrea, and consumes network flows exported by Antrea to provide fine-grained visibility into the communication and NetworkPolicies among Pods and Services in a Kubernetes cluster.

To install Theia I followed the instructions here, which is also a great place to read more about Theia.

Theia is installed using Helm. Start by adding the chart repository and updating it:

linuxvm01:~/antrea$ helm repo add antrea https://charts.antrea.io
"antrea" already exists with the same configuration, skipping
linuxvm01:~/antrea$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "antrea" chart repository
Update Complete. ⎈Happy Helming!⎈

Make sure that FlowExporter has been enabled. If not, apply an AntreaConfig that enables it:

apiVersion: cni.tanzu.vmware.com/v1alpha1
kind: AntreaConfig
metadata:
  name: stc-tkg-cluster-1-antrea-package
  namespace: ns-stc-1
spec:
  antrea:
    config:
      featureGates:
        AntreaProxy: true
        EndpointSlice: false
        AntreaPolicy: true
        FlowExporter: true #Enable this!
        Egress: true
        NodePortLocal: true
        AntreaTraceflow: true
        NetworkPolicyStats: true
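To confirm that the change has reached the antrea-config ConfigMap, the agent configuration can be fetched with k get configmap -n kube-system antrea-config -o jsonpath='{.data.antrea-agent\.conf}' and checked. Below is a minimal Python sketch of such a check. feature_gates is a hypothetical helper and a deliberately simple line-based parser, assuming the layout shown in this post: a featureGates: line followed by more-indented Name: true|false lines. It is not a general YAML parser.

```python
import re

# "Name: true" or "Name: false", optionally followed by a trailing comment.
GATE_RE = re.compile(r"\s*(\w+):\s*(true|false)\b")


def feature_gates(agent_conf: str) -> dict:
    """Return {gate_name: bool} from the featureGates block of antrea-agent.conf text."""
    gates = {}
    in_block = False
    block_indent = 0
    for line in agent_conf.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and commented-out gates
        indent = len(line) - len(line.lstrip())
        if stripped == "featureGates:":
            in_block, block_indent = True, indent
            continue
        if in_block:
            if indent <= block_indent:
                break  # we have left the featureGates block
            m = GATE_RE.match(line)
            if m:
                gates[m.group(1)] = m.group(2) == "true"
    return gates
```

A quick check is then feature_gates(conf).get("FlowExporter") is True before deleting the Antrea pods.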

After the config has been applied, delete the Antrea agent and controller pods so they read the new ConfigMap:

linuxvm01:~/antrea/theia$ k delete pod -n kube-system -l app=antrea
pod "antrea-agent-58nn2" deleted
pod "antrea-agent-cnq9p" deleted
pod "antrea-agent-sx6vr" deleted
pod "antrea-controller-6d56b6d664-km64t" deleted

After the Helm chart repository has been added, I start by installing the Flow Aggregator:

helm install flow-aggregator antrea/flow-aggregator --set clickHouse.enable=true,recordContents.podLabels=true -n flow-aggregator --create-namespace

As usual with Helm charts, if there are any specific settings you would like to change, get the chart values first, edit them, and refer to the file with -f values.yaml:

linuxvm01:~/antrea/theia$ helm show values antrea/flow-aggregator > flow-agg-values.yaml

I don't have anything specific to change for this one, so I will just deploy with the defaults:

linuxvm01:~/antrea/theia$ helm install flow-aggregator antrea/flow-aggregator --set clickHouse.enable=true,recordContents.podLabels=true -n flow-aggregator --create-namespace
NAME: flow-aggregator
LAST DEPLOYED: Tue Jun  6 21:28:49 2023
NAMESPACE: flow-aggregator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Antrea Flow Aggregator has been successfully installed

You are using version 1.12.0

For the Antrea documentation, please visit https://antrea.io

Now, what has happened in my TKG cluster?

linuxvm01:~/antrea/theia$ k get pods -n flow-aggregator
NAME                               READY   STATUS    RESTARTS      AGE
flow-aggregator-5b4c69885f-mklm5   1/1     Running   1 (10s ago)   22s
linuxvm01:~/antrea/theia$ k get pods -n flow-aggregator
NAME                               READY   STATUS    RESTARTS      AGE
flow-aggregator-5b4c69885f-mklm5   1/1     Running   1 (13s ago)   25s
linuxvm01:~/antrea/theia$ k get pods -n flow-aggregator
NAME                               READY   STATUS   RESTARTS      AGE
flow-aggregator-5b4c69885f-mklm5   0/1     Error    1 (14s ago)   26s
linuxvm01:~/antrea/theia$ k get pods -n flow-aggregator
NAME                               READY   STATUS             RESTARTS      AGE
flow-aggregator-5b4c69885f-mklm5   0/1     CrashLoopBackOff   3 (50s ago)   60s

Well, that didn't go so well...

The issue is that the Flow Aggregator, installed with clickHouse.enable=true, is looking for the ClickHouse service, which has not been created yet. It will keep failing until Theia, which deploys ClickHouse, is installed. That is our next step.

linuxvm01:~/antrea/theia$ helm install theia antrea/theia --set sparkOperator.enable=true,theiaManager.enable=true -n flow-visibility --create-namespace

NAME: theia
LAST DEPLOYED: Tue Jun  6 22:02:37 2023
NAMESPACE: flow-visibility
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Theia has been successfully installed

You are using version 0.6.0

For the Antrea documentation, please visit https://antrea.io

What has been created now:

linuxvm01:~/antrea/theia$ k get pods -n flow-visibility
NAME                                    READY   STATUS    RESTARTS   AGE
chi-clickhouse-clickhouse-0-0-0         2/2     Running   0          8m52s
grafana-684d8948b-c6wzn                 1/1     Running   0          8m56s
theia-manager-5d8d6b86b7-cbxrz          1/1     Running   0          8m56s
theia-spark-operator-54d9ddd544-nqhqd   1/1     Running   0          8m56s
zookeeper-0                             1/1     Running   0          8m56s

Now the flow-aggregator should also be in a running state. If not, just delete the pod and it should get back on its feet.

linuxvm01:~/antrea/theia$ k get pods -n flow-aggregator
NAME                               READY   STATUS    RESTARTS   AGE
flow-aggregator-5b4c69885f-xhdkx   1/1     Running   0          5m2s

So now it's all about getting access to the Grafana dashboard. Out of the box it is only exposed as a NodePort service, so I will expose it with a service of type LoadBalancer:

linuxvm01:~/antrea/theia$ k get svc -n flow-visibility
NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
chi-clickhouse-clickhouse-0-0   ClusterIP   None            <none>        8123/TCP,9000/TCP,9009/TCP   8m43s
clickhouse-clickhouse           ClusterIP   20.10.136.211   <none>        8123/TCP,9000/TCP            10m
grafana                         NodePort    20.10.172.165   <none>        3000:30096/TCP               10m
theia-manager                   ClusterIP   20.10.156.217   <none>        11347/TCP                    10m
zookeeper                       ClusterIP   20.10.219.137   <none>        2181/TCP,7000/TCP            10m
zookeepers                      ClusterIP   None            <none>        2888/TCP,3888/TCP            10m

So let us create a LoadBalancer service for this:

apiVersion: v1
kind: Service
metadata:
  name: theia-dashboard-ui
  labels:
    app: grafana
  namespace: flow-visibility
spec:
  loadBalancerClass: ako.vmware.com/avi-lb
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana

linuxvm01:~/antrea/theia$ k get svc -n flow-visibility
NAME                            TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
grafana                         NodePort       20.10.172.165   <none>         3000:30096/TCP               15m
theia-dashboard-ui              LoadBalancer   20.10.24.174    10.13.210.13   80:32075/TCP                 13s

Let's try to access it through the browser:

grafana

Great. Theia comes with a couple of predefined dashboards that are interesting to start out with. Below are some screenshots of them:

The homepage:

homepage

List of dashboards:

list

Flow_Records_Dashboard:

flow-records

Network_Topology_Dashboard:

network-toplogy

Network Policy Recommendation

From the official docs:

Theia NetworkPolicy Recommendation recommends the NetworkPolicy configuration to secure Kubernetes network and applications. It analyzes the network flows collected by Grafana Flow Collector to generate Kubernetes NetworkPolicies or Antrea NetworkPolicies. This feature assists cluster administrators and app developers in securing their applications according to Zero Trust principles.

I like the sound of that. Let us try it out.

The first thing I need to do is install the Theia CLI. The binary and the instructions can be found here:

Theia CLI

curl -Lo ./theia "https://github.com/antrea-io/theia/releases/download/v0.6.0/theia-$(uname)-x86_64"
chmod +x ./theia
sudo mv ./theia /usr/local/bin/theia
theia help

linuxvm01:~/antrea/theia$ curl -Lo ./theia "https://github.com/antrea-io/theia/releases/download/v0.6.0/theia-$(uname)-x86_64"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 37.9M  100 37.9M    0     0  11.6M      0  0:00:03  0:00:03 --:--:-- 17.2M
linuxvm01:~/antrea/theia$ chmod +x ./theia
linuxvm01:~/antrea/theia$ sudo cp theia /usr/local/bin/theia
linuxvm01:~/antrea/theia$ theia help
theia is the command line tool for Theia which provides access
to Theia network flow visibility capabilities

Usage:
  theia [command]

Available Commands:
  clickhouse                   Commands of Theia ClickHouse feature
  completion                   Generate the autocompletion script for the specified shell
  help                         Help about any command
  policy-recommendation        Commands of Theia policy recommendation feature
  supportbundle                Generate support bundle
  throughput-anomaly-detection Commands of Theia throughput anomaly detection feature
  version                      Show Theia CLI version

Flags:
  -h, --help                help for theia
  -k, --kubeconfig string   absolute path to the k8s config file, will use $KUBECONFIG if not specified
  -v, --verbose int         set verbose level

Use "theia [command] --help" for more information about a command.

These are the subcommands available for policy-recommendation:

andreasm@linuxvm01:~/antrea/theia$ theia policy-recommendation --help
Command group of Theia policy recommendation feature.
Must specify a subcommand like run, status or retrieve.

Usage:
  theia policy-recommendation [flags]
  theia policy-recommendation [command]

Aliases:
  policy-recommendation, pr

Available Commands:
  delete      Delete a policy recommendation job
  list        List all policy recommendation jobs
  retrieve    Get the recommendation result of a policy recommendation job
  run         Run a new policy recommendation job
  status      Check the status of a policy recommendation job


Use "theia policy-recommendation [command] --help" for more information about a command.

These are the options for the run command:

linuxvm01:~/antrea/theia$ theia policy-recommendation run --help
Run a new policy recommendation job.
Must finish the deployment of Theia first

Usage:
  theia policy-recommendation run [flags]

Examples:
Run a policy recommendation job with default configuration
$ theia policy-recommendation run
Run an initial policy recommendation job with policy type anp-deny-applied and limit on last 10k flow records
$ theia policy-recommendation run --type initial --policy-type anp-deny-applied --limit 10000
Run an initial policy recommendation job with policy type anp-deny-applied and limit on flow records from 2022-01-01 00:00:00 to 2022-01-31 23:59:59.
$ theia policy-recommendation run --type initial --policy-type anp-deny-applied --start-time '2022-01-01 00:00:00' --end-time '2022-01-31 23:59:59'
Run a policy recommendation job with default configuration but doesn't recommend toServices ANPs
$ theia policy-recommendation run --to-services=false


Flags:
      --driver-core-request string     Specify the CPU request for the driver Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 0.1, 500m, 1.5, 5, etc. (default "200m")
      --driver-memory string           Specify the memory request for the driver Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 512M, 1G, 8G, etc. (default "512M")
  -e, --end-time string                The end time of the flow records considered for the policy recommendation.
                                       Format is YYYY-MM-DD hh:mm:ss in UTC timezone. No limit of the end time of flow records by default.
      --exclude-labels                 Enable this option will exclude automatically generated Pod labels including 'pod-template-hash',
                                       'controller-revision-hash', 'pod-template-generation' during policy recommendation. (default true)
      --executor-core-request string   Specify the CPU request for the executor Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 0.1, 500m, 1.5, 5, etc. (default "200m")
      --executor-instances int32       Specify the number of executors for the Spark application. Example values include 1, 2, 8, etc. (default 1)
      --executor-memory string         Specify the memory request for the executor Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 512M, 1G, 8G, etc. (default "512M")
  -f, --file string                    The file path where you want to save the result. It can only be used when wait is enabled.
  -h, --help                           help for run
  -l, --limit int                      The limit on the number of flow records read from the database. 0 means no limit.
  -n, --ns-allow-list string           List of default allow Namespaces.
                                       If no Namespaces provided, Traffic inside Antrea CNI related Namespaces: ['kube-system', 'flow-aggregator',
                                       'flow-visibility'] will be allowed by default.
  -p, --policy-type string             Types of generated NetworkPolicy.
                                       Currently we have 3 generated NetworkPolicy types:
                                       anp-deny-applied: Recommending allow ANP/ACNP policies, with default deny rules only on Pods which have an allow rule applied.
                                       anp-deny-all: Recommending allow ANP/ACNP policies, with default deny rules for whole cluster.
                                       k8s-np: Recommending allow K8s NetworkPolicies. (default "anp-deny-applied")
  -s, --start-time string              The start time of the flow records considered for the policy recommendation.
                                       Format is YYYY-MM-DD hh:mm:ss in UTC timezone. No limit of the start time of flow records by default.
      --to-services                    Use the toServices feature in ANP and recommendation toServices rules for Pod-to-Service flows,
                                       only works when option is anp-deny-applied or anp-deny-all. (default true)
  -t, --type string                    {initial|subsequent} Indicates this recommendation is an initial recommendion or a subsequent recommendation job. (default "initial")
      --wait                           Enable this option will hold and wait the whole policy recommendation job finishes.

Global Flags:
  -k, --kubeconfig string   absolute path to the k8s config file, will use $KUBECONFIG if not specified
      --use-cluster-ip      Enable this option will use ClusterIP instead of port forwarding when connecting to the Theia
                            Manager Service. It can only be used when running in cluster.
  -v, --verbose int         set verbose level

I will just run theia policy-recommendation run --type initial --policy-type anp-deny-applied --limit 10000 to generate some output:

linuxvm01:~/antrea/theia$ theia policy-recommendation run --type initial --policy-type anp-deny-applied --limit 10000
Successfully created policy recommendation job with name pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18

Let's check the status:

# First I will list all the runs to get the name
linuxvm01:~/antrea/theia$ theia policy-recommendation list
CreationTime        CompletionTime Name                                    Status
2023-06-08 07:50:28 N/A            pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18 SCHEDULED
# Then I will check the status on the specific run
linuxvm01:~/antrea/theia$ theia policy-recommendation status pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18
Status of this policy recommendation job is SCHEDULED

Seems like I have to wait a bit. Time to grab a coffee.

Just poured my coffee, wanted to check again:

linuxvm01:~/antrea/theia$ theia policy-recommendation status pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18
Status of this policy recommendation job is RUNNING: 0/1 (0%) stages completed

Alright, it is running.

Now time to drink the coffee.

Let's check in on it again:

linuxvm01:~/antrea/theia$ theia policy-recommendation status pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18
Status of this policy recommendation job is COMPLETED
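Instead of checking by hand, the run command's own --wait flag (optionally combined with -f to save the result to a file, as listed in the run --help output) holds until the job finishes. If you prefer to poll from a script, here is a minimal Python sketch. wait_for_job and cli_status are hypothetical names; the sketch shells out to the theia status command used above, parses the status word out of its sentence, treats SCHEDULED and RUNNING as in progress, and returns any other status unchanged.

```python
import re
import subprocess
import time

# Matches the status word in e.g. "Status of this policy recommendation job is RUNNING: 0/1 ..."
STATUS_RE = re.compile(r"job is (\w+)")
IN_PROGRESS = ("SCHEDULED", "RUNNING")


def parse_status(text):
    """Extract the status word from the CLI's status sentence."""
    m = STATUS_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognised status output: {text!r}")
    return m.group(1)


def cli_status(job):
    """Ask the theia CLI for the job status (requires theia on PATH and cluster access)."""
    out = subprocess.run(
        ["theia", "policy-recommendation", "status", job],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_status(out)


def wait_for_job(job, get_status=cli_status, interval=30, timeout=3600, sleep=time.sleep):
    """Poll until the job is no longer SCHEDULED/RUNNING and return the final status."""
    waited = 0
    while True:
        status = get_status(job)
        if status not in IN_PROGRESS:
            return status
        if waited >= timeout:
            raise TimeoutError(f"{job} still {status} after {waited}s")
        sleep(interval)
        waited += interval
```

For example, wait_for_job("pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18") would poll every 30 seconds and return "COMPLETED" once the job shown above finishes. The get_status and sleep parameters exist so the loop can be exercised without a cluster.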

Oh yes, now I am excited to see which policies it recommends:

linuxvm01:~/antrea/theia$ theia policy-recommendation retrieve pr-e81a42e4-013a-4cf6-be43-b1ee48ea9a18
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: recommend-reject-acnp-9np4b
spec:
  appliedTo:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: yelb
    podSelector:
      matchLabels:
        app: traffic-generator
  egress:
  - action: Reject
    to:
    - podSelector: {}
  ingress:
  - action: Reject
    from:
    - podSelector: {}
  priority: 5
  tier: Baseline
---
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: recommend-reject-acnp-ega4b
spec:
  appliedTo:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: avi-system
    podSelector:
      matchLabels:
        app.kubernetes.io/instance: ako-1685884771
        app.kubernetes.io/name: ako
        statefulset.kubernetes.io/pod-name: ako-0
  egress:
  - action: Reject
    to:
    - podSelector: {}
  ingress:
  - action: Reject
    from:
    - podSelector: {}
  priority: 5
  tier: Baseline
---
apiVersion: crd.antrea.io/v1alpha1
kind: NetworkPolicy
metadata:
  name: recommend-allow-anp-nl6re
  namespace: yelb
spec:
  appliedTo:
  - podSelector:
      matchLabels:
        app: traffic-generator
  egress:
  - action: Allow
    ports:
    - port: 80
      protocol: TCP
    to:
    - ipBlock:
        cidr: 10.13.210.10/32
  ingress: []
  priority: 5
  tier: Application
---
apiVersion: crd.antrea.io/v1alpha1
kind: NetworkPolicy
metadata:
  name: recommend-allow-anp-2ifjo
  namespace: avi-system
spec:
  appliedTo:
  - podSelector:
      matchLabels:
        app.kubernetes.io/instance: ako-1685884771
        app.kubernetes.io/name: ako
        statefulset.kubernetes.io/pod-name: ako-0
  egress:
  - action: Allow
    ports:
    - port: 443
      protocol: TCP
    to:
    - ipBlock:
        cidr: 172.24.3.50/32
  ingress: []
  priority: 5
  tier: Application
---
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: recommend-allow-acnp-kube-system-kaoh6
spec:
  appliedTo:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
  egress:
  - action: Allow
    to:
    - podSelector: {}
  ingress:
  - action: Allow
    from:
    - podSelector: {}
  priority: 5
  tier: Platform
---
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: recommend-allow-acnp-flow-aggregator-dnvhc
spec:
  appliedTo:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: flow-aggregator
  egress:
  - action: Allow
    to:
    - podSelector: {}
  ingress:
  - action: Allow
    from:
    - podSelector: {}
  priority: 5
  tier: Platform
---
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: recommend-allow-acnp-flow-visibility-sqjwf
spec:
  appliedTo:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: flow-visibility
  egress:
  - action: Allow
    to:
    - podSelector: {}
  ingress:
  - action: Allow
    from:
    - podSelector: {}
  priority: 5
  tier: Platform
---
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: recommend-reject-acnp-hmjt8
spec:
  appliedTo:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: yelb-2
    podSelector:
      matchLabels:
        app: yelb-ui
        tier: frontend
  egress:
  - action: Reject
    to:
    - podSelector: {}
  ingress:
  - action: Reject
    from:
    - podSelector: {}
  priority: 5
  tier: Baseline

OK, well. I appreciate the output, but I would need to make some modifications before applying it. My lab does not generate much traffic, so the recommendation engine does not see all the flows it needs to produce a better recommendation, and my traffic-generator is not doing a good job of filling that gap. I will need to generate more activity so the engine has enough flows to consider.

Throughput Anomaly Detection

From the official docs:

From Theia v0.5, Theia supports Throughput Anomaly Detection. Throughput Anomaly Detection (TAD) is a technique for understanding and reporting the throughput abnormalities in the network traffic. It analyzes the network flows collected by Grafana Flow Collector to report anomalies in the network. TAD uses three algorithms to find the anomalies in network flows such as ARIMA, EWMA, and DBSCAN. These anomaly analyses help the user to find threats if present.
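To get a feel for what one of these algorithms does, here is a generic Python sketch of EWMA-based anomaly detection on a throughput series. It is not Theia's implementation (Theia runs its algorithms as Spark applications, as the executor flags in the run --help output below suggest). The function name and the parameters alpha (smoothing factor), threshold (allowed deviation in standard deviations) and warmup are assumptions chosen for illustration. Each point is compared with the exponentially weighted mean of the points before it and flagged if it deviates by more than threshold times the exponentially weighted standard deviation.

```python
def ewma_anomalies(series, alpha=0.3, threshold=3.0, warmup=3):
    """Return indexes of points that deviate strongly from the EWMA forecast.

    Generic illustration of EWMA anomaly detection, not Theia's code.
    alpha: smoothing factor in (0, 1]; threshold: allowed deviation in
    standard deviations; warmup: leading points used only for learning.
    """
    if not series:
        return []
    mean = float(series[0])  # exponentially weighted mean of the points seen so far
    var = 0.0                # exponentially weighted variance of the points seen so far
    anomalies = []
    for i, x in enumerate(series[1:], start=1):
        diff = x - mean
        std = var ** 0.5
        # Compare the new point with the forecast built from earlier points only.
        if i >= warmup and std > 0 and abs(diff) > threshold * std:
            anomalies.append(i)
        # Incremental update of the weighted mean and variance.
        incr = alpha * diff
        mean += incr
        var = (1 - alpha) * (var + diff * incr)
    return anomalies
```

With the defaults, ewma_anomalies([100, 102, 98, 101, 99, 100, 500, 101, 100]) flags only index 6, the sudden spike. Note that the spike is still folded into the mean and variance, so the detector becomes more tolerant right after it; a real implementation might choose differently.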

Let's try it out. I already have the dependencies and the Theia CLI installed.

What are the different commands available?

linuxvm01:~/antrea/theia$ theia throughput-anomaly-detection --help
Command group of Theia throughput anomaly detection feature.
	Must specify a subcommand like run, list, delete, status or retrieve

Usage:
  theia throughput-anomaly-detection [flags]
  theia throughput-anomaly-detection [command]

Aliases:
  throughput-anomaly-detection, tad

Available Commands:
  delete      Delete a anomaly detection job
  list        List all anomaly detection jobs
  retrieve    Get the result of an anomaly detection job
  run         throughput anomaly detection using Algo
  status      Check the status of a anomaly detection job

Flags:
  -h, --help             help for throughput-anomaly-detection
      --use-cluster-ip   Enable this option will use ClusterIP instead of port forwarding when connecting to the Theia
                         Manager Service. It can only be used when running in cluster.

Global Flags:
  -k, --kubeconfig string   absolute path to the k8s config file, will use $KUBECONFIG if not specified
  -v, --verbose int         set verbose level

Use "theia throughput-anomaly-detection [command] --help" for more information about a command.

linuxvm01:~/antrea/theia$ theia throughput-anomaly-detection run --help
throughput anomaly detection using algorithms, currently supported algorithms are EWMA, ARIMA and DBSCAN

Usage:
  theia throughput-anomaly-detection run [flags]

Examples:
Run the specific algorithm for throughput anomaly detection
	$ theia throughput-anomaly-detection run --algo ARIMA --start-time 2022-01-01T00:00:00 --end-time 2022-01-31T23:59:59
	Run throughput anomaly detection algorithm of type ARIMA and limit on flow records from '2022-01-01 00:00:00' to '2022-01-31 23:59:59'
	Please note, algo is a mandatory argument'

Flags:
  -a, --algo string                    The algorithm used by throughput anomaly detection.
                                       		Currently supported Algorithms are EWMA, ARIMA and DBSCAN.
      --driver-core-request string     Specify the CPU request for the driver Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 0.1, 500m, 1.5, 5, etc. (default "200m")
      --driver-memory string           Specify the memory request for the driver Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 512M, 1G, 8G, etc. (default "512M")
  -e, --end-time string                The end time of the flow records considered for the anomaly detection.
                                       Format is YYYY-MM-DD hh:mm:ss in UTC timezone. No limit of the end time of flow records by default.
      --executor-core-request string   Specify the CPU request for the executor Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 0.1, 500m, 1.5, 5, etc. (default "200m")
      --executor-instances int32       Specify the number of executors for the Spark application. Example values include 1, 2, 8, etc. (default 1)
      --executor-memory string         Specify the memory request for the executor Pod. Values conform to the Kubernetes resource quantity convention.
                                       Example values include 512M, 1G, 8G, etc. (default "512M")
  -h, --help                           help for run
  -n, --ns-ignore-list string          List of default drop Namespaces. Use this to ignore traffic from selected namespaces
                                       If no Namespaces provided, Traffic from all namespaces present in flows table will be allowed by default.
  -s, --start-time string              The start time of the flow records considered for the anomaly detection.
                                       Format is YYYY-MM-DD hh:mm:ss in UTC timezone. No limit of the start time of flow records by default.

Global Flags:
  -k, --kubeconfig string   absolute path to the k8s config file, will use $KUBECONFIG if not specified
      --use-cluster-ip      Enable this option will use ClusterIP instead of port forwarding when connecting to the Theia
                            Manager Service. It can only be used when running in cluster.
  -v, --verbose int         set verbose level

I will run the ARIMA example from the help output above, with the time window adjusted to my own flow records:

linuxvm01:~/antrea/theia$ theia throughput-anomaly-detection run --algo ARIMA --start-time 2023-06-06T00:00:00 --end-time 2023-06-08T09:00:00
Successfully started Throughput Anomaly Detection job with name: tad-2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9

linuxvm01:~/antrea/theia$ theia throughput-anomaly-detection list
CreationTime        CompletionTime Name                                     Status
2023-06-08 08:25:10 N/A            tad-2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9 RUNNING
linuxvm01:~/antrea/theia$ theia throughput-anomaly-detection status tad-2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9
Status of this anomaly detection job is RUNNING: 0/0 (0%) stages completed
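If you want to script this, the job name can be captured from the confirmation line that `run` prints. Below is a minimal bash sketch; the function name `start_tad_job` is hypothetical, and it assumes the confirmation line keeps exactly the wording seen here. Any arguments are passed straight through to `theia throughput-anomaly-detection run`.

```shell
# Hypothetical helper: start a Throughput Anomaly Detection job and print only its name.
# Assumes the CLI prints "Successfully started Throughput Anomaly Detection job with name: <name>".
start_tad_job() {
  # "$@" is forwarded unchanged, e.g. --algo ARIMA --start-time ... --end-time ...
  theia throughput-anomaly-detection run "$@" \
    | sed -n 's/^Successfully started Throughput Anomaly Detection job with name: //p'
}

# Usage (requires the theia CLI and access to the cluster):
# JOB=$(start_tad_job --algo ARIMA --start-time 2023-06-06T00:00:00 --end-time 2023-06-08T09:00:00)
# echo "$JOB"
```

The job name printed by the function can then be fed to the `list`, `status` and `retrieve` subcommands.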

Let's wait for it to finish...
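Instead of re-running `list` by hand, a small polling loop can do the waiting. This is a minimal bash sketch: `wait_for_tad_job` is a hypothetical name, and it assumes the `list` layout seen in this post (the job name is one of the columns, the status is the last column). It only recognizes COMPLETED, the one finished status seen here, and gives up after a fixed number of polls.

```shell
# Hypothetical helper: poll "theia throughput-anomaly-detection list" until the given job is COMPLETED.
# Assumes the list layout shown in this post: the job name is a column and the status is the last column.
# Arguments: job name, optional poll interval in seconds (default 30), optional max polls (default 60).
wait_for_tad_job() {
  local job="$1" interval="${2:-30}" max_polls="${3:-60}" status i
  for ((i = 1; i <= max_polls; i++)); do
    # Find the row containing the job name and print its last field (Status).
    status=$(theia throughput-anomaly-detection list | awk -v j="$job" '{ for (f = 1; f <= NF; f++) if ($f == j) print $NF }')
    echo "poll ${i}: ${status:-<not listed>}"
    if [ "$status" = "COMPLETED" ]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "gave up waiting for $job" >&2
  return 1
}

# Usage: wait_for_tad_job tad-2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9 60
```

My job took roughly 15 minutes, so a 30 to 60 second interval is plenty.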

linuxvm01:~$ theia throughput-anomaly-detection list
CreationTime        CompletionTime      Name                                     Status
2023-06-08 08:25:10 2023-06-08 08:40:03 tad-2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9 COMPLETED

Now check the output:

# It is a long list, so I redirect it to a text file
linuxvm01:~$ theia throughput-anomaly-detection retrieve tad-2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9 > anomaly-detection-1.txt
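Because the retrieved file is long, it can help to summarize it before reading it line by line. Here is a minimal awk sketch that counts anomalous records per flow. It assumes the whitespace-separated layout of the output snippet in this post (a header row starting with `id`, source IP and port in columns 2 and 3, destination IP and port in columns 4 and 5, the `anomaly` flag in the last column); the function name `summarize_tad_anomalies` is hypothetical.

```shell
# Count records flagged as anomalous per source -> destination flow in the retrieved TAD output.
# Assumes whitespace-separated columns: id, sourceIP, sourceTransportPort, destinationIP,
# destinationTransportPort, flowStartSeconds, flowEndSeconds, throughput, algoCalc, anomaly.
summarize_tad_anomalies() {
  awk '$1 != "id" && $NF == "true" {
         flow = $2 ":" $3 " -> " $4 ":" $5
         count[flow]++
       }
       END { for (flow in count) print count[flow], flow }' "$1" | sort -rn
}

# Usage: summarize_tad_anomalies anomaly-detection-1.txt
```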

A snippet from the output:

id                                      sourceIP        sourceTransportPort     destinationIP   destinationTransportPort        flowStartSeconds        flowEndSeconds          throughput      algoCalc                anomaly
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-07T21:41:38Z    54204           65355.16485680155       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T07:09:32Z    49901           54713.50251767502       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T03:33:48Z    50000           53550.532983008845      true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T00:52:48Z    59725           52206.69079880149       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-07T18:49:03Z    48544           53287.107990749006      true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-07T22:06:43Z    61832           53100.99541753638       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-07T20:57:28Z    58295           54168.70924924757       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T07:36:38Z    47309           53688.236655529385      true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T08:05:43Z    59227           52623.71668244673       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T05:22:12Z    58217           53709.42205164235       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T02:19:03Z    48508           55649.8819138477        true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-07T23:27:28Z    53846           48125.33491950862       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T00:09:38Z    59562           52143.367660610136      true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T02:44:08Z    50966           57119.323329628125      true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T08:24:17Z    55553           50480.7443391562        true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T00:02:38Z    44172           53694.11880964807       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T04:07:57Z    53612           49714.00885995446       true
2ecb054a-8c0d-4ae1-8444-c3493e7bb6d9    20.20.3.12      59448                   20.20.0.12      2181                            2023-06-06T22:04:43Z    2023-06-08T01:54:58Z    59089           51972.42465384903       true
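Judging by the column names, `throughput` is the observed throughput of the flow record and `algoCalc` is the value the algorithm (ARIMA in this run) calculated for it, with `anomaly` marking the records the job considered out of line. The exact threshold Theia applies is not visible in this output, so the sketch below only expresses the gap as a percentage. It assumes the same column layout as the snippet above; the function name `tad_deviation` is hypothetical.

```shell
# Print throughput, algoCalc and their relative deviation (%) for every record flagged as anomalous.
# Assumes throughput is column 8, algoCalc column 9 and anomaly the last column, as in the snippet above.
tad_deviation() {
  awk '$1 != "id" && $NF == "true" && $9 != 0 {
         printf "%s %s %.1f%%\n", $8, $9, ($8 - $9) / $9 * 100
       }' "$1"
}

# Usage: tad_deviation anomaly-detection-1.txt
```

For the first record above this gives roughly -17.1%, i.e. the measured throughput was about 17% below the calculated value.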

Antrea Egress

This chapter is also covered, with some tweaks, in another post I have written here 😃

From the official docs:

Egress is a CRD API that manages external access from the Pods in a cluster. It supports specifying which egress (SNAT) IP the traffic from the selected Pods to the external network should use. When a selected Pod accesses the external network, the egress traffic will be tunneled to the Node that hosts the egress IP if it's different from the Node that the Pod runs on and will be SNATed to the egress IP when leaving that Node.

You may be interested in using this capability if any of the following apply:

  • A consistent IP address is desired when specific Pods connect to services outside of the cluster, for source tracing in audit logs, or for filtering by source IP in external firewall, etc.
  • You want to force outgoing external connections to leave the cluster via certain Nodes, for security controls, or due to network topology restrictions.