Securing Kubernetes clusters with Antrea Network Policies

Overview

Some context...

I have written a couple of posts previously on the topic of Antrea Policies. This time I will try to put them more into context: how we can use and create Antrea Policies in different scenarios, with some "frameworks", seen from different perspectives in the organization.

What if, and how, can we deliver an already secured Kubernetes cluster, like an out-of-the-box experience, with policies applied that meet certain guidelines for what is allowed and not allowed in the organization for certain Kubernetes clusters, whether they are manually provisioned or provisioned on demand? In this post I will try to be a bit specific on how to achieve this, with a simulated requirement as context; I will get back to this context further down. The following products will be used in this post: vSphere with Tanzu and TKG workload clusters, Antrea as the CNI, Tanzu Mission Control and VMware NSX.

As usual, for more details on the above mentioned products head over to the links below.

Different layers of security, different personas, different enforcement points

This post will mostly be focusing on the Kubernetes perspective, specifically using Antrea Network Policies to restrict traffic inside the Kubernetes cluster. A Kubernetes cluster is just one infrastructure component in the organization, but it contains many moving parts with applications and services inside. Even inside a Kubernetes cluster there can be different classifications for what should be allowed and not. Therefore a Kubernetes cluster also needs to be secured with a set of tools and policies to satisfy the security policy guidelines in the organization; it is another layer in the infrastructure that needs to be controlled. In a typical datacenter we have several security mechanisms in place, like AV agents, physical firewalls, virtual firewalls and the NSX distributed firewall. All of these play an important role in the different layers of the datacenter/organization. Assuming the Kubernetes worker nodes are running as virtual machines on VMware vSphere, the illustration below describes two layers of security: the NSX distributed firewall securing the VM workers, and Antrea Network Policies securing pods and services inside the Kubernetes cluster.

two-layers

With the illustration above in mind it is fully possible to create a very strict environment with no unwanted lateral movement, meaning only the strictly necessary firewall openings inside the Kubernetes cluster between pods, namespaces and services, but also between workers in the same subnet and across several Kubernetes clusters. But the above two layers, VMs in vSphere protected by the NSX distributed firewall and apps running in Kubernetes clusters protected by Antrea Network Policies, are often managed by different personas in the organization. We have the vSphere admins, network admins, security admins, app operators and app developers. Security is crucial in a modern datacenter, so, again, the correct tools need to be in place for the organization's security framework to be implemented all the way down the "stack" to be compliant. Very often there is an agreed theoretical security framework/design in place, but that plan is not always so straightforward to implement.

Going back to Kubernetes again and Antrea Network Policies. Antrea features several static (and optional custom) Tiers where different types of network policies can be applied. As all the Antrea Network Policies are evaluated "top-down", it is very handy to be able to place some strict rules very early in the "chain" of firewall policies to ensure the organization's security compliance is met. Being able to place these rules at the top prohibits the creation of rules further down that contradict these top rules; they will simply not be evaluated. Then there is room to create a framework that gives some sense of "flexibility" to support the environment's workload according to the type of classification (prod, dev, test, dmz, trust, untrust). Other policies can be applied to further restrict movement before hitting a default block rule that takes care of anything not specified earlier in the "chain" of policies. The illustration below is an example of where the different personas can take charge and apply their needed policies.

personas

Then the next illustration shows the default static Tiers that come with Antrea. These Tiers make it easier to categorize the different policies in a Kubernetes cluster, but they also provide a great way to delegate responsibility/permissions by using RBAC to control access to the Tiers. This means we can have certain admins apply policies in specific Tiers, and no one else can overwrite them.

tiers
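If the static Tiers do not fit, custom Tiers can also be created and slotted in at a chosen priority. A minimal sketch of such a Tier; the name and priority here are just an example, placing it between securityops (100) and networkops (150):

apiVersion: crd.antrea.io/v1alpha1
kind: Tier
metadata:
  name: custom-compliance
spec:
  priority: 120                 # evaluated after securityops (100), before networkops (150)
  description: "Example custom Tier for organization specific compliance rules"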

Now, how can the different personas make sure their policies are applied? This is what I will go through next.

Managing and making sure the required Antrea Policies are applied

Let's start out by shedding some light on the simulated requirement I mentioned above. Customer Andreas has some strict security guidelines that need to be followed to ensure compliance before anyone can do anything in the Kubernetes platforms. To be compliant according to these strict security guidelines, the following must be in place:

  • All Kubernetes workload clusters are considered isolated and are not allowed to reach anything other than themselves, including pods and services (all nodes in the same cluster)
  • Only necessary backend functions such as DNS/NTP are allowed.
  • Certain management tools need access to the clusters
  • All non-system namespaces should be considered "untrusted" and isolated by default.
  • RBAC needs to be in place to ensure no tampering on applied security policies.

lock-down

The above diagram is what customer Andreas needs to have in place. Let's go ahead and apply it. In the next sub-chapters I will show how to apply and manage the policies in three different ways to achieve this. I assume the NSX personas have done their part and applied the correct distributed firewall rules isolating the worker nodes.

Applying Antrea policies with kubectl

This process involves logging into a newly provisioned Kubernetes cluster (a TKG cluster in my environment) that someone has provisioned, either the vSphere admin persona or via self-service. The security admin will then use kubectl to log in and apply some yaml definitions to achieve the above requirements. This operation will typically be the security admin's responsibility. The definitions the security admin is applying will all be configured in the static Tier "securityops" with different priorities. Here is the demo environment I will be using in the following chapters:

demo-environment

The first requirement is "no trust" in any non-system namespaces, where I want to achieve full isolation between namespaces: no communication from one namespace to another. On the Antrea homepage there are several examples, and I will use one that suits my need perfectly. It looks like this:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: strict-ns-isolation-except-system-ns
spec:
  priority: 9
  tier: securityops
  appliedTo:
    - namespaceSelector:          # Selects all non-system Namespaces in the cluster
        matchExpressions:
          - {key: kubernetes.io/metadata.name, operator: NotIn, values: [avi-system,default,kube-node-lease,kube-public,kube-system,secretgen-controller,tanzu-continuousdelivery-resources,tanzu-fluxcd-packageinstalls,tanzu-kustomize-controller,tanzu-source-controller,tkg-system,vmware-system-auth,vmware-system-cloud-provider,vmware-system-csi,vmware-system-tkg,vmware-system-tmc]}
  ingress:
    - action: Pass
      from:
        - namespaces:
            match: Self           # Skip ACNP evaluation for traffic from Pods in the same Namespace
      name: PassFromSameNS
    - action: Drop
      from:
        - namespaceSelector: {}   # Drop from Pods from all other Namespaces
      name: DropFromAllOtherNS
  egress:
    - action: Pass
      to:
        - namespaces:
            match: Self           # Skip ACNP evaluation for traffic to Pods in the same Namespace
      name: PassToSameNS
    - action: Drop
      to:
        - namespaceSelector: {}   # Drop to Pods from all other Namespaces
      name: DropToAllOtherNS

The only modification I have made is adding all my system namespaces. Then I will apply it.

# Verifying no policies in place:
andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
No resources found
andreasm@linuxvm01:~/antrea/policies/groups$ k apply -f acnp-ns-isolation-except-system-ns.yaml
clusternetworkpolicy.crd.antrea.io/strict-ns-isolation-except-system-ns created
andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                   TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
strict-ns-isolation-except-system-ns   securityops   9          0               0               15s

Notice the 0 under Desired Nodes and Current Nodes. The reason is that this cluster is completely new, and there is no workload in any non-system namespaces yet. Here are the current namespaces:

andreasm@linuxvm01:~/antrea/policies/groups$ k get ns
NAME                                 STATUS   AGE
default                              Active   28d
kube-node-lease                      Active   28d
kube-public                          Active   28d
kube-system                          Active   28d
secretgen-controller                 Active   28d
tkg-system                           Active   28d
vmware-system-auth                   Active   28d
vmware-system-cloud-provider         Active   28d
vmware-system-csi                    Active   28d
vmware-system-tkg                    Active   28d

Now I will apply a couple of namespaces and deploy some workload in them:

andreasm@linuxvm01:~/antrea/policies/groups$ k apply -f dev-app.yaml -f dev-app2.yaml
namespace/dev-app created
deployment.apps/ubuntu-20-04 created
namespace/dev-app2 created
deployment.apps/ubuntu-dev-app2 created
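For reference, a definition like dev-app.yaml could look roughly like the sketch below. The namespace and deployment names match the output above, while the image and command are assumptions just to have a pod to test with.

apiVersion: v1
kind: Namespace
metadata:
  name: dev-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu-20-04
  namespace: dev-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu-20-04
  template:
    metadata:
      labels:
        app: ubuntu-20-04
    spec:
      containers:
        - name: ubuntu
          image: ubuntu:20.04             # assumed image
          command: ["sleep", "infinity"]  # keep the pod running for testing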

What does the policy look like now?

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                   TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
strict-ns-isolation-except-system-ns   securityops   9          1               1               6s
# Why only one
andreasm@linuxvm01:~/antrea/policies/groups$ k get pods -n dev-app -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE                                                      NOMINATED NODE   READINESS GATES
ubuntu-20-04-548545fc87-t2lg2   1/1     Running   0          82s   20.20.3.216   three-zone-cluster-1-node-pool-3-6r8c2-6c8d48656c-wntwc   <none>           <none>
andreasm@linuxvm01:~/antrea/policies/groups$ k get pods -n dev-app2 -owide
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE                                                      NOMINATED NODE   READINESS GATES
ubuntu-dev-app2-564f46785c-g8vb6   1/1     Running   0          86s   20.20.3.215   three-zone-cluster-1-node-pool-3-6r8c2-6c8d48656c-wntwc   <none>           <none>

Both workloads ended up on the same node...

So far so good. Now I need to verify that it is actually enforcing anything. From one of the dev-app pods I will exec into bash and try to ping a pod in one of the system namespaces, ping the pod in the other dev-app namespace, and try a DNS lookup.

andreasm@linuxvm01:~/antrea/policies/groups$ k exec -it -n dev-app ubuntu-20-04-548545fc87-t2lg2 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@ubuntu-20-04-548545fc87-t2lg2:/# ping 20.20.1.7
PING 20.20.1.7 (20.20.1.7) 56(84) bytes of data.
^C
--- 20.20.1.7 ping statistics ---
170 packets transmitted, 0 received, 100% packet loss, time 173033ms

The ping above was from my dev-app pod to the CoreDNS pod in kube-system. Then a ping to the pod in the other dev-app namespace:

root@ubuntu-20-04-548545fc87-t2lg2:/# ping 20.20.3.215
PING 20.20.3.215 (20.20.3.215) 56(84) bytes of data.
^C
--- 20.20.3.215 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8181ms

That is also blocked.

Now DNS lookup:

root@ubuntu-20-04-548545fc87-t2lg2:/# ping google.com
ping: google.com: Temporary failure in name resolution
#So much empty

DNS was also one of the requirements, so I will have to fix this too. I mean, the security admin will have to fix this, otherwise lunch will not be such a great place to be...

As the security admin has applied the above policy in the securityops tier with a priority of 9, he needs to open up for DNS with a policy in a higher tier, or within the same tier with a lower priority number (lower number equals higher priority).

This is the policy he needs to apply:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: allow-all-egress-dns-service
spec:
  priority: 8
  tier: securityops
  appliedTo:
    - namespaceSelector: {}
#        matchLabels:
#          k8s-app: kube-dns
  egress:
    - action: Allow
      toServices:
        - name: kube-dns
          namespace: kube-system
      name: "allowdnsegress-service"

A simple one, and the requirement is satisfied: customer Andreas allows necessary functions such as DNS. This policy will allow any namespace to reach the kube-dns service.

The rule applied:

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                   TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
allow-all-egress-dns-service           securityops   8          4               4               2m24s
strict-ns-isolation-except-system-ns   securityops   9          1               1               24m

What about DNS lookup now:

root@ubuntu-20-04-548545fc87-t2lg2:/# ping google.com
PING google.com (172.217.12.110) 56(84) bytes of data.
64 bytes from 172.217.12.110 (172.217.12.110): icmp_seq=1 ttl=105 time=33.0 ms
64 bytes from 172.217.12.110 (172.217.12.110): icmp_seq=2 ttl=104 time=29.8 ms
64 bytes from 172.217.12.110 (172.217.12.110): icmp_seq=3 ttl=105 time=30.2 ms
64 bytes from 172.217.12.110 (172.217.12.110): icmp_seq=4 ttl=104 time=30.3 ms
64 bytes from 172.217.12.110 (172.217.12.110): icmp_seq=5 ttl=105 time=30.4 ms
^C
--- google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 29.763/30.733/32.966/1.138 ms

Works.
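The same requirement also mentions NTP. That could be covered with a similar egress rule; a minimal sketch, where the NTP server address used here is purely hypothetical and would need to match the real environment:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: allow-all-egress-ntp
spec:
  priority: 8
  tier: securityops
  appliedTo:
    - namespaceSelector: {}
  egress:
    - action: Allow
      to:
        - ipBlock:
            cidr: 10.101.10.10/32   # hypothetical NTP server
      ports:
        - protocol: UDP
          port: 123
      name: "allowntpegress"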

That's one more requirement met. One of the requirements was also to restrict access to services in other Kubernetes clusters. Even though we trust that the NSX admins have created these isolation rules for us, we need to make sure such traffic is not allowed from the current Kubernetes cluster either.

To achieve this the security admin needs to create a ClusterGroup containing the CIDR of the cluster's own worker nodes, then apply a policy using the ClusterGroup. Here is the ClusterGroup definition (containing the CIDR for the worker nodes):

apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  name: tz-cluster-1-node-cidr
spec:
  # ipBlocks cannot be set along with podSelector, namespaceSelector or serviceReference.
  ipBlocks:
    - cidr: 10.101.82.32/27

And I also need to define another ClusterGroup for all the RFC1918 subnets I need to block (this will include the CIDR above):

apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  name: tz-cluster-1-drop-cidr
spec:
  # ipBlocks cannot be set along with podSelector, namespaceSelector or serviceReference.
  ipBlocks:
    - cidr: 10.0.0.0/8
    - cidr: 172.16.0.0/12
    - cidr: 192.168.0.0/16

Apply them:

andreasm@linuxvm01:~/antrea/policies/groups$ k apply -f tz-cluster-1-group-node-cidr.yaml
clustergroup.crd.antrea.io/tz-cluster-1-node-cidr created
andreasm@linuxvm01:~/antrea/policies/groups$ k apply -f tz-cluster-1-drop-cidr.yaml
clustergroup.crd.antrea.io/tz-cluster-1-drop-cidr created
andreasm@linuxvm01:~/antrea/policies/groups$ k get clustergroup
NAME                     AGE
tz-cluster-1-drop-cidr   6s
tz-cluster-1-node-cidr   5s

And the policy to deny anything except the cluster's own Kubernetes worker nodes:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: acnp-drop-except-own-cluster-node-cidr
spec:
  priority: 8
  tier: securityops
  appliedTo:
    - namespaceSelector:          # Selects all non-system Namespaces in the cluster
        matchExpressions:
          - {key: kubernetes.io/metadata.name, operator: NotIn, values: [avi-system,default,kube-node-lease,kube-public,kube-system,secretgen-controller,tanzu-continuousdelivery-resources,tanzu-fluxcd-packageinstalls,tanzu-kustomize-controller,tanzu-source-controller,tkg-system,vmware-system-auth,vmware-system-cloud-provider,vmware-system-csi,vmware-system-tkg,vmware-system-tmc]}
  egress:
    - action: Allow
      to:
        - group: "tz-cluster-1-node-cidr"
    - action: Drop
      to:
        - group: "tz-cluster-1-drop-cidr"

Applied:

andreasm@linuxvm01:~/antrea/policies/groups$ k apply -f tz-cluster-1-drop-anything-but-own-nodes.yaml
clusternetworkpolicy.crd.antrea.io/acnp-drop-except-own-cluster-node-cidr created
andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                     TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-drop-except-own-cluster-node-cidr   securityops   8          1               1               3m39s
allow-all-egress-dns-service             securityops   8          4               4               28m
strict-ns-isolation-except-system-ns     securityops   9          1               1               50m

From the dev-app pod again I will verify whether I am allowed to SSH to a worker node in my "own" Kubernetes cluster, and to another Linux machine outside the ClusterGroup CIDR I have applied.

root@ubuntu-20-04-548545fc87-t2lg2:/# ssh vmware-system-user@10.101.82.34 #A worker node in the current k8s cluster
vmware-system-user@10.101.82.34's password:
#This is allowed
#What about other machines outside the cidr:
root@ubuntu-20-04-548545fc87-t2lg2:/# ssh 10.101.10.99
ssh: connect to host 10.101.10.99 port 22: Connection timed out

That is very close to achieving this requirement as well, but I should still be allowed to reach pods inside the same namespace regardless of which node they reside on. Here is my dev-app namespace with pods on all three nodes:

andreasm@linuxvm01:~/antrea/policies/groups$ k get pods -n dev-app -o wide
NAME                            READY   STATUS    RESTARTS   AGE    IP            NODE                                                      NOMINATED NODE   READINESS GATES
ubuntu-20-04-548545fc87-75nsm   1/1     Running   0          116s   20.20.2.35    three-zone-cluster-1-node-pool-2-kbzvq-6846d5cc5b-6hdmj   <none>           <none>
ubuntu-20-04-548545fc87-hhnv2   1/1     Running   0          116s   20.20.1.14    three-zone-cluster-1-node-pool-1-dgcpq-656c75f4f4-nsr2r   <none>           <none>
ubuntu-20-04-548545fc87-t2lg2   1/1     Running   0          66m    20.20.3.216   three-zone-cluster-1-node-pool-3-6r8c2-6c8d48656c-wntwc   <none>           <none>
root@ubuntu-20-04-548545fc87-t2lg2:/# ping 20.20.1.14
PING 20.20.1.14 (20.20.1.14) 56(84) bytes of data.
64 bytes from 20.20.1.14: icmp_seq=1 ttl=62 time=20.6 ms
64 bytes from 20.20.1.14: icmp_seq=2 ttl=62 time=2.87 ms
^C
--- 20.20.1.14 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 2.869/11.735/20.601/8.866 ms
root@ubuntu-20-04-548545fc87-t2lg2:/# ping 20.20.2.35
PING 20.20.2.35 (20.20.2.35) 56(84) bytes of data.
64 bytes from 20.20.2.35: icmp_seq=1 ttl=62 time=3.49 ms
64 bytes from 20.20.2.35: icmp_seq=2 ttl=62 time=2.09 ms
64 bytes from 20.20.2.35: icmp_seq=3 ttl=62 time=1.00 ms
^C
--- 20.20.2.35 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.000/2.194/3.494/1.020 ms

From the Antrea UI, let's do some tests there also:

pod2pod-same-ns

dev-app-to-mgmt-host

dev-app-to-own-node

Note that I have not created any default block-all-else rule. There is always room for improvement, and this was just an exercise to show what is possible, not a final answer on how things should be done. Some of the policies can be made even more granular, for example by specifying only ports/protocols/FQDNs.
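If such a catch-all rule is wanted, a sketch of what it could look like is shown below, placed in the baseline tier so it is evaluated after everything else. The priority and the abbreviated namespace list are placeholders and would need to match the full set of system namespaces used earlier:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: baseline-default-drop
spec:
  priority: 100
  tier: baseline
  appliedTo:
    - namespaceSelector:            # non-system namespaces only (list abbreviated here)
        matchExpressions:
          - {key: kubernetes.io/metadata.name, operator: NotIn, values: [kube-system,tkg-system]}
  ingress:
    - action: Drop                  # drop any ingress not allowed earlier in the chain
      from:
        - namespaceSelector: {}
      name: DefaultDropIngress
  egress:
    - action: Drop                  # drop any egress not allowed earlier in the chain
      to:
        - ipBlock:
            cidr: 0.0.0.0/0
      name: DefaultDropEgress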

So just to summarize what I have done:

These are the applied rules:

NAME                                     TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-drop-except-own-cluster-node-cidr   securityops   8          3               3               23h
allow-all-egress-dns-service             securityops   8          4               4               23h
strict-ns-isolation-except-system-ns     securityops   9          3               3               23h

The first rule allows only traffic to the nodes in its own cluster - matching the requirement "All Kubernetes workload clusters are considered isolated and are not allowed to reach anything other than themselves, including pods and services (all nodes in the same cluster)".

The second rule allows all namespaces to access the kube-dns service in the kube-system namespace - matching the requirement "Only necessary backend functions such as DNS/NTP are allowed".

The third rule drops all traffic between namespaces, except the "system" namespaces I have defined, while still allowing intra-namespace communication - matching the requirement "All non-system namespaces should be considered "untrusted" and isolated by default".

I have not done anything with RBAC yet; that will come later in this post. The requirement "Certain management tools need access to the clusters" I can assume the NSX admins have covered, as I am not blocking any ingress traffic to the "system" namespaces; the same is true for egress from the system namespaces. But the policies could be adjusted to allow only the necessary traffic between these namespaces and the specific management tools.
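If that last part also had to be expressed as an Antrea policy, it could look roughly like the sketch below; the management subnet used here is purely hypothetical and the appliedTo selector could be narrowed to just the system namespaces:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: allow-mgmt-tools-ingress
spec:
  priority: 7
  tier: securityops
  appliedTo:
    - namespaceSelector: {}         # or narrow down to the system namespaces only
  ingress:
    - action: Allow
      from:
        - ipBlock:
            cidr: 10.20.30.0/24     # hypothetical management subnet
      name: AllowFromMgmtSubnet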

Applying Antrea policies using TMC - Tanzu Mission Control

This section will not create any new scenario, it will re-use all the policies created and applied in the above section. The biggest difference is how the policies are being applied.

Not that I think any security admin minds logging in to a Kubernetes cluster and applying these security policies, but it can be a bit tedious each time a new cluster is provisioned. Wouldn't it be better if we could auto-deploy them each time a new cluster is deployed? Like an out-of-the-box experience? Yes, exactly. If we have already defined a policy scope for the different Kubernetes clusters in our environment, we can apply the correct policies to each cluster respectively each time they are provisioned. This saves a lot of time, and we can be sure that every new cluster is provisioned with the correct set of policies. The ability to auto-apply these required policies on creation, or directly after creation, makes provisioning out-of-the-box compliant clusters a joy.

Now this sounds interesting, how can I do that?

...Into the door comes TMC... Hello Tanzu Mission Control, TMC for short. With TMC we can administer Tanzu with vSphere in addition to a lot of other Kubernetes platforms. From the official TMC docs:

VMware Tanzu Mission Control™ is a centralized management platform for consistently operating and securing your Kubernetes infrastructure and modern applications across multiple teams and clouds.

Available through VMware Cloud™ services, Tanzu Mission Control provides operators with a single control point to give developers the independence they need to drive business forward, while ensuring consistent management and operations across environments for increased security and governance.

Tanzu Mission Control provides instances of the service in regions around the world, including Australia, Canada, India, Ireland, Japan, and USA. For a list of the regions in which the Tanzu Mission Control is hosted, go to the Cloud Management Services Availability page at https://www.vmware.com/global-infrastructure.html and select VMware Tanzu Mission Control.

Use Tanzu Mission Control to manage your entire Kubernetes footprint, regardless of where your clusters reside.

Let's cut to the chase and make my cluster compliant with the above rules.

Preparing TMC

In my TMC dashboard I need two things in place:

  • A Git repository where I host my yamls, specifically my Antrea policy yamls.
  • A configured Kustomization using the above Git repo

Git repository

I will create a dedicated Git repo called tmc-cd-repo, with a folder structure. Here is my GitHub repo for this purpose:

github-repo

Now push the yamls to this repo's subfolder antrea-baseline-policies:

andreasm:~/github_repos/tmc-cd-repo (main)$ git add .
andreasm:~/github_repos/tmc-cd-repo (main)$ git commit -s -m "ready-to-lockdown"
[main 4ab93a7] ready-to-lockdown
 4 files changed, 53 insertions(+)
 create mode 100644 antrea/antrea-baseline-policies/acnp-allow-egress-all-coredns-service.yaml
 create mode 100644 antrea/antrea-baseline-policies/tz-cluster-1-drop-anything-but-own-nodes.yaml
 create mode 100644 antrea/antrea-baseline-policies/tz-cluster-1-drop-cidr.yaml
 create mode 100644 antrea/antrea-baseline-policies/tz-cluster-1-group-node-cidr.yaml
andreasm:~/github_repos/tmc-cd-repo (main)$ git push
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 16 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (8/8), 1.43 KiB | 733.00 KiB/s, done.
Total 8 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To github.com:andreasm80/tmc-cd-repo.git
   5c9ba04..4ab93a7  main -> main
andreasm:~/github_repos/tmc-cd-repo (main)$

And here they are:

repo-updated
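As a side note, a kustomization.yaml file can optionally be placed in the folder to pin exactly which manifests are included; without it, the kustomize controller simply picks up all the yamls in the folder. A sketch using the file names from the commit above:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - acnp-allow-egress-all-coredns-service.yaml
  - tz-cluster-1-drop-anything-but-own-nodes.yaml
  - tz-cluster-1-drop-cidr.yaml
  - tz-cluster-1-group-node-cidr.yaml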

TMC Kustomization

Now, in my TMC dashboard, I will configure the Git repo:

I can choose to add the Git repo per cluster that is managed by TMC, or in a cluster group. I will go with adding the Git repo on my cluster called three-zone-cluster-1 for the moment. The benefit of adding it at the group level is that it can be shared across multiple clusters. In TMC, click Clusters, find your already managed and added cluster, then click on it to "enter" it.

In the cluster, click on the Add-ons tab.

cluster-add-on

Then find Git repositories and Add Git Repository

add-git-repo

Fill in the needed fields. Make sure to expand the advanced settings to update the branch to your branch or the main branch. You can also adjust the sync interval up or down; the default is 5, I have set mine to 1. The repository URL points to the actual repository, no subfolders. This is because in the Kustomizations later we can have multiple entries pointing to the respective subfolders, which can then be unique per cluster etc. Make sure you also choose "no credentials needed" under Repository Credentials if using a public Git repo as I am.

add-git-repo

After save you should see a green status:

git-status

Now we need to add a Kustomization. This can also be done either in a group or per cluster. I will start by adding it directly to my specific cluster. In TMC, click Clusters and select your cluster.

kustomization-cluster

Click Add-ons, then under Continuous Delivery click Installed Kustomizations, then Add Kustomization.

Before I add my Kustomization, I have made sure I have deleted all the policies and groups in my test-cluster three-zone-cluster-1:

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
No resources found
andreasm@linuxvm01:~/antrea/policies/groups$ k get clustergroups
No resources found

Then I will continue and add the Kustomization:

add-kustomization

Make sure to point to the correct subfolder in the Git repo. I have enabled the Prune option so that everything deployed via the Kustomization will be deleted from my cluster if I decide to remove the Kustomization.

Click add.

added-kustomization

Succeeded

Click refresh in the top right corner, and it should be green. Let's check the policies and groups in the cluster itself:

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                     TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-drop-except-own-cluster-node-cidr   securityops   8          3               3               70s
allow-all-egress-dns-service             securityops   8          4               4               70s
strict-ns-isolation-except-system-ns     securityops   9          3               3               70s
andreasm@linuxvm01:~/antrea/policies/groups$ k get clustergroups
NAME                     AGE
tz-cluster-1-drop-cidr   73s
tz-cluster-1-node-cidr   73s

The Antrea Policies have been applied.

Deploy TKC cluster from TMC - auto apply security policies

The above section enabled Kustomization on an already managed TKC cluster in TMC. In this section I will provision a TKC cluster from TMC and let the Antrea policies be applied automatically.

In TMC I will create two Cluster Groups, one called andreas-dev-clusters and one called andreas-prod-clusters.

cluster-groups

create-cluster-group

After I have added the two cluster groups I will configure Add-ons. Same as in the previous section, adding the Git repo, but this time I will point to the different subfolders I created in my Git repo. I have created two sub-folders called tmc-cd-repo/antrea/antrea-baseline-policies/dev-clusters and tmc-cd-repo/antrea/antrea-baseline-policies/prod-clusters. The reason is that I want the option to apply different Antrea policies to certain clusters; different environments, different needs. The resulting layout is sketched below.
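Roughly, the repo now looks like this:

tmc-cd-repo/
└── antrea/
    └── antrea-baseline-policies/
        ├── dev-clusters/      # policies for clusters in andreas-dev-clusters
        └── prod-clusters/     # policies for clusters in andreas-prod-clusters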

Before adding the Git repo to the two new cluster groups in TMC I need to enable continuous delivery by clicking on this blue button.

enable-cd

git-repo

The Git repo has been added to both my new cluster groups. Now I just need to add the Kustomizations pointing to my new Git repo subfolders dev-clusters and prod-clusters.

prod-clusters-kustomization

dev-cluster-kustomization

Now that the preparations have been done in TMC, it is time to deploy the two TKC clusters from TMC and see if my policies are automatically applied: one "prod-cluster" and one "dev-cluster".

Let's start with the "prod-cluster".

create-prod-cluster

in-creation

Creating the dev-cluster

create-dev-cluster-2

dev-cluster-in-progress

The clusters are ready:

prod-cluster-2

dev-cluster-2-status

Let us check the sync status of my Kustomizations. Prod-Cluster Group:

prod-group-applied

Dev-Cluster Group:

dev-group-applied

Still applied.

Let's have a look inside the two TKC clusters using kubectl. Prod-Cluster-2:

andreasm@linuxvm01:~/antrea/policies/groups$ k config current-context
prod-cluster-2
andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                                   TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
allow-all-egress-dns-service                           securityops   8          2               2               35m
prod-clusters-acnp-drop-except-own-cluster-node-cidr   securityops   8          0               0               35m
prod-clusters-strict-ns-isolation-except-system-ns     securityops   9          0               0               35m

Dev-Cluster-2:

andreasm@linuxvm01:~/antrea/policies/groups$ k config current-context
dev-cluster-2
andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                                  TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
dev-clusters-strict-ns-isolation-except-system-ns     securityops   9          0               0               45s
dev-clusters-acnp-drop-except-own-cluster-node-cidr   securityops   8          0               0               45s
dev-clusters-allow-all-egress-dns-service             securityops   8          2               2               45s

That's it. If I need to change the policies I can just edit them, git add, commit and push, and they will be applied to all clusters in the group. With this feature enabled in TMC, it is just a matter of adding or attaching your clusters to the respective group in TMC, and they will automatically get all the needed yamls applied.

add-cluster-tmc

Applying Antrea policies with NSX

With NSX one can also manage the native Antrea policies inside each TKC cluster (or any other Kubernetes cluster Antrea supports, for that matter). I have written about this here. NSX can also create security policies "outside" the TKC cluster by using the inventory information it gets from Antrea and enforce them in the NSX Distributed Firewall; there is a short section on this below.

Applying Antrea native policies from the NSX manager

So in this section I will quickly go through using the same "framework" as above using NSX as the "management-plane". Just a reminder, we have these three policies:

NAME                                     TIER          PRIORITY   DESIRED NODES   CURRENT NODES   AGE
acnp-drop-except-own-cluster-node-cidr   securityops   8          3               3               23h
allow-all-egress-dns-service             securityops   8          4               4               23h
strict-ns-isolation-except-system-ns     securityops   9          3               3               23h

The first rule allows only traffic to the nodes in its own cluster - matching the requirement "All Kubernetes workload clusters are considered isolated and are not allowed to reach anything other than themselves, including pods and services (all nodes in the same cluster)".

The second rule allows all namespaces to access the kube-dns service in the kube-system namespace - matching the requirement "Only necessary backend functions such as DNS/NTP are allowed".

The third rule drops all traffic between namespaces, except the "system" namespaces I have defined, while still allowing intra-namespace communication - matching the requirement "All non-system namespaces should be considered "untrusted" and isolated by default".

In NSX I will need to create some Security Groups, then use these groups in a Security Policy. I will start by creating the Security Group for the kube-dns service:

One can either define the service kube-dns:

sg-kube-dns

Or the pods that are responsible for the DNS service (CoreDNS):

dns-pods

This depends on how we define the policy in NSX. I have gone with the pod selection group.

As the requirement allows all namespaces to access DNS, I don't have to create a security group for the source. The policy will then look like this in NSX:

dns-policy-all

Notice also that I have placed the policy in the Infrastructure Tier in NSX.

This is how it looks in the Kubernetes cluster:

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp 933e463e-c061-4e80-80b3-eff3402e41a9 -oyaml
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/display-name: k8s-core-dns
  creationTimestamp: "2023-06-27T11:15:30Z"
  generation: 11
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 933e463e-c061-4e80-80b3-eff3402e41a9
  resourceVersion: "2248486"
  uid: a5d7378d-ede0-4f8c-848b-413c10ce5602
spec:
  egress:
  - action: Allow
    appliedTo:
    - podSelector: {}
    enableLogging: false
    name: "2025"
    ports:
    - port: 53
      protocol: TCP
    - port: 53
      protocol: UDP
    to:
    - group: c7e96b35-1961-4659-8a62-688a0e98fe63
  priority: 1.0000000177635693
  tier: nsx-category-infrastructure
status:
  currentNodesRealized: 4
  desiredNodesRealized: 4
  observedGeneration: 11
  phase: Realized
andreasm@linuxvm01:~/antrea/policies/groups$ k get tiers
NAME                          PRIORITY   AGE
application                   250        6d1h
baseline                      253        6d1h
emergency                     50         6d1h
networkops                    150        6d1h
nsx-category-application      4          6d
nsx-category-emergency        1          6d
nsx-category-environment      3          6d
nsx-category-ethernet         0          6d
nsx-category-infrastructure   2          6d
platform                      200        6d1h
securityops                   100        6d1h

For the next policy, allowing only nodes in the same cluster, I will need to create two groups with "ip-blocks": one containing all the RFC1918 ranges and one containing the actual node range:

all-rfc1918

dev-cluster-1-cidr

The policy in NSX will then look like this:

own-cidr-allow

This is how it looks in the Kubernetes cluster:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/display-name: dev-cluster-1-intra
  creationTimestamp: "2023-07-03T12:27:13Z"
  generation: 2
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 17dbadce-06cf-4d1e-9747-3e888f0f58e0
  resourceVersion: "2257468"
  uid: 73814a58-2da8-44c2-ba85-2522865430d1
spec:
  egress:
  - action: Allow
    appliedTo:
    - podSelector: {}
    enableLogging: false
    name: "2027"
    to:
    - group: 2051f64c-8c65-46a2-8397-61c926c8c4ce
  - action: Drop
    appliedTo:
    - podSelector: {}
    enableLogging: false
    name: "2028"
    to:
    - group: 5bfc16b1-08f3-48bd-91f9-fee3d66762b1
  priority: 1.000000017763571
  tier: nsx-category-infrastructure
status:
  currentNodesRealized: 4
  desiredNodesRealized: 4
  observedGeneration: 2
  phase: Realized

Where the groups contain this:

apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: 2051f64c-8c65-46a2-8397-61c926c8c4ce
  creationTimestamp: "2023-07-03T12:27:13Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 2051f64c-8c65-46a2-8397-61c926c8c4ce
  resourceVersion: "2257281"
  uid: 18009c1b-c44f-4c75-a9f2-8a30e2415859
spec:
  childGroups:
  - 2051f64c-8c65-46a2-8397-61c926c8c4ce-0
status:
  conditions:
  - lastTransitionTime: "2023-07-03T12:27:13Z"
    status: "True"
    type: GroupMembersComputed
andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup 2051f64c-8c65-46a2-8397-61c926c8c4ce-0 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: 2051f64c-8c65-46a2-8397-61c926c8c4ce-0
    ccp-adapter.antrea.tanzu.vmware.com/parent: 2051f64c-8c65-46a2-8397-61c926c8c4ce
  creationTimestamp: "2023-07-03T12:27:13Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 2051f64c-8c65-46a2-8397-61c926c8c4ce-0
  resourceVersion: "2257278"
  uid: b1d4a59b-0557-4f6c-a08c-7b76af6bca8c
spec:
  ipBlocks:
  - cidr: 10.101.84.32/27
status:
  conditions:
  - lastTransitionTime: "2023-07-03T12:27:13Z"
    status: "True"
    type: GroupMembersComputed
andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup 5bfc16b1-08f3-48bd-91f9-fee3d66762b1 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: 5bfc16b1-08f3-48bd-91f9-fee3d66762b1
  creationTimestamp: "2023-07-03T12:27:13Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 5bfc16b1-08f3-48bd-91f9-fee3d66762b1
  resourceVersion: "2257282"
  uid: 6782589e-8488-47df-a750-04432c3c2f18
spec:
  childGroups:
  - 5bfc16b1-08f3-48bd-91f9-fee3d66762b1-0
status:
  conditions:
  - lastTransitionTime: "2023-07-03T12:27:13Z"
    status: "True"
    type: GroupMembersComputed
andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup 5bfc16b1-08f3-48bd-91f9-fee3d66762b1-0 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: 5bfc16b1-08f3-48bd-91f9-fee3d66762b1-0
    ccp-adapter.antrea.tanzu.vmware.com/parent: 5bfc16b1-08f3-48bd-91f9-fee3d66762b1
  creationTimestamp: "2023-07-03T12:27:13Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: 5bfc16b1-08f3-48bd-91f9-fee3d66762b1-0
  resourceVersion: "2257277"
  uid: fd2a1c32-1cf8-4ca8-8dad-f5420f57e55c
spec:
  ipBlocks:
  - cidr: 192.168.0.0/16
  - cidr: 10.0.0.0/8
  - cidr: 172.16.0.0/12
status:
  conditions:
  - lastTransitionTime: "2023-07-03T12:27:13Z"
    status: "True"
    type: GroupMembersComputed

Now, the last rule blocks all non-system namespaces from reaching any namespace other than themselves.

First I need to create a Security Group with the namespace as its sole member, then a Security Group using a not-equals criterion. The group for the namespace:

ns-dev-app

Then the negated Security Group, selecting all pods that do not have the same label as the pods in the namespace "dev-app".

negated

Then the Security Policy looks like this:

policy-strict-ns

This is how it looks in the Kubernetes cluster:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/display-name: dev-cluster-strict-ns-islolation
  creationTimestamp: "2023-07-03T13:00:40Z"
  generation: 3
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: cfbe3754-c365-4697-b124-5fbaddd87b57
  resourceVersion: "2267847"
  uid: 47949441-a69b-47e5-ae9b-1d5760d5c195
spec:
  egress:
  - action: Allow
    appliedTo:
    - group: beed7011-4fc7-49e6-b7ed-d521095eb293
    enableLogging: false
    name: "2029"
    to:
    - group: beed7011-4fc7-49e6-b7ed-d521095eb293
  - action: Drop
    appliedTo:
    - group: beed7011-4fc7-49e6-b7ed-d521095eb293
    enableLogging: false
    name: "2030"
    to:
    - group: f240efd5-3a95-49d3-9252-058cc80bc0c0
  priority: 1.0000000177635728
  tier: nsx-category-infrastructure
status:
  currentNodesRealized: 3
  desiredNodesRealized: 3
  observedGeneration: 3
  phase: Realized

Where the cluster groups look like this:

andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup beed7011-4fc7-49e6-b7ed-d521095eb293 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: beed7011-4fc7-49e6-b7ed-d521095eb293
  creationTimestamp: "2023-07-03T13:00:40Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: beed7011-4fc7-49e6-b7ed-d521095eb293
  resourceVersion: "2266125"
  uid: 7bf8d0f4-d719-47d5-98a9-5fba3b5da7b9
spec:
  childGroups:
  - beed7011-4fc7-49e6-b7ed-d521095eb293-0
status:
  conditions:
  - lastTransitionTime: "2023-07-03T13:00:41Z"
    status: "True"
    type: GroupMembersComputed
andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup beed7011-4fc7-49e6-b7ed-d521095eb293-0 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: beed7011-4fc7-49e6-b7ed-d521095eb293-0
    ccp-adapter.antrea.tanzu.vmware.com/parent: beed7011-4fc7-49e6-b7ed-d521095eb293
  creationTimestamp: "2023-07-03T13:00:40Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: beed7011-4fc7-49e6-b7ed-d521095eb293-0
  resourceVersion: "2266123"
  uid: 4b393674-981a-488c-a2e2-d794f0b0a312
spec:
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: In
      values:
      - dev-app
status:
  conditions:
  - lastTransitionTime: "2023-07-03T13:00:41Z"
    status: "True"
    type: GroupMembersComputed
andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup f240efd5-3a95-49d3-9252-058cc80bc0c0 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: f240efd5-3a95-49d3-9252-058cc80bc0c0
  creationTimestamp: "2023-07-03T13:06:59Z"
  generation: 1
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: f240efd5-3a95-49d3-9252-058cc80bc0c0
  resourceVersion: "2267842"
  uid: cacd1386-a434-4c42-8739-6813dd1d475b
spec:
  childGroups:
  - f240efd5-3a95-49d3-9252-058cc80bc0c0-0
status:
  conditions:
  - lastTransitionTime: "2023-07-03T13:07:00Z"
    status: "True"
    type: GroupMembersComputed
andreasm@linuxvm01:~/nsx-antrea-integration$ k get clustergroup f240efd5-3a95-49d3-9252-058cc80bc0c0-0 -oyaml
apiVersion: crd.antrea.io/v1alpha3
kind: ClusterGroup
metadata:
  annotations:
    ccp-adapter.antrea.tanzu.vmware.com/createdFrom: nestdbGroupMsg
    ccp-adapter.antrea.tanzu.vmware.com/display-name: f240efd5-3a95-49d3-9252-058cc80bc0c0-0
    ccp-adapter.antrea.tanzu.vmware.com/parent: f240efd5-3a95-49d3-9252-058cc80bc0c0
  creationTimestamp: "2023-07-03T13:06:59Z"
  generation: 5
  labels:
    ccp-adapter.antrea.tanzu.vmware.com/managedBy: ccp-adapter
  name: f240efd5-3a95-49d3-9252-058cc80bc0c0-0
  resourceVersion: "2269597"
  uid: bd7f4526-2be9-4a4e-860e-0bb85ea30516
spec:
  podSelector:
    matchExpressions:
    - key: app
      operator: NotIn
      values:
      - ubuntu-20-04
status:
  conditions:
  - lastTransitionTime: "2023-07-03T13:07:00Z"
    status: "True"
    type: GroupMembersComputed

With all three policies applied, they look like this in the TKC cluster:

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                   TIER                          PRIORITY             DESIRED NODES   CURRENT NODES   AGE
17dbadce-06cf-4d1e-9747-3e888f0f58e0   nsx-category-infrastructure   1.000000017763571    4               4               18h
933e463e-c061-4e80-80b3-eff3402e41a9   nsx-category-infrastructure   1.0000000177635702   4               4               18h
cfbe3754-c365-4697-b124-5fbaddd87b57   nsx-category-infrastructure   1.0000000177635728   3               3               17h

By using NSX to manage the Antrea policies, there is also a very easy way to verify whether the policies are working, using the Traffic Analysis tool in NSX:

nsx-traffic-analysis

This tool will also inform you of any policies applied with kubectl inside the cluster; in other words, it can also show you policies that were not created or applied from the NSX manager.

I have applied an Antrea policy called block-ns-app3-app4 directly in the TKC cluster using kubectl.
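The exact content of that policy is not important here, but it could look roughly like this sketch; the namespace names are assumptions based on the Traceflow test below, while the tier and priority match the output that follows:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: block-ns-app3-app4
spec:
  priority: 4
  tier: securityops
  appliedTo:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: dev-app3       # assumed namespace name
  egress:
    - action: Drop
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: dev-app4   # assumed namespace name
      name: DropToDevApp4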

andreasm@linuxvm01:~/antrea/policies/groups$ k get acnp
NAME                                   TIER                          PRIORITY             DESIRED NODES   CURRENT NODES   AGE
17dbadce-06cf-4d1e-9747-3e888f0f58e0   nsx-category-infrastructure   1.000000017763571    4               4               20h
933e463e-c061-4e80-80b3-eff3402e41a9   nsx-category-infrastructure   1.0000000177635702   4               4               20h
block-ns-app3-app4   #this             securityops                   4                    1               1               3s
cfbe3754-c365-4697-b124-5fbaddd87b57   nsx-category-infrastructure   1.0000000177635728   3               3               20h

If I do a Traceflow from within NSX, from a pod in the namespace Dev-App3 to a pod in the namespace Dev-App4, and hit this rule, the NSX manager will show me this:

traceflow

It is clearly doing its job and blocking the traffic, but which rule is it? Click on EgressMetric, copy the rule ID and paste it in the search field in NSX:

policy-id

search

As mentioned above, NSX can also utilize the inventory information from the TKC cluster (or any Kubernetes cluster that uses Antrea) to enforce policies in the Distributed Firewall. The information NSX currently uses is:

  • Kubernetes Cluster - Used to create security group containing Kubernetes clusters by name, not used alone but in combination with the below ->
  • Kubernetes Namespace - Used to create security group containing Kubernetes Namespaces by name or tag, not used alone but in combination with a Kubernetes Cluster defined above.
  • Kubernetes Service - Used to create security group containing Kubernetes Services by name or tag, not used alone but in combination with any of the above ->
  • Kubernetes Ingress - Used to create security group containing Kubernetes Ingresses by name or tag, not used alone but in combination with any of the above Kubernetes Cluster or Kubernetes Namespace.
  • Antrea Egress - Used to create security group containing Antrea Egress IP in use by name or tag, not used alone but in combination with only Kubernetes Cluster.
  • Antrea IP Pool - Used to create security group containing Antrea Egress IP Pool by name or tag, not used alone but in combination with only Kubernetes Cluster.
  • Kubernetes Node - Used to create security group containing Kubernetes Node IPs or POD CIDRs by node IP address or POD CIDR, not used alone but in combination with only Kubernetes Cluster.
  • Kubernetes Gateway - Used to create security group containing Kubernetes Gateways by name or tag, not used alone but in combination with only Kubernetes Cluster.

An example of a Security Group in NSX using the constructs above, a Kubernetes Cluster with the name dev-cluster and Kubernetes Node IP addresses:

kubernetes-cluster-nodes

kubernets-nodes-dev-cluster-1

Now, if I want to create an NSX firewall policy isolating two Kubernetes clusters from each other using the constructs above:

I will simply create two security groups like the one above, selecting the two different clusters in each group. The policy will then look like this:

policy-blocking-dev1-to-dev2

Now, if I do a Traceflow from any node in dev-cluster-1 to any node in dev-cluster-2, it will be dropped.

traceflow-dev-1-dev-2

The Firewall Rule ID is:

drop-rule

With this approach, it is very easy to isolate complete clusters from each other with some really simple rules. We could even create a negated rule, saying you are allowed to reach any workers from the same cluster but nothing else, with one blocking rule (using a negated selection where the source is dev-cluster-1 and the destination is also dev-cluster-1):

destination-same-source

The policy:

negated

This is just one rule blocking everything except the cluster's own Kubernetes nodes.

RBAC - making sure no one can overwrite/override existing rules.

How to manage RBAC, or Tier entitlement, with Antrea I have already covered here.
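As a quick reminder of the idea, the simplest building block is standard Kubernetes RBAC: a read-only ClusterRole for the Antrea policy resources, bound to anyone who should be able to see but not change the policies. The name and resource list below are chosen as an example sketch:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: antrea-policy-read-only
rules:
  - apiGroups: ["crd.antrea.io"]
    resources: ["clusternetworkpolicies", "networkpolicies", "tiers", "clustergroups"]
    verbs: ["get", "list", "watch"]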

Outro...

In this post I have shown three different ways to manage and apply Antrea Network Policies. The first approach was all manual; the second was automatic, although the policies still need to be defined up front. The last one, using the NSX manager, is a somewhat different approach, as not all the Antrea Network Policy features are available and some policies have to be defined differently. But the NSX manager can also be used to automate some of the policies by simply adding clusters to existing policies; then they are applied at once.

The Antrea policies used in this post, and how they are defined, are by no means the final answer or best practice. They were just simple examples to have something to "work with" throughout the post. As I have mentioned, one could utilize the different tiers to delegate administration of the policies to the right set of responsibilities (security admins, vSphere operators, DevOps etc.). If the target is zero-trust also inside your TKC clusters, this can be achieved by utilizing the tiers and placing a drop-all-else rule dead last in the Antrea policy chain (in the baseline tier, for example).