Tanzu Kubernetes Grid 2.2 & Remote Workload Clusters

Overview

Overview of what this post is about

In this post my goal is to use TKG (Tanzu Kubernetes Grid) to manage and deploy workload clusters in remote datacenters. The reasons for such a setup can be many, such as easier lifecycle management of workload clusters in environments with several datacenters in different physical locations. To lower the management overhead and simplify lifecycle operations like updates, being able to manage them centrally is sometimes key.

So in this post I have two datacenters, each with a vSphere cluster managed by its own vCenter Server. NSX is installed in both datacenters, managing the network in its respective datacenter. NSX Advanced Loadbalancer (Avi) is deployed in datacenter 1 and manages both datacenters (this post will not cover GSLB, nor deploying Avi controllers in both datacenters and why that could also be a smart thing to consider). This Avi installation will be responsible for creating SEs (Service Engines) and virtual services for the datacenter it is deployed in as well as for the remote datacenter (datacenter 2). It will be the only controller/manager shared across the two datacenters.

The following diagram tries to illustrate the above:

overview

I will not go into how connectivity between these sites is established; I just assume that the relevant connectivity between the sites is in place (as that is a requirement for this to work).

Meet the "moving parts" involved

This section quickly goes through the components used in the two datacenters. First out are the components in the datacenter 1 environment.

Datacenter 1

In datacenter 1 I have one vSphere cluster consisting of four ESXi hosts managed by a vCenter Server. This will be my management datacenter, where I will deploy my TKG management cluster. It is placed in physical location 1 with its own physical network and storage (using vSAN).

This is the vSphere cluster in datacenter 1:

datacenter-1

In datacenter 1 I have also installed NSX-T to handle all the networking needs in this datacenter. There is no stretching of networks between the datacenters; the NSX environment is only responsible for the datacenter it is installed in, as you can see below:

nsx-dc-1

It also has a 1:1 relationship to the vCenter Server in datacenter 1:

nsx-t-vcenter

This NSX environment has the following networks created to support my TKG management cluster deployment:

networks-nsx-t

And from the vCenter:

networks-vcenter

A quick description of the different networks:

  • ls-avi-dns-se-data: This is where I place the data plane of the SEs used for the DNS service in Avi.
  • ls-avi-se-data: This is where I place the data plane of the SEs used for my other virtual services (regular application needs, or services I happen to expose from my TKG mgmt cluster or a workload cluster in the same datacenter). This network will not be used in this post.
  • ls-mgmt: This is where I place the management interface of my SEs.
  • ls-tkg-mgmt: This is where my TKG management cluster nodes will be placed, and it will be used in this post.

The ls-tkg-mgmt network has also been configured with DHCP on the segment in NSX:

nsx-dhcp

And last but not least, the Avi controller.

This is the component that has been configured to handle requests from both datacenters as a shared resource, whether it is regular layer-4 services like service type LoadBalancer or layer-7 services like Ingress. As each datacenter is managed by its own NSX-T, I have configured the Avi controller to use the two NSX-T environments as two different clouds:

avi-clouds

Each cloud depicted above reflects one of the two datacenters and has been configured accordingly to support the network settings in that datacenter.

Each cloud is an NSX-T cloud with its own configuration matching the datacenter it serves: networks, IPAM/DNS profiles, routing contexts and service engine groups. Below are some screenshots from the Avi controller:

stc-nsx-cloud

stc-nsx-cloud-continued

Service engine groups in the stc-nsx-cloud:

svc-eng-group

The above SE groups have been configured for placement in their respective vSphere cluster, folder, datastore, naming convention and so on.

stc-nsx-network-1

The above networks have been configured with Avi IPAM to provision IP addresses and automate the creation of the SE data plane.

The networks below are the VIP networks configured in the stc-nsx-cloud:

stc-nsx-networks-2

Then the routing context (or VRF context) the SEs use to reach the backends:

vrf-context

The same has been done for the wdc-nsx-cloud. I will not print it all here, but just show that there is also a wdc cloud configured in these sections:

wdc-cloud-networks

Notice the difference in IP subnets.

Then there are the IPAM and DNS profiles for both clouds:

ipam-dns

Without going into too much detail on how to configure Avi, it all comes down to configuring it to match the infrastructure settings in each datacenter. When requests for virtual services reach the Avi controller, it then knows how to handle them, create the virtual services and service engines, and do the IP addressing correctly. Then this part will just work, butter smooth.

An overview of the components in datacenter 1:

dc1

Datacenter 2

In datacenter 2 I also have one vSphere cluster consisting of four ESXi hosts managed by a vCenter Server. This will be my remote/edge datacenter, where I will deploy my TKG workload clusters. It is placed in physical location 2 with its own physical network and storage (using vSAN).

This is the vSphere cluster in datacenter 2:

vsphere-dc-2

In datacenter 2 I have also installed NSX-T to handle all the networking needs in this datacenter. As mentioned above, there is no stretching of networks between the datacenters; each NSX environment is only responsible for the datacenter it is installed in, as you can see below:

nsx-t-nodes-dc-2

It also has a 1:1 relationship to the vCenter Server in datacenter 2:

nsx-t-vcenter-dc-2

This NSX environment has the following networks created to support my TKG workload cluster deployment:

nsx-t-segments-dc-2

And from the vCenter:

vcenter-networks

A quick description of the different networks:

  • ls-avi-dns-se-data: This is where I place the data plane of the SEs used for the DNS service in Avi.
  • ls-avi-generic-se-data: This is where I place the data plane of the SEs used for the virtual services created when I expose services from the workload clusters. This network will be used in this post.
  • ls-mgmt: This is where I place the management interface of my SEs.
  • ls-tkg-wdc-wld-1: This is where my TKG workload cluster nodes in this datacenter will be placed.

The ls-tkg-wdc-wld-1 network has also been configured with DHCP on the segment in NSX:

nsx-dhcp

An overview again of the components in DC2:

dc-2

That's it for the "moving parts" involved in both datacenters for this exercise.

TKG management cluster deployment

Now, finally, for the fun part: deployment. As mentioned in the previous sections, I will deploy the TKG management cluster in datacenter 1. But before I do the actual deployment I need to explain a little about how a TKG cluster is reached, whether it is the management cluster or a workload cluster.

Kubernetes API endpoint - exposing services inside the Kubernetes clusters (TKG clusters)

A Kubernetes cluster usually consists of one or three control-plane nodes, and this is where the Kubernetes API endpoint lives. When interacting with Kubernetes we use the exposed Kubernetes API to declaratively tell it (some say: ask it nicely) what we want it to do. This API endpoint is usually exposed on port 6443 and is only available on the control-plane nodes, not on the worker nodes. So the first criterion is connectivity to the control-plane nodes on port 6443 (or SSH into the control-plane nodes on port 22 and work with the kube-apiserver from there, which is not ideal). We want to reach the API from a remote workstation to be more flexible and effective in how we interact with it.

With just one control-plane node it is probably fine to reach that node directly and send our API calls to it, but this can create issues down the road: when we replace or upgrade that node, its IP address can (and most likely will) change, meaning our kubeconfig context or automation tooling needs to be updated accordingly. What we want is a virtual IP address that stays consistent for the lifetime of the Kubernetes cluster. The same applies when we have more than one control-plane node; three is a common number in production (we cannot have an even number of control-plane nodes, as we want quorum). Either way, we want one consistent IP address, fronting the single control-plane node or load balanced across all three.

To achieve that we need some kind of load balancer that can provide this virtual IP address and expose the Kubernetes API consistently. In TKG we can use NSX Advanced Loadbalancer for this purpose, or a simpler approach like Kube-VIP. I do not want to go into a big write-up on the differences other than to say they are not really comparable. Kube-VIP will not load balance the Kubernetes API between the control-plane nodes; it creates a virtual IP in the same subnet as the control-plane nodes, places it on one of them, and keeps it there until that node fails, at which point it moves to another control-plane node. NSX ALB, on the other hand, load balances the Kubernetes API endpoint across all control-plane nodes, and the IP address is automatically allocated at provisioning time, whereas the Kube-VIP address is statically assigned.
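
As a quick illustration of why the stable VIP matters (assuming a kubeconfig for the cluster is already in place), we can check which endpoint a kubeconfig context actually points at; with either Kube-VIP or NSX ALB this should be the VIP, never an individual control-plane node IP:

# Show the API endpoint the current kubeconfig context points at
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# Expected output is something like https://<vip>:6443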

Info

Why am I mentioning this? Why could I not just focus on NSX Advanced Loadbalancer, which can cover all my needs? Because in this specific post I am hitting a special use case: my TKG management cluster is placed in one datacenter managed by its own NSX-T, while I want to deploy and manage TKG workload clusters in a completely different datacenter, also managed by its own NSX-T. When using NSX Advanced Loadbalancer as my API endpoint VIP provider in combination with NSX-T clouds (Avi clouds), I am currently not allowed to override the control-plane (API endpoint) network per workload cluster. It is currently not possible to override or select a different NSX-T Tier-1 for the control-plane network, and these differ because there are two different NSX-T environments. I can name the Tier-1 routers identically in both datacenters, but it is not so easily fooled 😄 So my way to work around this is to use Kube-VIP, which allows me to manually configure the API endpoint IP for my workload clusters. I will try to explain a bit more how the NSX ALB integration works in TKG below.

What about the services I want to expose from the different workload clusters, like service type LoadBalancer and Ingress? That is a different story: there we can use NSX Advanced Loadbalancer as much as we want, and in a very flexible way too. The reason is that the Kubernetes API endpoint VIP (the control-plane network) is managed and controlled by the TKG management cluster, while what comes from inside a running TKG workload cluster is handled completely differently. Whether in TKG or any other Kubernetes platform, such as native upstream Kubernetes, NSX Advanced Loadbalancer is consumed through a component called AKO (Avi Kubernetes Operator), which handles the standard Kubernetes requests like service type LoadBalancer and Ingress creation and forwards them to the NSX ALB controller to be realised. In TKG, an AKO instance runs in the management cluster; it is responsible for the services exposed from inside the TKG management cluster, but also for assigning the VIPs for the workload clusters' Kubernetes APIs (the control-plane network). As soon as we have our first TKG workload cluster, it comes with its own AKO instance that is responsible for all the services in that workload cluster; it has nothing to do with the control-plane network or the AKO instance running in the TKG management cluster. So we can adjust this AKO instance to match our needs without being restricted by what the AKO instance in the TKG management cluster is configured with.
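
To make AKO's role a bit more concrete, here is a minimal, illustrative Service of type LoadBalancer (the name, namespace and ports are made up for the example). When something like this is applied in a workload cluster, the AKO instance in that cluster asks the NSX ALB controller to create a virtual service and allocate a VIP from the data network configured for that cluster:

apiVersion: v1
kind: Service
metadata:
  name: demo-app        # illustrative name
  namespace: demo
spec:
  type: LoadBalancer    # picked up by AKO and realised as an Avi virtual service
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080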

In a TKG workload cluster there are a couple of ways to get AKO installed. One option is to let the AKO Operator running in the TKG management cluster deploy it automatically when the TKG workload cluster is provisioned. This approach is best if you want TKG to handle the lifecycle of the AKO instance, such as upgrades, and it is very hands-off. We define an AkoDeploymentConfig (ADC) in the TKG management cluster that holds the AKO settings for the respective TKG workload cluster, or clusters if they can share the same settings. Selection is label based, so it is very easy to target a series of clusters, or one specific cluster, with an ADC simply by applying the correct label to the cluster. The other option is to install AKO via Helm, which gives full flexibility but is a manual process that has to be repeated on every TKG workload cluster that needs AKO. I tend to lean towards the ADC approach, as I cannot see any limitation compared to the Helm approach, and ADC also supports AviInfraSettings, which gives further flexibility and options.
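
Since ADC selection is purely label based, matching a workload cluster to a given ADC is just a matter of putting the right label on the Cluster object in the management cluster. A small sketch (the cluster name and namespace are placeholders; the label is the one I use later in this post):

# Run against the TKG management cluster context
kubectl label cluster <workload-cluster-name> -n <namespace> avi-cloud=wdc-nsx-cloud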

With that out of the way let us get this TKG management cluster deployed already...

TKG management cluster deployment - continued

I will not cover any of the prerequisites for deploying TKG (have a look here for that); I will go straight to my TKG management cluster bootstrap YAML manifest. Below is the YAML for the TKG mgmt cluster, with some comments on the changes I have made to use Kube-VIP for the control plane, aka the Kubernetes API endpoint.

  1#! ---------------
  2#! Basic config
  3#! -------------
  4CLUSTER_NAME: tkg-stc-mgmt-cluster
  5CLUSTER_PLAN: dev
  6INFRASTRUCTURE_PROVIDER: vsphere
  7ENABLE_CEIP_PARTICIPATION: "false"
  8ENABLE_AUDIT_LOGGING: "false"
  9CLUSTER_CIDR: 100.96.0.0/11
 10SERVICE_CIDR: 100.64.0.0/13
 11TKG_IP_FAMILY: ipv4
 12DEPLOY_TKG_ON_VSPHERE7: "true"
 13CLUSTER_API_SERVER_PORT: 6443 #Added for Kube-VIP
 14VSPHERE_CONTROL_PLANE_ENDPOINT: 10.13.20.100 #Added for Kube-VIP - specify a static IP in same subnet as nodes
 15VSPHERE_CONTROL_PLANE_ENDPOINT_PORT: 6443 #Added for Kube-VIP
 16VIP_NETWORK_INTERFACE: "eth0" #Added for Kube-VIP
 17# VSPHERE_ADDITIONAL_FQDN:
 18AVI_CONTROL_PLANE_HA_PROVIDER: false #Set to false to use Kube-VIP instead
 19AVI_ENABLE: "true" #I still want AKO to be installed, but not used for controlplane endpoint
 20
 21#! ---------------
 22#! vSphere config
 23#! -------------
 24VSPHERE_DATACENTER: /cPod-NSXAM-STC
 25VSPHERE_DATASTORE: /cPod-NSXAM-STC/datastore/vsanDatastore
 26VSPHERE_FOLDER: /cPod-NSXAM-STC/vm/TKGm
 27VSPHERE_INSECURE: "false"
 28VSPHERE_NETWORK: /cPod-NSXAM-STC/network/ls-tkg-mgmt
 29VSPHERE_PASSWORD: "password"
 30VSPHERE_RESOURCE_POOL: /cPod-NSXAM-STC/host/Cluster/Resources
 31#VSPHERE_TEMPLATE: /Datacenter/vm/TKGm/ubuntu-2004-kube-v1.23.8+vmware.2
 32VSPHERE_SERVER: vcsa.cpod-nsxam-stc.az-stc.cloud-garage.net
 33VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa ssh-public key
 34VSPHERE_TLS_THUMBPRINT: vcenter SHA1
 35VSPHERE_USERNAME: username@domain.net
 36
 37#! ---------------
 38#! Node config
 39#! -------------
 40OS_ARCH: amd64
 41OS_NAME: ubuntu
 42OS_VERSION: "20.04"
 43VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
 44VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
 45VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
 46VSPHERE_WORKER_DISK_GIB: "20"
 47VSPHERE_WORKER_MEM_MIB: "4096"
 48VSPHERE_WORKER_NUM_CPUS: "2"
 49CONTROL_PLANE_MACHINE_COUNT: 1
 50WORKER_MACHINE_COUNT: 2
 51
 52#! ---------------
 53#! Avi config
 54#! -------------
 55AVI_CA_DATA_B64: AVI Controller Base64 Certificate
 56AVI_CLOUD_NAME: stc-nsx-cloud
 57AVI_CONTROLLER: 172.24.3.50
 58# Network used to place workload clusters' endpoint VIPs
 59#AVI_CONTROL_PLANE_NETWORK: vip-tkg-wld-l4
 60#AVI_CONTROL_PLANE_NETWORK_CIDR: 10.13.102.0/24
 61# Network used to place workload clusters' services external IPs (load balancer & ingress services)
 62AVI_DATA_NETWORK: vip-tkg-wld-l7
 63AVI_DATA_NETWORK_CIDR: 10.13.103.0/24
 64# Network used to place management clusters' services external IPs (load balancer & ingress services)
 65AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 10.13.101.0/24
 66AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: vip-tkg-mgmt-l7
 67# Network used to place management clusters' endpoint VIPs
 68#AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_NAME: vip-tkg-mgmt-l4
 69#AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_CIDR: 10.13.100.0/24
 70AVI_NSXT_T1LR: Tier-1
 71AVI_CONTROLLER_VERSION: 22.1.2
 72AVI_LABELS: "{adc-enabled: 'true'}" #Added so I can easily select which workload clusters will use this AKO config
 73AVI_PASSWORD: "password"
 74AVI_SERVICE_ENGINE_GROUP: stc-nsx
 75AVI_MANAGEMENT_CLUSTER_SERVICE_ENGINE_GROUP: tkgm-se-group
 76AVI_USERNAME: admin
 77AVI_DISABLE_STATIC_ROUTE_SYNC: false
 78AVI_INGRESS_DEFAULT_INGRESS_CONTROLLER: true
 79AVI_INGRESS_SHARD_VS_SIZE: SMALL
 80AVI_INGRESS_SERVICE_TYPE: NodePortLocal
 81
 82#! ---------------
 83#! Proxy config
 84#! -------------
 85TKG_HTTP_PROXY_ENABLED: "false"
 86
 87#! ---------------------------------------------------------------------
 88#! Antrea CNI configuration
 89#! ---------------------------------------------------------------------
 90# ANTREA_NO_SNAT: false
 91# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
 92# ANTREA_PROXY: false
 93# ANTREA_POLICY: true
 94# ANTREA_TRACEFLOW: false
 95ANTREA_NODEPORTLOCAL: true
 96ANTREA_PROXY: true
 97ANTREA_ENDPOINTSLICE: true
 98ANTREA_POLICY: true
 99ANTREA_TRACEFLOW: true
100ANTREA_NETWORKPOLICY_STATS: false
101ANTREA_EGRESS: true
102ANTREA_IPAM: false
103ANTREA_FLOWEXPORTER: false
104ANTREA_SERVICE_EXTERNALIP: false
105ANTREA_MULTICAST: false
106
107#! ---------------------------------------------------------------------
108#! Machine Health Check configuration
109#! ---------------------------------------------------------------------
110ENABLE_MHC: "true"
111ENABLE_MHC_CONTROL_PLANE: true
112ENABLE_MHC_WORKER_NODE: true
113MHC_UNKNOWN_STATUS_TIMEOUT: 5m
114MHC_FALSE_STATUS_TIMEOUT: 12m
115
116#! ---------------------------------------------------------------------
117#! Identity management configuration
118#! ---------------------------------------------------------------------
119
120IDENTITY_MANAGEMENT_TYPE: none

All the settings above should match the datacenter 1 environment so the TKG management cluster can be deployed. Let's deploy it using the Tanzu CLI from my TKG bootstrap client:

tanzu mc create -f tkg-mgmt-bootstrap.yaml

As soon as it is deployed, grab the kubeconfig and add it to your context:

tanzu mc kubeconfig get --admin --export-file stc-tkgm-mgmt-cluster.yaml
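
To actually use the exported file, point kubectl at it and switch to the admin context. The context name normally follows the <cluster-name>-admin@<cluster-name> pattern, but verify it with kubectl config get-contexts first:

# Use the exported kubeconfig and switch context (context name assumed, verify first)
export KUBECONFIG=./stc-tkgm-mgmt-cluster.yaml
kubectl config get-contexts
kubectl config use-context tkg-stc-mgmt-cluster-admin@tkg-stc-mgmt-cluster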

The IP address used for the Kubernetes API endpoint is the controlplane IP defined above:

VSPHERE_CONTROL_PLANE_ENDPOINT: 10.13.20.100

We can also see this IP being assigned to my one controlplane node in the vCenter view:

kube-vip-assigned
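
A quick way to confirm that the Kube-VIP address actually answers on the API port (the -k flag just skips certificate verification; this is only a reachability check):

curl -k https://10.13.20.100:6443/version
# Either the version JSON or a 401/403 response confirms the VIP is serving the API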

Now let's have a quick look inside the TKG mgmt cluster, specifically for AKO and any ADCs:

 1tkg-bootstrap-vm:~/Kubernetes-library/examples/ingress$ k get pods -A
 2NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS     AGE
 3avi-system                          ako-0                                                            1/1     Running   0            8h
 4capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-5fb8fbc6c7-rqkzf       1/1     Running   0            8h
 5capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-78c559f48c-cj2dm   1/1     Running   0            8h
 6capi-system                         capi-controller-manager-84fbb669c-bhk4j                          1/1     Running   0            8h
 7capv-system                         capv-controller-manager-5f46567b86-pccf5                         1/1     Running   0            8h
 8cert-manager                        cert-manager-5d8d7b4dfb-gj6h2                                    1/1     Running   0            9h
 9cert-manager                        cert-manager-cainjector-7797ff666f-zxh5l                         1/1     Running   0            9h
10cert-manager                        cert-manager-webhook-59969cbb8c-vpsgr                            1/1     Running   0            9h
11kube-system                         antrea-agent-6xzvh                                               2/2     Running   0            8h
12kube-system                         antrea-agent-gsfhc                                               2/2     Running   0            8h
13kube-system                         antrea-agent-t5gzb                                               2/2     Running   0            8h
14kube-system                         antrea-controller-74b468c659-hcrgp                               1/1     Running   0            8h
15kube-system                         coredns-5d4666ccfb-qx5qt                                         1/1     Running   0            9h
16kube-system                         coredns-5d4666ccfb-xj47b                                         1/1     Running   0            9h
17kube-system                         etcd-tkg-stc-mgmt-cluster-sbptz-lkn58                            1/1     Running   0            9h
18kube-system                         kube-apiserver-tkg-stc-mgmt-cluster-sbptz-lkn58                  1/1     Running   0            9h
19kube-system                         kube-controller-manager-tkg-stc-mgmt-cluster-sbptz-lkn58         1/1     Running   0            9h
20kube-system                         kube-proxy-9d7b9                                                 1/1     Running   0            9h
21kube-system                         kube-proxy-kd8h8                                                 1/1     Running   0            9h
22kube-system                         kube-proxy-n7zwx                                                 1/1     Running   0            9h
23kube-system                         kube-scheduler-tkg-stc-mgmt-cluster-sbptz-lkn58                  1/1     Running   0            9h
24kube-system                         kube-vip-tkg-stc-mgmt-cluster-sbptz-lkn58                        1/1     Running   0            9h
25kube-system                         metrics-server-b468f4d5f-hvtbg                                   1/1     Running   0            8h
26kube-system                         vsphere-cloud-controller-manager-fnsvh                           1/1     Running   0            8h
27secretgen-controller                secretgen-controller-697cb6c657-lh9rr                            1/1     Running   0            8h
28tanzu-auth                          tanzu-auth-controller-manager-d75d85899-d8699                    1/1     Running   0            8h
29tkg-system-networking               ako-operator-controller-manager-5bbb9d4c4b-2bjsk                 1/1     Running   0            8h
30tkg-system                          kapp-controller-9f9f578c7-dpzgk                                  2/2     Running   0            9h
31tkg-system                          object-propagation-controller-manager-5cbb94894f-k56w5           1/1     Running   0            8h
32tkg-system                          tanzu-addons-controller-manager-79f656b4c7-m72xw                 1/1     Running   0            8h
33tkg-system                          tanzu-capabilities-controller-manager-5868c5f789-nbkgm           1/1     Running   0            8h
34tkg-system                          tanzu-featuregates-controller-manager-6d567fffd6-647s5           1/1     Running   0            8h
35tkg-system                          tkr-conversion-webhook-manager-6977bfc965-gjjbt                  1/1     Running   0            8h
36tkg-system                          tkr-resolver-cluster-webhook-manager-5c8484ffd8-8xc8n            1/1     Running   0            8h
37tkg-system                          tkr-source-controller-manager-57c56d55d9-x6vsz                   1/1     Running   0            8h
38tkg-system                          tkr-status-controller-manager-55b4b845b9-77snb                   1/1     Running   0            8h
39tkg-system                          tkr-vsphere-resolver-webhook-manager-6476749d5d-5pxlk            1/1     Running   0            8h
40vmware-system-csi                   vsphere-csi-controller-585bf4dc75-wtlw2                          7/7     Running   0            8h
41vmware-system-csi                   vsphere-csi-node-ldrs6                                           3/3     Running   2 (8h ago)   8h
42vmware-system-csi                   vsphere-csi-node-rwgpw                                           3/3     Running   4 (8h ago)   8h
43vmware-system-csi                   vsphere-csi-node-rx8f6                                           3/3     Running   4 (8h ago)   8h

There is an AKO pod running. Are there any ADCs created?

1tkg-bootstrap-vm:~/Kubernetes-library/examples/ingress$ k get adc
2NAME                                 AGE
3install-ako-for-all                  8h
4install-ako-for-management-cluster   8h

Let's have a look inside both of them, first install-ako-for-all and then install-ako-for-management-cluster.
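
The dumps below are the objects as stored in the management cluster. I opened them with kubectl edit, but a read-only kubectl get gives the same content:

kubectl get adc install-ako-for-all -o yaml
kubectl get adc install-ako-for-management-cluster -o yaml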

 1# Please edit the object below. Lines beginning with a '#' will be ignored,
 2# and an empty file will abort the edit. If an error occurs while saving this file will be
 3# reopened with the relevant failures.
 4#
 5apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
 6kind: AKODeploymentConfig
 7metadata:
 8  annotations:
 9    kapp.k14s.io/identity: v1;/networking.tkg.tanzu.vmware.com/AKODeploymentConfig/install-ako-for-all;networking.tkg.tanzu.vmware.com/v1alpha1
10    kapp.k14s.io/original: '{"apiVersion":"networking.tkg.tanzu.vmware.com/v1alpha1","kind":"AKODeploymentConfig","metadata":{"labels":{"kapp.k14s.io/app":"1685446688101132090","kapp.k14s.io/association":"v1.8329c3602ed02133e324fc22d58dcf28"},"name":"install-ako-for-all"},"spec":{"adminCredentialRef":{"name":"avi-controller-credentials","namespace":"tkg-system-networking"},"certificateAuthorityRef":{"name":"avi-controller-ca","namespace":"tkg-system-networking"},"cloudName":"stc-nsx-cloud","clusterSelector":{"matchLabels":{"adc-enabled":"true"}},"controlPlaneNetwork":{"cidr":"10.13.101.0/24","name":"vip-tkg-mgmt-l7"},"controller":"172.24.3.50","dataNetwork":{"cidr":"10.13.103.0/24","name":"vip-tkg-wld-l7"},"extraConfigs":{"disableStaticRouteSync":false,"ingress":{"defaultIngressController":false,"disableIngressClass":true,"nodeNetworkList":[{"networkName":"ls-tkg-mgmt"}]},"networksConfig":{"nsxtT1LR":"Tier-1"}},"serviceEngineGroup":"stc-nsx"}}'
11    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
12  creationTimestamp: "2023-05-30T11:38:45Z"
13  finalizers:
14  - ako-operator.networking.tkg.tanzu.vmware.com
15  generation: 2
16  labels:
17    kapp.k14s.io/app: "1685446688101132090"
18    kapp.k14s.io/association: v1.8329c3602ed02133e324fc22d58dcf28
19  name: install-ako-for-all
20  resourceVersion: "4686"
21  uid: 0cf0dd57-b193-40d5-bb03-347879157377
22spec:
23  adminCredentialRef:
24    name: avi-controller-credentials
25    namespace: tkg-system-networking
26  certificateAuthorityRef:
27    name: avi-controller-ca
28    namespace: tkg-system-networking
29  cloudName: stc-nsx-cloud
30  clusterSelector:
31    matchLabels:
32      adc-enabled: "true"
33  controlPlaneNetwork:
34    cidr: 10.13.101.0/24
35    name: vip-tkg-mgmt-l7
36  controller: 172.24.3.50
37  controllerVersion: 22.1.3
38  dataNetwork:
39    cidr: 10.13.103.0/24
40    name: vip-tkg-wld-l7
41  extraConfigs:
42    disableStaticRouteSync: false
43    ingress:
44      defaultIngressController: false
45      disableIngressClass: true
46      nodeNetworkList:
47      - networkName: ls-tkg-mgmt
48    networksConfig:
49      nsxtT1LR: Tier-1
50  serviceEngineGroup: stc-nsx

This is clearly configured for my datacenter 1 and will not match my datacenter 2 environment. Also notice the label: if I create a cluster and apply this label, I will get the "default" ADC applied, which will not match what I need in datacenter 2.

Let's have a look at the other one:

 1# Please edit the object below. Lines beginning with a '#' will be ignored,
 2# and an empty file will abort the edit. If an error occurs while saving this file will be
 3# reopened with the relevant failures.
 4#
 5apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
 6kind: AKODeploymentConfig
 7metadata:
 8  annotations:
 9    kapp.k14s.io/identity: v1;/networking.tkg.tanzu.vmware.com/AKODeploymentConfig/install-ako-for-management-cluster;networking.tkg.tanzu.vmware.com/v1alpha1
10    kapp.k14s.io/original: '{"apiVersion":"networking.tkg.tanzu.vmware.com/v1alpha1","kind":"AKODeploymentConfig","metadata":{"labels":{"kapp.k14s.io/app":"1685446688101132090","kapp.k14s.io/association":"v1.3012c3c8e0fa37b13f4916c7baca1863"},"name":"install-ako-for-management-cluster"},"spec":{"adminCredentialRef":{"name":"avi-controller-credentials","namespace":"tkg-system-networking"},"certificateAuthorityRef":{"name":"avi-controller-ca","namespace":"tkg-system-networking"},"cloudName":"stc-nsx-cloud","clusterSelector":{"matchLabels":{"cluster-role.tkg.tanzu.vmware.com/management":""}},"controlPlaneNetwork":{"cidr":"10.13.101.0/24","name":"vip-tkg-mgmt-l7"},"controller":"172.24.3.50","dataNetwork":{"cidr":"10.13.101.0/24","name":"vip-tkg-mgmt-l7"},"extraConfigs":{"disableStaticRouteSync":false,"ingress":{"defaultIngressController":false,"disableIngressClass":true,"nodeNetworkList":[{"networkName":"ls-tkg-mgmt"}]},"networksConfig":{"nsxtT1LR":"Tier-1"}},"serviceEngineGroup":"tkgm-se-group"}}'
11    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
12  creationTimestamp: "2023-05-30T11:38:45Z"
13  finalizers:
14  - ako-operator.networking.tkg.tanzu.vmware.com
15  generation: 2
16  labels:
17    kapp.k14s.io/app: "1685446688101132090"
18    kapp.k14s.io/association: v1.3012c3c8e0fa37b13f4916c7baca1863
19  name: install-ako-for-management-cluster
20  resourceVersion: "4670"
21  uid: c41e6e39-2b0f-4fa4-9245-0eec1bcf6b5d
22spec:
23  adminCredentialRef:
24    name: avi-controller-credentials
25    namespace: tkg-system-networking
26  certificateAuthorityRef:
27    name: avi-controller-ca
28    namespace: tkg-system-networking
29  cloudName: stc-nsx-cloud
30  clusterSelector:
31    matchLabels:
32      cluster-role.tkg.tanzu.vmware.com/management: ""
33  controlPlaneNetwork:
34    cidr: 10.13.101.0/24
35    name: vip-tkg-mgmt-l7
36  controller: 172.24.3.50
37  controllerVersion: 22.1.3
38  dataNetwork:
39    cidr: 10.13.101.0/24
40    name: vip-tkg-mgmt-l7
41  extraConfigs:
42    disableStaticRouteSync: false
43    ingress:
44      defaultIngressController: false
45      disableIngressClass: true
46      nodeNetworkList:
47      - networkName: ls-tkg-mgmt
48    networksConfig:
49      nsxtT1LR: Tier-1
50  serviceEngineGroup: tkgm-se-group

The same is true for this one: it is configured for my datacenter 1, the only major difference being a different dataNetwork. So if I decided to deploy a workload cluster in the same datacenter it would be fine, but I don't want that; I want my workload cluster in a different datacenter. Let's do that.

TKG workload cluster deployment - with corresponding ADC

Before I deploy my workload cluster I will create a "custom" ADC specific to datacenter 2, where the workload cluster will be deployed. Here is my ADC for the DC 2 workload cluster:

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: ako-tkg-wdc-cloud
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: wdc-nsx-cloud
  clusterSelector:
    matchLabels:
      avi-cloud: "wdc-nsx-cloud"
  controller: 172.24.3.50
  dataNetwork:
    cidr: 10.101.221.0/24
    name: tkg-wld-1-apps
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: false          # required
    ingress:
      defaultIngressController: true
      disableIngressClass: false           # required
      nodeNetworkList:                     # required
        - cidrs:
            - 10.101.13.0/24
          networkName: ls-tkg-wdc-wld-1
      serviceType: NodePortLocal           # required
      shardVSSize: SMALL                   # required
    l4Config:
      autoFQDN: default
    networksConfig:
      nsxtT1LR: /infra/tier-1s/Tier-1
  serviceEngineGroup: wdc-se-group

In this ADC I configure the dataNetwork to be the VIP network I have defined in the NSX ALB DC 2 cloud, point to the NSX-T Tier-1 there (yes, it has the same name as in DC 1, but they are not the same), and set nodeNetworkList to match where my workload cluster nodes will be placed in DC 2. Also notice the label: for my workload cluster to use this ADC I will need to apply this label, either during provisioning or by labelling the cluster after creation. Apply the ADC:

k apply -f ako-wld-cluster-1.wdc.cloud.yaml

Is it there?

1amarqvardsen@amarqvards1MD6T:~/Kubernetes-library/tkgm/stc-tkgm$ k get adc
2NAME                                 AGE
3ako-tkg-wdc-cloud                    20s
4install-ako-for-all                  9h
5install-ako-for-management-cluster   9h

Yes it is.

Now prepare the TKG workload cluster manifest to match the DC 2 environment and apply it, also making sure aviAPIServerHAProvider is set to false.

  1apiVersion: cpi.tanzu.vmware.com/v1alpha1
  2kind: VSphereCPIConfig
  3metadata:
  4  name: wdc-tkgm-wld-cluster-1
  5  namespace: ns-wdc-1
  6spec:
  7  vsphereCPI:
  8    ipFamily: ipv4
  9    mode: vsphereCPI
 10    tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
 11    vmNetwork:
 12      excludeExternalSubnetCidr: 10.101.13.100/32
 13      excludeInternalSubnetCidr: 10.101.13.100/32
 14---
 15apiVersion: csi.tanzu.vmware.com/v1alpha1
 16kind: VSphereCSIConfig
 17metadata:
 18  name: wdc-tkgm-wld-cluster-1
 19  namespace: ns-wdc-1
 20spec:
 21  vsphereCSI:
 22    config:
 23      datacenter: /cPod-NSXAM-WDC
 24      httpProxy: ""
 25      httpsProxy: ""
 26      noProxy: ""
 27      region: null
 28      tlsThumbprint: vCenter SHA-1 of DC2
 29      useTopologyCategories: false
 30      zone: null
 31    mode: vsphereCSI
 32---
 33apiVersion: run.tanzu.vmware.com/v1alpha3
 34kind: ClusterBootstrap
 35metadata:
 36  annotations:
 37    tkg.tanzu.vmware.com/add-missing-fields-from-tkr: v1.25.7---vmware.2-tkg.1
 38  name: wdc-tkgm-wld-cluster-1
 39  namespace: ns-wdc-1
 40spec:
 41  additionalPackages:
 42  - refName: metrics-server*
 43  - refName: secretgen-controller*
 44  - refName: pinniped*
 45  cpi:
 46    refName: vsphere-cpi*
 47    valuesFrom:
 48      providerRef:
 49        apiGroup: cpi.tanzu.vmware.com
 50        kind: VSphereCPIConfig
 51        name: wdc-tkgm-wld-cluster-1
 52  csi:
 53    refName: vsphere-csi*
 54    valuesFrom:
 55      providerRef:
 56        apiGroup: csi.tanzu.vmware.com
 57        kind: VSphereCSIConfig
 58        name: wdc-tkgm-wld-cluster-1
 59  kapp:
 60    refName: kapp-controller*
 61---
 62apiVersion: v1
 63kind: Secret
 64metadata:
 65  name: wdc-tkgm-wld-cluster-1
 66  namespace: ns-wdc-1
 67stringData:
 68  password: password vCenter User
 69  username: user@vcenter.net
 70---
 71apiVersion: cluster.x-k8s.io/v1beta1
 72kind: Cluster
 73metadata:
 74  annotations:
 75    osInfo: ubuntu,20.04,amd64
 76    tkg.tanzu.vmware.com/cluster-controlplane-endpoint: 10.101.13.100 #here is the VIP for the workload k8s API - by Kube-VIP
 77    tkg/plan: dev
 78  labels:
 79    tkg.tanzu.vmware.com/cluster-name: wdc-tkgm-wld-cluster-1
 80    avi-cloud: "wdc-nsx-cloud"
 81  name: wdc-tkgm-wld-cluster-1
 82  namespace: ns-wdc-1
 83spec:
 84  clusterNetwork:
 85    pods:
 86      cidrBlocks:
 87      - 20.10.0.0/16
 88    services:
 89      cidrBlocks:
 90      - 20.20.0.0/16
 91  topology:
 92    class: tkg-vsphere-default-v1.0.0
 93    controlPlane:
 94      metadata:
 95        annotations:
 96          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
 97      replicas: 1
 98    variables:
 99    - name: cni
100      value: antrea
101    - name: controlPlaneCertificateRotation
102      value:
103        activate: true
104        daysBefore: 90
105    - name: auditLogging
106      value:
107        enabled: false
108    - name: apiServerPort
109      value: 6443
110    - name: podSecurityStandard
111      value:
112        audit: baseline
113        deactivated: false
114        warn: baseline
115    - name: apiServerEndpoint
116      value: 10.101.13.100 #Here is the K8s API endpoint provided by Kube-VIP
117    - name: aviAPIServerHAProvider
118      value: false
119    - name: vcenter
120      value:
121        cloneMode: fullClone
122        datacenter: /cPod-NSXAM-WDC
123        datastore: /cPod-NSXAM-WDC/datastore/vsanDatastore-wdc-01
124        folder: /cPod-NSXAM-WDC/vm/TKGm
125        network: /cPod-NSXAM-WDC/network/ls-tkg-wdc-wld-1
126        resourcePool: /cPod-NSXAM-WDC/host/Cluster-1/Resources
127        server: vcsa.cpod-nsxam-wdc.az-wdc.cloud-garage.net
128        storagePolicyID: ""
129        template: /cPod-NSXAM-WDC/vm/ubuntu-2004-efi-kube-v1.25.7+vmware.2
130        tlsThumbprint: vCenter SHA1 DC2
131    - name: user
132      value:
133        sshAuthorizedKeys:
134        - ssh-rsa public key
135    - name: controlPlane
136      value:
137        machine:
138          diskGiB: 20
139          memoryMiB: 4096
140          numCPUs: 2
141    - name: worker
142      value:
143        count: 2
144        machine:
145          diskGiB: 20
146          memoryMiB: 4096
147          numCPUs: 2
148    version: v1.25.7+vmware.2-tkg.1
149    workers:
150      machineDeployments:
151      - class: tkg-worker
152        metadata:
153          annotations:
154            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
155        name: md-0
156        replicas: 2

Notice the label; it should match what you defined in the ADC: avi-cloud: "wdc-nsx-cloud".
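
If you want to double-check that the label is in place once the cluster exists, it can be inspected (and applied after the fact with kubectl label) from the management cluster context, using the names from this post:

kubectl get cluster wdc-tkgm-wld-cluster-1 -n ns-wdc-1 --show-labels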

Now apply the cluster.

tanzu cluster create -f wdc-tkg-wld-cluster-1.yaml

After a cup of coffee it should be ready.

Let's check the cluster from the mgmt cluster context:

1linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get cluster -n ns-wdc-1 wdc-tkgm-wld-cluster-1
2NAME                     PHASE         AGE   VERSION
3wdc-tkgm-wld-cluster-1   Provisioned   13m    v1.25.7+vmware.2

Grab the kubeconfig and switch to the workload cluster context.

tanzu cluster kubeconfig get wdc-tkgm-wld-cluster-1 --namespace ns-wdc-1 --admin --export-file wdc-tkgm-wld-cluster-1-k8s-config.yaml

Check the nodes:

1linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get nodes -o wide
2NAME                                                STATUS   ROLES           AGE   VERSION            INTERNAL-IP    EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
3wdc-tkgm-wld-cluster-1-md-0-jkcjw-d7795fbb5-gxnpc   Ready    <none>          9h    v1.25.7+vmware.2   10.101.13.26   10.101.13.26   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
4wdc-tkgm-wld-cluster-1-md-0-jkcjw-d7795fbb5-ph5j4   Ready    <none>          9h    v1.25.7+vmware.2   10.101.13.42   10.101.13.42   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
5wdc-tkgm-wld-cluster-1-s49w2-mz85r                  Ready    control-plane   9h    v1.25.7+vmware.2   10.101.13.41   10.101.13.41   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1

In my vCenter in DC 2?

tkg-workload-dc-2

Let's see the pods running in it:

 1linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get pods -A
 2NAMESPACE              NAME                                                         READY   STATUS    RESTARTS     AGE
 3avi-system             ako-0                                                        1/1     Running   0            8h
 4kube-system            antrea-agent-rhmsn                                           2/2     Running   0            8h
 5kube-system            antrea-agent-ttk6f                                           2/2     Running   0            8h
 6kube-system            antrea-agent-tw6t7                                           2/2     Running   0            8h
 7kube-system            antrea-controller-787994578b-7v2cl                           1/1     Running   0            8h
 8kube-system            coredns-5d4666ccfb-b2j85                                     1/1     Running   0            8h
 9kube-system            coredns-5d4666ccfb-lr97g                                     1/1     Running   0            8h
10kube-system            etcd-wdc-tkgm-wld-cluster-1-s49w2-mz85r                      1/1     Running   0            8h
11kube-system            kube-apiserver-wdc-tkgm-wld-cluster-1-s49w2-mz85r            1/1     Running   0            8h
12kube-system            kube-controller-manager-wdc-tkgm-wld-cluster-1-s49w2-mz85r   1/1     Running   0            8h
13kube-system            kube-proxy-9m2zh                                             1/1     Running   0            8h
14kube-system            kube-proxy-rntv8                                             1/1     Running   0            8h
15kube-system            kube-proxy-t7z49                                             1/1     Running   0            8h
16kube-system            kube-scheduler-wdc-tkgm-wld-cluster-1-s49w2-mz85r            1/1     Running   0            8h
17kube-system            kube-vip-wdc-tkgm-wld-cluster-1-s49w2-mz85r                  1/1     Running   0            8h
18kube-system            metrics-server-c6d9969cb-7h5l7                               1/1     Running   0            8h
19kube-system            vsphere-cloud-controller-manager-b2rkl                       1/1     Running   0            8h
20secretgen-controller   secretgen-controller-cd678b84c-cdntv                         1/1     Running   0            8h
21tkg-system             kapp-controller-6c5dfccc45-7nhl5                             2/2     Running   0            8h
22tkg-system             tanzu-capabilities-controller-manager-5bf587dcd5-fp6t9       1/1     Running   0            8h
23vmware-system-csi      vsphere-csi-controller-5459886d8c-5jzlz                      7/7     Running   0            8h
24vmware-system-csi      vsphere-csi-node-6pfbj                                       3/3     Running   4 (8h ago)   8h
25vmware-system-csi      vsphere-csi-node-cbcpm                                       3/3     Running   4 (8h ago)   8h
26vmware-system-csi      vsphere-csi-node-knk8q                                       3/3     Running   2 (8h ago)   8h

There is an AKO pod running. So far so good. What does the AKO configmap look like?
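
The configmap lives in the avi-system namespace of the workload cluster and can be pulled with:

kubectl get configmap avi-k8s-config -n avi-system -o yaml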

 1# Please edit the object below. Lines beginning with a '#' will be ignored,
 2# and an empty file will abort the edit. If an error occurs while saving this file will be
 3# reopened with the relevant failures.
 4#
 5apiVersion: v1
 6data:
 7  apiServerPort: "8080"
 8  autoFQDN: default
 9  cloudName: wdc-nsx-cloud
10  clusterName: ns-wdc-1-wdc-tkgm-wld-cluster-1
11  cniPlugin: antrea
12  controllerIP: 172.24.3.50
13  controllerVersion: 22.1.3
14  defaultIngController: "true"
15  deleteConfig: "false"
16  disableStaticRouteSync: "false"
17  fullSyncFrequency: "1800"
18  logLevel: INFO
19  nodeNetworkList: '[{"networkName":"ls-tkg-wdc-wld-1","cidrs":["10.101.13.0/24"]}]'
20  nsxtT1LR: /infra/tier-1s/Tier-1
21  serviceEngineGroupName: wdc-se-group
22  serviceType: NodePortLocal
23  shardVSSize: SMALL
24  useDefaultSecretsOnly: "false"
25  vipNetworkList: '[{"networkName":"tkg-wld-1-apps","cidr":"10.101.221.0/24"}]'
26kind: ConfigMap
27metadata:
28  annotations:
29    kapp.k14s.io/identity: v1;avi-system//ConfigMap/avi-k8s-config;v1
30    kapp.k14s.io/original: '{"apiVersion":"v1","data":{"apiServerPort":"8080","autoFQDN":"default","cloudName":"wdc-nsx-cloud","clusterName":"ns-wdc-1-wdc-tkgm-wld-cluster-1","cniPlugin":"antrea","controllerIP":"172.24.3.50","controllerVersion":"22.1.3","defaultIngController":"true","deleteConfig":"false","disableStaticRouteSync":"false","fullSyncFrequency":"1800","logLevel":"INFO","nodeNetworkList":"[{\"networkName\":\"ls-tkg-wdc-wld-1\",\"cidrs\":[\"10.101.13.0/24\"]}]","nsxtT1LR":"/infra/tier-1s/Tier-1","serviceEngineGroupName":"wdc-se-group","serviceType":"NodePortLocal","shardVSSize":"SMALL","useDefaultSecretsOnly":"false","vipNetworkList":"[{\"networkName\":\"tkg-wld-1-apps\",\"cidr\":\"10.101.221.0/24\"}]"},"kind":"ConfigMap","metadata":{"labels":{"kapp.k14s.io/app":"1685448627039212099","kapp.k14s.io/association":"v1.ae838cced3b6caccc5a03bfb3ae65cd7"},"name":"avi-k8s-config","namespace":"avi-system"}}'
31    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
32  creationTimestamp: "2023-05-30T12:10:34Z"
33  labels:
34    kapp.k14s.io/app: "1685448627039212099"
35    kapp.k14s.io/association: v1.ae838cced3b6caccc5a03bfb3ae65cd7
36  name: avi-k8s-config
37  namespace: avi-system
38  resourceVersion: "2456"
39  uid: 81bbe809-f5a1-45b8-aef2-a83ff36a3dd1

That looks good. The question now: will it blend?

So far I don't have anything in my NSX ALB dashboard. What happens if I create some service type LoadBalancer services or Ingresses? Let's have a look. First check: do I have an IngressClass?

1linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get ingressclasses.networking.k8s.io
2NAME     CONTROLLER              PARAMETERS   AGE
3avi-lb   ako.vmware.com/avi-lb   <none>       10m

I certainly do. I will now deploy an application exposed with service type LoadBalancer and an application using Ingress.
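
For reference, the Ingress side of it boils down to something like the manifest below (the backend service name and port are illustrative; the hostname is the one used in this example). The spec.ingressClassName is what hands the Ingress over to AKO and thereby NSX ALB:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  namespace: fruit
spec:
  ingressClassName: avi-lb            # hand the Ingress to AKO/NSX ALB
  rules:
    - host: fruit-tkg.you-have.your-domain.here
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: apple-service   # illustrative backend service
                port:
                  number: 5678        # illustrative port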

And here are the resulting Kubernetes services and the Ingress:

1linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get svc -n yelb
2NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
3redis-server     ClusterIP      20.20.103.179   <none>          6379/TCP       10s
4yelb-appserver   ClusterIP      20.20.242.204   <none>          4567/TCP       10s
5yelb-db          ClusterIP      20.20.202.153   <none>          5432/TCP       10s
6yelb-ui          LoadBalancer   20.20.123.18    10.101.221.10   80:30119/TCP   10s
1linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get ingress -n fruit
2NAME              CLASS    HOSTS                                 ADDRESS         PORTS   AGE
3ingress-example   avi-lb   fruit-tkg.you-have.your-domain.here   10.101.221.11   80      50s

Looking at the Ingress I can also see that NSX ALB has been so kind as to register a DNS record for it.
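
A quick lookup against the Avi DNS service (or whichever resolver the zone is delegated to) should confirm the record, for example:

nslookup fruit-tkg.you-have.your-domain.here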

How does it look inside my NSX ALB in DC1?

nsx-alb-applications

And where are the above NSX ALB Service Engines deployed?

nsx-alb-se-dc-2

In my vCenter in DC-2.

Well that was it. Thanks for reading.