Tanzu Kubernetes Grid 2.2 & Remote Workload Clusters
Overview
In this post my goal is to use TKG (Tanzu Kubernetes Grid) to manage and deploy workload clusters in remote datacenters. There can be many reasons for such a need, for example easier lifecycle management of workload clusters in environments with several datacenters in different physical locations. To lower the management overhead and simplify lifecycle operations such as upgrades, being able to manage these operations centrally is sometimes key.
In this post I have two datacenters, each with a vSphere cluster managed by its own vCenter Server. NSX is installed in both datacenters, managing the network in its respective datacenter. NSX Advanced Loadbalancer (Avi) is deployed in datacenter 1 and manages both datacenters (this post will not cover GSLB, nor deploying Avi controllers in both datacenters and why that could also be a smart thing to consider). This Avi installation is responsible for creating SEs (Service Engines) and virtual services both in the datacenter it is deployed in and in the remote datacenter (datacenter 2). It is the only controller/manager shared across the two datacenters.
The following diagram tries to illustrate the setup:
I will not go into how connectivity between these sites is established; I just assume that the relevant connectivity between the sites is in place (as that is a requirement for this to work).
Meet the "moving parts" involved
This section quickly goes through the components being used in the different datacenters. First out are the components used in the datacenter 1 environment.
Datacenter 1
In datacenter 1 I have one vSphere cluster consisting of four ESXi hosts, managed by a vCenter Server. This will be my management datacenter, where I will deploy my TKG management cluster. It is placed in physical location 1 with its own physical network and storage (using vSAN).
This is the vSphere cluster in datacenter 1:
In datacenter 1 I have also installed NSX-T to handle all the networking needs in this datacenter. There is no stretching of networks between the datacenters; each NSX environment is only responsible for the datacenter it is installed in, as you can see below:
It also has a 1:1 relationship to the vCenter Server in datacenter 1:
This NSX environment has the following networks created to support my TKG management cluster deployment:
And from the vCenter:
A quick description of the different networks:
- ls-avi-dns-se-data: This is where I place the data plane of the SEs used for the DNS service in Avi.
- ls-avi-se-data: This is where I place the data plane of the SEs used for my other virtual services (regular application needs, or if I happen to deploy services in my TKG management cluster or a workload cluster in the same datacenter). This network will not be used in this post.
- ls-mgmt: This is where I place the management interface of my SEs.
- ls-tkg-mgmt: This is where my TKG management cluster nodes will be placed, and it will be used in this post.
The ls-tkg-mgmt network has also been configured with DHCP on the segment in NSX:
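For reference, the same segment-level DHCP setting can also be expressed through the NSX Policy API. This is only a rough sketch, assuming a DHCP profile is already attached to the segment or its connected gateway; the manager address, gateway and range below are illustrative placeholders, not values taken from this environment:

# Patch the ls-tkg-mgmt segment with a gateway and a DHCP range (example values)
curl -k -u admin -X PATCH "https://<nsx-manager>/policy/api/v1/infra/segments/ls-tkg-mgmt" \
  -H "Content-Type: application/json" \
  -d '{
        "subnets": [
          {
            "gateway_address": "10.13.20.1/24",
            "dhcp_ranges": ["10.13.20.50-10.13.20.99"]
          }
        ]
      }'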
And last but not least, the Avi controller.
This is the component in this post that has been configured to handle requests from both datacenters as a shared resource, whether they are regular layer-4 services like service type LoadBalancer or layer-7 services like Ingress. As both datacenters are managed by their own NSX-T, I have configured the Avi controller to use the two NSX-T environments as two different clouds:
Each cloud depicted above reflects one of the two datacenters and has been configured accordingly to support the network settings in that datacenter.
Each cloud is an NSX-T cloud with its own configuration matching the datacenter it belongs to: networks, IPAM/DNS profiles, routing contexts and service engine groups. Below are some screenshots from the Avi controller:
Service engine groups in the stc-nsx-cloud:
The above SE groups have been configured for placement in their respective vSphere clusters, including folder, naming, datastore and so on.
The above networks have been configured with Avi IPAM to provision IP addresses and automate the creation of the SEs' data plane.
The below networks are the VIP networks configured in the stc-nsx-cloud:
Then the routing context (or VRF context) for the SEs to reach the backends:
The same has been done for the wdc-nsx-cloud. I will not show all of it here, just that there is also a wdc-nsx-cloud configured in these sections:
Notice the difference in IP subnets.
Then it is the IPAM and DNS profiles for both clouds:
I will not go into too much detail on how to configure Avi; it is all about configuring it to support the infrastructure settings in each datacenter. When the requests for virtual services reach the Avi controller, it then knows how to handle them, create the virtual services and service engines, and do the IP addressing correctly. This part will then just work smoothly.
An overview of the components in datacenter 1:
Datacenter 2
In datacenter 2 I also have one vSphere cluster consisting of four ESXi hosts, managed by a vCenter Server. This will be my remote/edge datacenter, where I will deploy my TKG workload clusters. It is placed in physical location 2 with its own physical network and storage (using vSAN).
This is the vSphere cluster in datacenter 2:
In datacenter 2 I have also installed NSX-T to handle all the networking needs in this datacenter. As mentioned above, there is no stretching of networks between the datacenters; the NSX environment is only responsible for the datacenter it is installed in, as you can see below:
It also has a 1:1 relationship to the vCenter Server in datacenter 2:
This NSX environment has the following networks created to support my TKG workload cluster deployment:
And from the vCenter:
A quick description of the different networks:
- ls-avi-dns-se-data: This is where I place the data plane of the SEs used for the DNS service in Avi.
- ls-avi-generic-se-data: This is where I place the data plane of the SEs used for the virtual services created when I expose services from the workload clusters. This network will be used in this post.
- ls-mgmt: This is where I place the management interface of my SEs.
- ls-tkg-wdc-wlc-1: This is where my TKG workload cluster nodes will be placed in this datacenter.
The ls-tkg-wdc-wlc-1 network has also been configured with DHCP on the segment in NSX:
An overview again of the components in DC2:
That's it for the "moving parts" involved in both datacenters for this exercise.
TKG management cluster deployment
Now, finally, for the fun part: deployment. As I have mentioned in the previous chapters, I will deploy the TKG management cluster in datacenter 1. But before I do the actual deployment I need to explain a little about how a TKG cluster is reached, whether it is the management cluster or a workload cluster.
Kubernetes API endpoint - exposing services inside the Kubernetes clusters (TKG clusters)
A Kubernetes cluster usually consists of one or three control plane nodes. This is where the Kubernetes API endpoint lives. When interacting with Kubernetes we use the exposed Kubernetes API to declaratively (some say in a nice way) tell it what we want it to realise. This API endpoint is usually exposed on port 6443 and is only available on the control plane nodes, not on the worker nodes. So the first criterion to be met is connectivity to the control plane nodes on port 6443 (we could also SSH into the control plane nodes on port 22 and work with the kube-api from there, but that is not ideal). We want to reach the API from a remote workstation to be more flexible and efficient in how we interact with it. With just one control plane node it is probably fine to reach that node directly and send our API calls to it, but this can create issues down the road: when we replace or upgrade that node it can (and most likely will) change IP address, meaning our kubeconfig context and automation tooling need to be updated accordingly. So what we want is a virtual IP address that stays consistent across the lifetime of the Kubernetes cluster. The same applies when we have more than one control plane node; three is a common number in production (we cannot have an even number of control plane nodes, as we want quorum). We want one consistent IP address that either fronts the single control plane node's Kubernetes API or is load balanced across all three control plane nodes. To achieve that we need some kind of load balancer that can create this virtual IP address for us and expose the Kubernetes API consistently. In TKG we can use NSX Advanced Loadbalancer for this purpose, or a simpler approach like Kube-VIP. I don't want to go into a big write-up on the differences between the two, other than to say they are not really comparable. Kube-VIP will not load balance the Kubernetes API between the three control plane nodes; it just creates a virtual IP in the same subnet as the control plane nodes, places it on one of them, and keeps it there until that node fails and the VIP moves over to another control plane node. NSX ALB, on the other hand, will load balance the Kubernetes API endpoint between all three control plane nodes, and the IP address is automatically allocated at provisioning time, whereas the Kube-VIP address is statically assigned.
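As a small illustration of the Kube-VIP behaviour described above: once a cluster is up, the VIP is held by a kube-vip static pod on one of the control plane nodes. A quick check could look like this (assuming the kubeconfig for the cluster is the active context):

# Which control plane node currently hosts the kube-vip static pod (and thereby the VIP)?
kubectl -n kube-system get pods -o wide | grep kube-vip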
Why am I mentioning this? Why could I not just focus on NSX Advanced Loadbalancer, which can cover all my needs? Because in this specific post I am hitting a special use case: my TKG management cluster is placed in one datacenter managed by its own NSX-T, while I want to deploy and manage TKG workload clusters in a completely different datacenter, also managed by its own NSX-T. When using NSX Advanced Loadbalancer as my API endpoint VIP provider in combination with NSX-T clouds (Avi clouds), I am currently not allowed to override the control plane network (API endpoint). It is currently not possible to override or select a different NSX-T Tier-1 for the control plane network, and these are different because there are two different NSX-T environments. I can name the Tier-1 routers identically in both datacenters, but it is not so easily fooled 😄 So my workaround is to use Kube-VIP, which allows me to manually configure the API endpoint IP for my workload clusters. I will try to explain a bit more how the NSX ALB integration works in TKG below.
What about the services I want to expose from the different workload clusters, like service type LoadBalancer and Ingress? That is a different story; there we can use NSX Advanced Loadbalancer as much as we want, and in a very flexible way too. The reason is that the Kubernetes API endpoint VIP, or control plane network, is managed and controlled by the TKG management cluster, while what comes from inside a running TKG workload cluster is completely different. Using NSX Advanced Loadbalancer in TKG, or in any other Kubernetes platform like native upstream Kubernetes, we use a component called AKO (Avi Kubernetes Operator) that handles the standard Kubernetes requests like service type LoadBalancer and Ingress creation and forwards them to the NSX ALB controller to realize them. In TKG we have AKO running in the management cluster, responsible for the services exposed from inside the TKG management cluster, but also for assigning the VIP for the workload clusters' Kubernetes API (control plane network). As soon as we have our first TKG workload cluster, it comes with its own AKO instance that is responsible for all the services in the workload cluster it runs in; it has nothing to do with the control plane network or with the AKO instance running in the TKG management cluster. So we can actually adjust this AKO instance to match our needs without being restricted by what the AKO instance in the TKG management cluster is configured with.
In a TKG workload cluster there are a couple of ways to get AKO installed. One option is to use the AKO operator running in the TKG management cluster to deploy it automatically at TKG workload cluster provisioning. This approach is best if you want TKG to handle the lifecycle of the AKO instance, like upgrades, and it is very hands-off. We need to define an AKODeploymentConfig (ADC) in the TKG management cluster that defines the AKO settings for the respective TKG workload cluster, or clusters if they can share the same settings. This is based on labels, so it is very easy to target the ADC at a series of clusters, or a specific cluster, by applying the correct label on the cluster. The other option is to install AKO via Helm, which gives you full flexibility but is a manual process that needs to be done on every TKG workload cluster that needs AKO installed. I tend to lean towards the ADC approach, as I cannot see any limitation compared to installing AKO via Helm. ADC also supports AviInfraSettings, which gives you further flexibility and options.
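To see how the label-based selection works in practice, the ADCs and their cluster selectors can be inspected from the management cluster context. A minimal sketch; the ADC name used here is one of the defaults shown later in this post:

# List all AKODeploymentConfigs known to the management cluster
kubectl get adc

# Show which cluster labels a given ADC will match on
kubectl get adc install-ako-for-all -o jsonpath='{.spec.clusterSelector.matchLabels}'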
With that out of the way let us get this TKG management cluster deployed already...
TKG management cluster deployment - continued
I will not cover any of the prerequisites for deploying TKG, have a look here for that; I will go straight to my TKG management cluster bootstrap YAML manifest. Below is my YAML for the TKG management cluster, with some comments on the changes I have made to use Kube-VIP for the control plane, aka the Kubernetes API endpoint.
#! ---------------
#! Basic config
#! -------------
CLUSTER_NAME: tkg-stc-mgmt-cluster
CLUSTER_PLAN: dev
INFRASTRUCTURE_PROVIDER: vsphere
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_AUDIT_LOGGING: "false"
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13
TKG_IP_FAMILY: ipv4
DEPLOY_TKG_ON_VSPHERE7: "true"
CLUSTER_API_SERVER_PORT: 6443 # Added for Kube-VIP
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.13.20.100 # Added for Kube-VIP - specify a static IP in the same subnet as the nodes
VSPHERE_CONTROL_PLANE_ENDPOINT_PORT: 6443 # Added for Kube-VIP
VIP_NETWORK_INTERFACE: "eth0" # Added for Kube-VIP
# VSPHERE_ADDITIONAL_FQDN:
AVI_CONTROL_PLANE_HA_PROVIDER: false # Set to false to use Kube-VIP instead
AVI_ENABLE: "true" # I still want AKO to be installed, just not used for the control plane endpoint

#! ---------------
#! vSphere config
#! -------------
VSPHERE_DATACENTER: /cPod-NSXAM-STC
VSPHERE_DATASTORE: /cPod-NSXAM-STC/datastore/vsanDatastore
VSPHERE_FOLDER: /cPod-NSXAM-STC/vm/TKGm
VSPHERE_INSECURE: "false"
VSPHERE_NETWORK: /cPod-NSXAM-STC/network/ls-tkg-mgmt
VSPHERE_PASSWORD: "password"
VSPHERE_RESOURCE_POOL: /cPod-NSXAM-STC/host/Cluster/Resources
#VSPHERE_TEMPLATE: /Datacenter/vm/TKGm/ubuntu-2004-kube-v1.23.8+vmware.2
VSPHERE_SERVER: vcsa.cpod-nsxam-stc.az-stc.cloud-garage.net
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa ssh-public key
VSPHERE_TLS_THUMBPRINT: vcenter SHA1
VSPHERE_USERNAME: username@domain.net

#! ---------------
#! Node config
#! -------------
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"
CONTROL_PLANE_MACHINE_COUNT: 1
WORKER_MACHINE_COUNT: 2

#! ---------------
#! Avi config
#! -------------
AVI_CA_DATA_B64: AVI Controller Base64 Certificate
AVI_CLOUD_NAME: stc-nsx-cloud
AVI_CONTROLLER: 172.24.3.50
# Network used to place workload clusters' endpoint VIPs
#AVI_CONTROL_PLANE_NETWORK: vip-tkg-wld-l4
#AVI_CONTROL_PLANE_NETWORK_CIDR: 10.13.102.0/24
# Network used to place workload clusters' services external IPs (load balancer & ingress services)
AVI_DATA_NETWORK: vip-tkg-wld-l7
AVI_DATA_NETWORK_CIDR: 10.13.103.0/24
# Network used to place management clusters' services external IPs (load balancer & ingress services)
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 10.13.101.0/24
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: vip-tkg-mgmt-l7
# Network used to place management clusters' endpoint VIPs
#AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_NAME: vip-tkg-mgmt-l4
#AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_CIDR: 10.13.100.0/24
AVI_NSXT_T1LR: Tier-1
AVI_CONTROLLER_VERSION: 22.1.2
AVI_LABELS: "{adc-enabled: 'true'}" # Added so I can easily select which workload clusters will use this AKO config
AVI_PASSWORD: "password"
AVI_SERVICE_ENGINE_GROUP: stc-nsx
AVI_MANAGEMENT_CLUSTER_SERVICE_ENGINE_GROUP: tkgm-se-group
AVI_USERNAME: admin
AVI_DISABLE_STATIC_ROUTE_SYNC: false
AVI_INGRESS_DEFAULT_INGRESS_CONTROLLER: true
AVI_INGRESS_SHARD_VS_SIZE: SMALL
AVI_INGRESS_SERVICE_TYPE: NodePortLocal

#! ---------------
#! Proxy config
#! -------------
TKG_HTTP_PROXY_ENABLED: "false"

#! ---------------------------------------------------------------------
#! Antrea CNI configuration
#! ---------------------------------------------------------------------
# ANTREA_NO_SNAT: false
# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
# ANTREA_PROXY: false
# ANTREA_POLICY: true
# ANTREA_TRACEFLOW: false
ANTREA_NODEPORTLOCAL: true
ANTREA_PROXY: true
ANTREA_ENDPOINTSLICE: true
ANTREA_POLICY: true
ANTREA_TRACEFLOW: true
ANTREA_NETWORKPOLICY_STATS: false
ANTREA_EGRESS: true
ANTREA_IPAM: false
ANTREA_FLOWEXPORTER: false
ANTREA_SERVICE_EXTERNALIP: false
ANTREA_MULTICAST: false

#! ---------------------------------------------------------------------
#! Machine Health Check configuration
#! ---------------------------------------------------------------------
ENABLE_MHC: "true"
ENABLE_MHC_CONTROL_PLANE: true
ENABLE_MHC_WORKER_NODE: true
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 12m

#! ---------------------------------------------------------------------
#! Identity management configuration
#! ---------------------------------------------------------------------

IDENTITY_MANAGEMENT_TYPE: none
All the settings above should match the datacenter 1 environment so the TKG management cluster can be deployed. Let's deploy it using the Tanzu CLI from my TKG bootstrap client:
tanzu mc create -f tkg-mgmt-bootstrap.yaml
As soon as it is deployed, grab the kubeconfig and add it to your context:
tanzu mc kubeconfig get --admin --export-file stc-tkgm-mgmt-cluster.yaml
The IP address used for the Kubernetes API endpoint is the control plane endpoint defined above:
VSPHERE_CONTROL_PLANE_ENDPOINT: 10.13.20.100
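A quick way to confirm that the exported kubeconfig really points at the Kube-VIP endpoint (and not an individual node IP) is to read the server field; with the values above it should print https://10.13.20.100:6443:

kubectl --kubeconfig stc-tkgm-mgmt-cluster.yaml config view --minify -o jsonpath='{.clusters[0].cluster.server}'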
We can also see this IP being assigned to my one control plane node in the vCenter view:
Now let's have a quick look inside the TKG management cluster, specifically at AKO and any ADCs:
tkg-bootstrap-vm:~/Kubernetes-library/examples/ingress$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
avi-system ako-0 1/1 Running 0 8h
capi-kubeadm-bootstrap-system capi-kubeadm-bootstrap-controller-manager-5fb8fbc6c7-rqkzf 1/1 Running 0 8h
capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-78c559f48c-cj2dm 1/1 Running 0 8h
capi-system capi-controller-manager-84fbb669c-bhk4j 1/1 Running 0 8h
capv-system capv-controller-manager-5f46567b86-pccf5 1/1 Running 0 8h
cert-manager cert-manager-5d8d7b4dfb-gj6h2 1/1 Running 0 9h
cert-manager cert-manager-cainjector-7797ff666f-zxh5l 1/1 Running 0 9h
cert-manager cert-manager-webhook-59969cbb8c-vpsgr 1/1 Running 0 9h
kube-system antrea-agent-6xzvh 2/2 Running 0 8h
kube-system antrea-agent-gsfhc 2/2 Running 0 8h
kube-system antrea-agent-t5gzb 2/2 Running 0 8h
kube-system antrea-controller-74b468c659-hcrgp 1/1 Running 0 8h
kube-system coredns-5d4666ccfb-qx5qt 1/1 Running 0 9h
kube-system coredns-5d4666ccfb-xj47b 1/1 Running 0 9h
kube-system etcd-tkg-stc-mgmt-cluster-sbptz-lkn58 1/1 Running 0 9h
kube-system kube-apiserver-tkg-stc-mgmt-cluster-sbptz-lkn58 1/1 Running 0 9h
kube-system kube-controller-manager-tkg-stc-mgmt-cluster-sbptz-lkn58 1/1 Running 0 9h
kube-system kube-proxy-9d7b9 1/1 Running 0 9h
kube-system kube-proxy-kd8h8 1/1 Running 0 9h
kube-system kube-proxy-n7zwx 1/1 Running 0 9h
kube-system kube-scheduler-tkg-stc-mgmt-cluster-sbptz-lkn58 1/1 Running 0 9h
kube-system kube-vip-tkg-stc-mgmt-cluster-sbptz-lkn58 1/1 Running 0 9h
kube-system metrics-server-b468f4d5f-hvtbg 1/1 Running 0 8h
kube-system vsphere-cloud-controller-manager-fnsvh 1/1 Running 0 8h
secretgen-controller secretgen-controller-697cb6c657-lh9rr 1/1 Running 0 8h
tanzu-auth tanzu-auth-controller-manager-d75d85899-d8699 1/1 Running 0 8h
tkg-system-networking ako-operator-controller-manager-5bbb9d4c4b-2bjsk 1/1 Running 0 8h
tkg-system kapp-controller-9f9f578c7-dpzgk 2/2 Running 0 9h
tkg-system object-propagation-controller-manager-5cbb94894f-k56w5 1/1 Running 0 8h
tkg-system tanzu-addons-controller-manager-79f656b4c7-m72xw 1/1 Running 0 8h
tkg-system tanzu-capabilities-controller-manager-5868c5f789-nbkgm 1/1 Running 0 8h
tkg-system tanzu-featuregates-controller-manager-6d567fffd6-647s5 1/1 Running 0 8h
tkg-system tkr-conversion-webhook-manager-6977bfc965-gjjbt 1/1 Running 0 8h
tkg-system tkr-resolver-cluster-webhook-manager-5c8484ffd8-8xc8n 1/1 Running 0 8h
tkg-system tkr-source-controller-manager-57c56d55d9-x6vsz 1/1 Running 0 8h
tkg-system tkr-status-controller-manager-55b4b845b9-77snb 1/1 Running 0 8h
tkg-system tkr-vsphere-resolver-webhook-manager-6476749d5d-5pxlk 1/1 Running 0 8h
vmware-system-csi vsphere-csi-controller-585bf4dc75-wtlw2 7/7 Running 0 8h
vmware-system-csi vsphere-csi-node-ldrs6 3/3 Running 2 (8h ago) 8h
vmware-system-csi vsphere-csi-node-rwgpw 3/3 Running 4 (8h ago) 8h
vmware-system-csi vsphere-csi-node-rx8f6 3/3 Running 4 (8h ago) 8h
There is an AKO pod running. Are there any ADCs created?
tkg-bootstrap-vm:~/Kubernetes-library/examples/ingress$ k get adc
NAME AGE
install-ako-for-all 8h
install-ako-for-management-cluster 8h
Let's have a look inside both of them, first install-ako-for-all and then install-ako-for-management-cluster:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  annotations:
    kapp.k14s.io/identity: v1;/networking.tkg.tanzu.vmware.com/AKODeploymentConfig/install-ako-for-all;networking.tkg.tanzu.vmware.com/v1alpha1
    kapp.k14s.io/original: '{"apiVersion":"networking.tkg.tanzu.vmware.com/v1alpha1","kind":"AKODeploymentConfig","metadata":{"labels":{"kapp.k14s.io/app":"1685446688101132090","kapp.k14s.io/association":"v1.8329c3602ed02133e324fc22d58dcf28"},"name":"install-ako-for-all"},"spec":{"adminCredentialRef":{"name":"avi-controller-credentials","namespace":"tkg-system-networking"},"certificateAuthorityRef":{"name":"avi-controller-ca","namespace":"tkg-system-networking"},"cloudName":"stc-nsx-cloud","clusterSelector":{"matchLabels":{"adc-enabled":"true"}},"controlPlaneNetwork":{"cidr":"10.13.101.0/24","name":"vip-tkg-mgmt-l7"},"controller":"172.24.3.50","dataNetwork":{"cidr":"10.13.103.0/24","name":"vip-tkg-wld-l7"},"extraConfigs":{"disableStaticRouteSync":false,"ingress":{"defaultIngressController":false,"disableIngressClass":true,"nodeNetworkList":[{"networkName":"ls-tkg-mgmt"}]},"networksConfig":{"nsxtT1LR":"Tier-1"}},"serviceEngineGroup":"stc-nsx"}}'
    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
  creationTimestamp: "2023-05-30T11:38:45Z"
  finalizers:
  - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  labels:
    kapp.k14s.io/app: "1685446688101132090"
    kapp.k14s.io/association: v1.8329c3602ed02133e324fc22d58dcf28
  name: install-ako-for-all
  resourceVersion: "4686"
  uid: 0cf0dd57-b193-40d5-bb03-347879157377
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: stc-nsx-cloud
  clusterSelector:
    matchLabels:
      adc-enabled: "true"
  controlPlaneNetwork:
    cidr: 10.13.101.0/24
    name: vip-tkg-mgmt-l7
  controller: 172.24.3.50
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 10.13.103.0/24
    name: vip-tkg-wld-l7
  extraConfigs:
    disableStaticRouteSync: false
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
      - networkName: ls-tkg-mgmt
    networksConfig:
      nsxtT1LR: Tier-1
  serviceEngineGroup: stc-nsx
This is clearly configured for my datacenter 1 and will not match my datacenter 2 environment. Also notice the label: if I create a cluster and apply this label, I will get this "default" ADC applied, which will not match what I need to use in datacenter 2.
Let's have a look at the other one:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  annotations:
    kapp.k14s.io/identity: v1;/networking.tkg.tanzu.vmware.com/AKODeploymentConfig/install-ako-for-management-cluster;networking.tkg.tanzu.vmware.com/v1alpha1
    kapp.k14s.io/original: '{"apiVersion":"networking.tkg.tanzu.vmware.com/v1alpha1","kind":"AKODeploymentConfig","metadata":{"labels":{"kapp.k14s.io/app":"1685446688101132090","kapp.k14s.io/association":"v1.3012c3c8e0fa37b13f4916c7baca1863"},"name":"install-ako-for-management-cluster"},"spec":{"adminCredentialRef":{"name":"avi-controller-credentials","namespace":"tkg-system-networking"},"certificateAuthorityRef":{"name":"avi-controller-ca","namespace":"tkg-system-networking"},"cloudName":"stc-nsx-cloud","clusterSelector":{"matchLabels":{"cluster-role.tkg.tanzu.vmware.com/management":""}},"controlPlaneNetwork":{"cidr":"10.13.101.0/24","name":"vip-tkg-mgmt-l7"},"controller":"172.24.3.50","dataNetwork":{"cidr":"10.13.101.0/24","name":"vip-tkg-mgmt-l7"},"extraConfigs":{"disableStaticRouteSync":false,"ingress":{"defaultIngressController":false,"disableIngressClass":true,"nodeNetworkList":[{"networkName":"ls-tkg-mgmt"}]},"networksConfig":{"nsxtT1LR":"Tier-1"}},"serviceEngineGroup":"tkgm-se-group"}}'
    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
  creationTimestamp: "2023-05-30T11:38:45Z"
  finalizers:
  - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  labels:
    kapp.k14s.io/app: "1685446688101132090"
    kapp.k14s.io/association: v1.3012c3c8e0fa37b13f4916c7baca1863
  name: install-ako-for-management-cluster
  resourceVersion: "4670"
  uid: c41e6e39-2b0f-4fa4-9245-0eec1bcf6b5d
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: stc-nsx-cloud
  clusterSelector:
    matchLabels:
      cluster-role.tkg.tanzu.vmware.com/management: ""
  controlPlaneNetwork:
    cidr: 10.13.101.0/24
    name: vip-tkg-mgmt-l7
  controller: 172.24.3.50
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 10.13.101.0/24
    name: vip-tkg-mgmt-l7
  extraConfigs:
    disableStaticRouteSync: false
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
      - networkName: ls-tkg-mgmt
    networksConfig:
      nsxtT1LR: Tier-1
  serviceEngineGroup: tkgm-se-group
The same is true for this one: it is configured for my datacenter 1, the only major difference being a different dataNetwork. So if I decided to deploy a workload cluster in the same datacenter it would be fine, but I don't want that; I want my workload cluster in a different datacenter. Let's do that.
TKG workload cluster deployment - with corresponding ADC
Before I deploy my workload cluster I will create a "custom" ADC specific to datacenter 2, where the workload cluster will be deployed. Here is my ADC for the DC2 workload cluster:
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: ako-tkg-wdc-cloud
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: wdc-nsx-cloud
  clusterSelector:
    matchLabels:
      avi-cloud: "wdc-nsx-cloud"
  controller: 172.24.3.50
  dataNetwork:
    cidr: 10.101.221.0/24
    name: tkg-wld-1-apps
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: false # required
    ingress:
      defaultIngressController: true
      disableIngressClass: false # required
      nodeNetworkList: # required
      - cidrs:
        - 10.101.13.0/24
        networkName: ls-tkg-wdc-wld-1
      serviceType: NodePortLocal # required
      shardVSSize: SMALL # required
    l4Config:
      autoFQDN: default
    networksConfig:
      nsxtT1LR: /infra/tier-1s/Tier-1
  serviceEngineGroup: wdc-se-group
In this ADC I configure the dataNetwork to be the VIP network I have defined in the NSX ALB DC2 cloud, point to the NSX-T Tier-1 there (yes, it has the same name as in DC1, but it is not the same Tier-1), and set the nodeNetworkList to match where my workload cluster nodes will be placed in DC2. Also notice the label: for my workload cluster to use this ADC I need to apply this label, either during provisioning or by labelling the cluster after creation. Apply the ADC:
k apply -f ako-wld-cluster-1.wdc.cloud.yaml
Is it there?
amarqvardsen@amarqvards1MD6T:~/Kubernetes-library/tkgm/stc-tkgm$ k get adc
NAME AGE
ako-tkg-wdc-cloud 20s
install-ako-for-all 9h
install-ako-for-management-cluster 9h
Yes it is.
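Worth noting: if a workload cluster had already been created without the matching label, it could also be labelled afterwards from the management cluster context, and the label should then be picked up against this ADC. A hedged sketch with placeholder names:

kubectl label cluster <workload-cluster-name> -n <cluster-namespace> avi-cloud=wdc-nsx-cloud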
Now prepare the TKG workload cluster manifest to match the DC 2 environment and apply it, also making sure aviAPIServerHAProvider is set to false.
apiVersion: cpi.tanzu.vmware.com/v1alpha1
kind: VSphereCPIConfig
metadata:
  name: wdc-tkgm-wld-cluster-1
  namespace: ns-wdc-1
spec:
  vsphereCPI:
    ipFamily: ipv4
    mode: vsphereCPI
    tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    vmNetwork:
      excludeExternalSubnetCidr: 10.101.13.100/32
      excludeInternalSubnetCidr: 10.101.13.100/32
---
apiVersion: csi.tanzu.vmware.com/v1alpha1
kind: VSphereCSIConfig
metadata:
  name: wdc-tkgm-wld-cluster-1
  namespace: ns-wdc-1
spec:
  vsphereCSI:
    config:
      datacenter: /cPod-NSXAM-WDC
      httpProxy: ""
      httpsProxy: ""
      noProxy: ""
      region: null
      tlsThumbprint: vCenter SHA-1 of DC2
      useTopologyCategories: false
      zone: null
    mode: vsphereCSI
---
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: ClusterBootstrap
metadata:
  annotations:
    tkg.tanzu.vmware.com/add-missing-fields-from-tkr: v1.25.7---vmware.2-tkg.1
  name: wdc-tkgm-wld-cluster-1
  namespace: ns-wdc-1
spec:
  additionalPackages:
  - refName: metrics-server*
  - refName: secretgen-controller*
  - refName: pinniped*
  cpi:
    refName: vsphere-cpi*
    valuesFrom:
      providerRef:
        apiGroup: cpi.tanzu.vmware.com
        kind: VSphereCPIConfig
        name: wdc-tkgm-wld-cluster-1
  csi:
    refName: vsphere-csi*
    valuesFrom:
      providerRef:
        apiGroup: csi.tanzu.vmware.com
        kind: VSphereCSIConfig
        name: wdc-tkgm-wld-cluster-1
  kapp:
    refName: kapp-controller*
---
apiVersion: v1
kind: Secret
metadata:
  name: wdc-tkgm-wld-cluster-1
  namespace: ns-wdc-1
stringData:
  password: password vCenter User
  username: user@vcenter.net
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    osInfo: ubuntu,20.04,amd64
    tkg.tanzu.vmware.com/cluster-controlplane-endpoint: 10.101.13.100 # here is the VIP for the workload k8s API - by Kube-VIP
    tkg/plan: dev
  labels:
    tkg.tanzu.vmware.com/cluster-name: wdc-tkgm-wld-cluster-1
    avi-cloud: "wdc-nsx-cloud"
  name: wdc-tkgm-wld-cluster-1
  namespace: ns-wdc-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 20.10.0.0/16
    services:
      cidrBlocks:
      - 20.20.0.0/16
  topology:
    class: tkg-vsphere-default-v1.0.0
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
      replicas: 1
    variables:
    - name: cni
      value: antrea
    - name: controlPlaneCertificateRotation
      value:
        activate: true
        daysBefore: 90
    - name: auditLogging
      value:
        enabled: false
    - name: apiServerPort
      value: 6443
    - name: podSecurityStandard
      value:
        audit: baseline
        deactivated: false
        warn: baseline
    - name: apiServerEndpoint
      value: 10.101.13.100 # Here is the K8s API endpoint provided by Kube-VIP
    - name: aviAPIServerHAProvider
      value: false
    - name: vcenter
      value:
        cloneMode: fullClone
        datacenter: /cPod-NSXAM-WDC
        datastore: /cPod-NSXAM-WDC/datastore/vsanDatastore-wdc-01
        folder: /cPod-NSXAM-WDC/vm/TKGm
        network: /cPod-NSXAM-WDC/network/ls-tkg-wdc-wld-1
        resourcePool: /cPod-NSXAM-WDC/host/Cluster-1/Resources
        server: vcsa.cpod-nsxam-wdc.az-wdc.cloud-garage.net
        storagePolicyID: ""
        template: /cPod-NSXAM-WDC/vm/ubuntu-2004-efi-kube-v1.25.7+vmware.2
        tlsThumbprint: vCenter SHA1 DC2
    - name: user
      value:
        sshAuthorizedKeys:
        - ssh-rsa public key
    - name: controlPlane
      value:
        machine:
          diskGiB: 20
          memoryMiB: 4096
          numCPUs: 2
    - name: worker
      value:
        count: 2
        machine:
          diskGiB: 20
          memoryMiB: 4096
          numCPUs: 2
    version: v1.25.7+vmware.2-tkg.1
    workers:
      machineDeployments:
      - class: tkg-worker
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
        name: md-0
        replicas: 2
Notice the label; it should match what was defined in the ADC: avi-cloud: "wdc-nsx-cloud".
Now apply the cluster.
tanzu cluster create -f wdc-tkg-wld-cluster-1.yaml
After a cup of coffee it should be ready.
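While waiting, the provisioning can be followed from the management cluster context, for example:

# High-level status of the workload cluster
tanzu cluster get wdc-tkgm-wld-cluster-1 --namespace ns-wdc-1

# Watch the Cluster API machines come up
kubectl get machines -n ns-wdc-1 -w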
Let's check the cluster from the management cluster context:
linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get cluster -n ns-wdc-1 wdc-tkgm-wld-cluster-1
NAME PHASE AGE VERSION
wdc-tkgm-wld-cluster-1 Provisioned 13m v1.25.7+vmware.2
Grab the kubeconfig and switch to the workload cluster context:
tanzu cluster kubeconfig get wdc-tkgm-wld-cluster-1 --namespace ns-wdc-1 --admin --export-file wdc-tkgm-wld-cluster-1-k8s-config.yaml
Check the nodes:
linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
wdc-tkgm-wld-cluster-1-md-0-jkcjw-d7795fbb5-gxnpc Ready <none> 9h v1.25.7+vmware.2 10.101.13.26 10.101.13.26 Ubuntu 20.04.6 LTS 5.4.0-144-generic containerd://1.6.18-1-gdbc99e5b1
wdc-tkgm-wld-cluster-1-md-0-jkcjw-d7795fbb5-ph5j4 Ready <none> 9h v1.25.7+vmware.2 10.101.13.42 10.101.13.42 Ubuntu 20.04.6 LTS 5.4.0-144-generic containerd://1.6.18-1-gdbc99e5b1
wdc-tkgm-wld-cluster-1-s49w2-mz85r Ready control-plane 9h v1.25.7+vmware.2 10.101.13.41 10.101.13.41 Ubuntu 20.04.6 LTS 5.4.0-144-generic containerd://1.6.18-1-gdbc99e5b1
In my vCenter in DC 2?
Let's see the pods running in it:
linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
avi-system ako-0 1/1 Running 0 8h
kube-system antrea-agent-rhmsn 2/2 Running 0 8h
kube-system antrea-agent-ttk6f 2/2 Running 0 8h
kube-system antrea-agent-tw6t7 2/2 Running 0 8h
kube-system antrea-controller-787994578b-7v2cl 1/1 Running 0 8h
kube-system coredns-5d4666ccfb-b2j85 1/1 Running 0 8h
kube-system coredns-5d4666ccfb-lr97g 1/1 Running 0 8h
kube-system etcd-wdc-tkgm-wld-cluster-1-s49w2-mz85r 1/1 Running 0 8h
kube-system kube-apiserver-wdc-tkgm-wld-cluster-1-s49w2-mz85r 1/1 Running 0 8h
kube-system kube-controller-manager-wdc-tkgm-wld-cluster-1-s49w2-mz85r 1/1 Running 0 8h
kube-system kube-proxy-9m2zh 1/1 Running 0 8h
kube-system kube-proxy-rntv8 1/1 Running 0 8h
kube-system kube-proxy-t7z49 1/1 Running 0 8h
kube-system kube-scheduler-wdc-tkgm-wld-cluster-1-s49w2-mz85r 1/1 Running 0 8h
kube-system kube-vip-wdc-tkgm-wld-cluster-1-s49w2-mz85r 1/1 Running 0 8h
kube-system metrics-server-c6d9969cb-7h5l7 1/1 Running 0 8h
kube-system vsphere-cloud-controller-manager-b2rkl 1/1 Running 0 8h
secretgen-controller secretgen-controller-cd678b84c-cdntv 1/1 Running 0 8h
tkg-system kapp-controller-6c5dfccc45-7nhl5 2/2 Running 0 8h
tkg-system tanzu-capabilities-controller-manager-5bf587dcd5-fp6t9 1/1 Running 0 8h
vmware-system-csi vsphere-csi-controller-5459886d8c-5jzlz 7/7 Running 0 8h
vmware-system-csi vsphere-csi-node-6pfbj 3/3 Running 4 (8h ago) 8h
vmware-system-csi vsphere-csi-node-cbcpm 3/3 Running 4 (8h ago) 8h
vmware-system-csi vsphere-csi-node-knk8q 3/3 Running 2 (8h ago) 8h
There is an AKO pod running. So far so good. What does the AKO ConfigMap look like?
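The AKO ConfigMap lives in the avi-system namespace together with the ako-0 pod, so it can be pulled out with, for example:

kubectl get configmap avi-k8s-config -n avi-system -o yaml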
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  apiServerPort: "8080"
  autoFQDN: default
  cloudName: wdc-nsx-cloud
  clusterName: ns-wdc-1-wdc-tkgm-wld-cluster-1
  cniPlugin: antrea
  controllerIP: 172.24.3.50
  controllerVersion: 22.1.3
  defaultIngController: "true"
  deleteConfig: "false"
  disableStaticRouteSync: "false"
  fullSyncFrequency: "1800"
  logLevel: INFO
  nodeNetworkList: '[{"networkName":"ls-tkg-wdc-wld-1","cidrs":["10.101.13.0/24"]}]'
  nsxtT1LR: /infra/tier-1s/Tier-1
  serviceEngineGroupName: wdc-se-group
  serviceType: NodePortLocal
  shardVSSize: SMALL
  useDefaultSecretsOnly: "false"
  vipNetworkList: '[{"networkName":"tkg-wld-1-apps","cidr":"10.101.221.0/24"}]'
kind: ConfigMap
metadata:
  annotations:
    kapp.k14s.io/identity: v1;avi-system//ConfigMap/avi-k8s-config;v1
    kapp.k14s.io/original: '{"apiVersion":"v1","data":{"apiServerPort":"8080","autoFQDN":"default","cloudName":"wdc-nsx-cloud","clusterName":"ns-wdc-1-wdc-tkgm-wld-cluster-1","cniPlugin":"antrea","controllerIP":"172.24.3.50","controllerVersion":"22.1.3","defaultIngController":"true","deleteConfig":"false","disableStaticRouteSync":"false","fullSyncFrequency":"1800","logLevel":"INFO","nodeNetworkList":"[{\"networkName\":\"ls-tkg-wdc-wld-1\",\"cidrs\":[\"10.101.13.0/24\"]}]","nsxtT1LR":"/infra/tier-1s/Tier-1","serviceEngineGroupName":"wdc-se-group","serviceType":"NodePortLocal","shardVSSize":"SMALL","useDefaultSecretsOnly":"false","vipNetworkList":"[{\"networkName\":\"tkg-wld-1-apps\",\"cidr\":\"10.101.221.0/24\"}]"},"kind":"ConfigMap","metadata":{"labels":{"kapp.k14s.io/app":"1685448627039212099","kapp.k14s.io/association":"v1.ae838cced3b6caccc5a03bfb3ae65cd7"},"name":"avi-k8s-config","namespace":"avi-system"}}'
    kapp.k14s.io/original-diff-md5: c6e94dc94aed3401b5d0f26ed6c0bff3
  creationTimestamp: "2023-05-30T12:10:34Z"
  labels:
    kapp.k14s.io/app: "1685448627039212099"
    kapp.k14s.io/association: v1.ae838cced3b6caccc5a03bfb3ae65cd7
  name: avi-k8s-config
  namespace: avi-system
  resourceVersion: "2456"
  uid: 81bbe809-f5a1-45b8-aef2-a83ff36a3dd1
That looks good. The question now: will it blend?
So far I don't have anything in my NSX ALB dashboard. What happens if I create some services of type LoadBalancer or some Ingresses? Let's have a look. First check: do I have an IngressClass?
linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get ingressclasses.networking.k8s.io
NAME CONTROLLER PARAMETERS AGE
avi-lb ako.vmware.com/avi-lb <none> 10m
I certainly do. I will now deploy an application exposed with a service of type LoadBalancer and an application exposed using Ingress.
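For reference, a minimal sketch of what such manifests could look like (these are not the exact manifests used here; names, namespace and hostname are placeholders). The Ingress references the avi-lb IngressClass listed above:

apiVersion: v1
kind: Service
metadata:
  name: demo-ui
  namespace: demo
spec:
  type: LoadBalancer
  selector:
    app: demo-ui
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  namespace: demo
spec:
  ingressClassName: avi-lb
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-ui
            port:
              number: 80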
And here they are deployed:
linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get svc -n yelb
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-server ClusterIP 20.20.103.179 <none> 6379/TCP 10s
yelb-appserver ClusterIP 20.20.242.204 <none> 4567/TCP 10s
yelb-db ClusterIP 20.20.202.153 <none> 5432/TCP 10s
yelb-ui LoadBalancer 20.20.123.18 10.101.221.10 80:30119/TCP 10s
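The yelb UI should now answer on the external IP allocated from the DC2 VIP network; a quick check, assuming plain HTTP on port 80:

curl -I http://10.101.221.10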
linux-vm:~/Kubernetes-library/tkgm/stc-tkgm$ k get ingress -n fruit
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-example avi-lb fruit-tkg.you-have.your-domain.here 10.101.221.11 80 50s
Looking at the Ingress I can also see that NSX ALB has been kind enough to register a DNS record for it as well.
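Assuming the Avi DNS service is authoritative (or delegated) for that zone, the record and the virtual service behind it can be checked straight away; the hostname is the placeholder from the Ingress output above:

dig +short fruit-tkg.you-have.your-domain.here
curl -I http://fruit-tkg.you-have.your-domain.here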
How does it look inside my NSX ALB in DC1?
And where are the above NSX ALB Service Engines deployed?
In my vCenter in DC-2.
Well that was it. Thanks for reading.