GSLB With AKO & AMKO - NSX Advanced Load Balancer
Overview
Global Server Load Balancing in VMware Tanzu with AMKO
This post will go through how to configure AVI (NSX ALB) with GSLB in vSphere with Tanzu (TKGs) and an upstream k8s cluster in two different physical locations. I have already covered AKO in my previous posts, so this post will assume knowledge of AKO (Avi Kubernetes Operator) and extend upon that with the use of AMKO (Avi Multi-Cluster Kubernetes Operator). The goal is to be able to scale my k8s applications between my "sites" and make them geo-redundant. For more information on AVI, AKO and AMKO head over here
Preparations and a diagram of the environment used in this post
This post will involve an upstream Ubuntu k8s cluster in my home-lab and a remote vSphere with Tanzu cluster. I have deployed one Avi controller in my home-lab and one Avi controller in the remote site. The k8s cluster in my home-lab is defined as the "primary" k8s cluster, and the same goes for the Avi controller in my home-lab. Some network connectivity between the AVI controllers needs to be in place, such as 443 (API) between the controllers, and the AVI SE's need to reach the GSLB VS VIPs on their respective sides for GSLB health checks. Site A's SE dataplane needs connectivity to the VIP that is created for the GSLB service on site B, and vice versa. The primary k8s cluster also needs connectivity to the "secondary" k8s clusters' endpoint IP/FQDN, i.e. the k8s API (port 6443). AMKO needs this connectivity to listen for "GSLB"-enabled services in the remote k8s clusters, which triggers AMKO to automatically put them into your GSLB service. More on that later in the article. When all preparations are done the final diagram should look something like this:
(I will not cover what kind of infrastructure connects the sites together, as that is a completely different topic and could be almost anything.) But there will most likely be a firewall involved between the sites, and the connectivity mentioned above needs to be allowed in that firewall. In this post the following IP subnets will be used:
- SE dataplane network home-lab: 10.150.1.0/24 (I only have two SE's, so there will be two addresses from this subnet. I am running all services on the same two SE's, which is not recommended; one should at least have dedicated SE's for the AVI DNS service)
- SE dataplane network remote-site: 192.168.102.0/24 (two SE's here also; in the remote site I do have dedicated SE's for the AVI DNS service, but they will not be touched upon in this post, only the SE's responsible for the GSLB services being created)
- VIP subnet for services exposed in home-lab k8s cluster: 10.150.12.0/24 (a dedicated vip subnet for all services exposed from this cluster)
- VIP subnet for services exposed in remote-site tkgs cluster: 192.168.151.0/24 (a dedicated vip subnet for all services exposed from this cluster)
For this network setup to work one needs to have routing in place, either with BGP enabled in AVI or with static routes. Explanation: the SE's have their own dataplane network, and they are the ones realising the VIPs you define for your VS. So, if you want your VIPs to be reachable, you have to make sure there are routes in your network to the VIPs with the SEs as next hops, either via BGP or static routes. A VIP is what it is, a virtual IP: it does not have its own VLAN and gateway in your infrastructure. It is created and realised by the SE's, and the SE's are then the gateways for your VIPs; a VIP address could be anything. At the same time the SE dataplane network needs connectivity to the backend servers it is supposed to load balance, so it also needs routes to reach those. In this post that means the SE dataplane network needs reachability to the k8s worker nodes where the apps are running in the home-lab site, and in the remote site it needs reachability to the TKGs workers. On a side note, I am not running routable pods; they are NAT-ed through my workers, and I am using Antrea as CNI with NodePortLocal configured. I also prefer to have a separate network for the SE dataplane and dedicated VIP subnets, as it makes it easier to maintain control, isolation, firewall rules etc.
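For reference, since I mention NodePortLocal: that part is configured on the AKO side, not in AMKO. A minimal sketch of the relevant keys, assuming the layout of AKO's standard Helm values.yaml (verify against the AKO version you run):

```yaml
# Sketch of the AKO (not AMKO) Helm values that pair Antrea with NodePortLocal.
# Key names are assumed from AKO's published values.yaml; check your AKO version.
AKOSettings:
  cniPlugin: 'antrea'           # lets AKO pick up Antrea's NodePortLocal pod annotations
L7Settings:
  serviceType: NodePortLocal    # SEs reach pods via per-pod node ports on the workers
```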
The diagram above is very high level and does not go into all the networking details, firewall rules etc., but it gives an overview of the communication needed.
When one has a clear idea of the connectivity requirements, we need to form the GSLB "partnership" between the AVI controllers. I was thinking back and forth about whether I should cover these steps as well, but instead I will link to a good friend's blog here that does this brilliantly. It's all about saving the environment from unnecessary digital ink 😄. The same goes for the AKO deployment, which is also covered here or on the AVI docs page here
It should look like this on both controllers when everything is up and ready for GSLB:
It should be reflected on the secondary controller as well, except there will be no option to edit.
Time to deploy AMKO in K8s
AMKO can be deployed in two ways. It can be sufficient with only one instance of AMKO deployed in your primary k8s cluster, or you can go the federation approach and deploy AMKO in all the clusters you want to use GSLB on. Then you end up with one leader instance of AMKO and "followers", or federation members, on the others. One of the benefits is that you can promote one of the followers if the primary is lost. I will go with the simple approach and deploy AMKO once, in the primary k8s cluster in my home-lab.
AMKO preparations before deploy with Helm
AMKO will be deployed using Helm, so if Helm is not installed, do that first.
To successfully install AMKO there are a couple of things to be done. First, decide which cluster is your primary (where AMKO will be deployed). When you have decided that (the easy step), you need to prepare a secret that contains the contexts/clusters/users for all the k8s clusters you want to use GSLB on.
An example file can be found here. Create this content in a regular file and name the file gslb-members.
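In case the linked example moves: the gslb-members file is just an ordinary kubeconfig with the clusters, users and contexts of every cluster AMKO should watch merged into one file. A minimal sketch, where the server addresses and certificate data are placeholders for my two clusters:

```yaml
# gslb-members - a merged kubeconfig covering every cluster AMKO should watch.
# Sketch with placeholder values; use the real endpoints and credentials from your clusters.
apiVersion: v1
kind: Config
clusters:
  - name: k8slab
    cluster:
      certificate-authority-data: <base64-ca-cert>
      server: https://<k8slab-api-endpoint>:6443
  - name: tkgs-cluster-1
    cluster:
      certificate-authority-data: <base64-ca-cert>
      server: https://<tkgs-cluster-1-api-endpoint>:6443
users:
  - name: k8slab-admin
    user:
      client-certificate-data: <base64-client-cert>
      client-key-data: <base64-client-key>
  - name: tkgs-cluster-1-admin
    user:
      client-certificate-data: <base64-client-cert>
      client-key-data: <base64-client-key>
contexts:
  - name: k8slab-admin@k8slab
    context:
      cluster: k8slab
      user: k8slab-admin
  - name: tkgs-cluster-1-admin@tkgs-cluster-1
    context:
      cluster: tkgs-cluster-1
      user: tkgs-cluster-1-admin
```

The context names here are the ones referenced later in the AMKO values.yaml.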
The naming of the file is important; if you name it differently AMKO will fail, as it can't find the secret. I have tried to find a variable in the Helm chart's values.yaml that can override this but have not succeeded, so I went with the default naming. When the file is populated with the k8s clusters you want, create a secret in the primary k8s cluster like this: kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system. The namespace is the one where AKO is already deployed.
This should give you a secret like this:
```
NAME                 TYPE     DATA   AGE
gslb-config-secret   Opaque   1      20h
```
A note on kubeconfig for vSphere with Tanzu (TKGs)
When logging into a guest cluster in TKGs we usually do this through the supervisor, with either vSphere local users or AD users defined in vSphere, and we get a time-based token. It is not possible to use this approach here. So what I went with was to grab the admin credentials for my TKGs guest cluster and use that context instead. Here is how to do that. This is not a recommended approach; instead one should create and use a service account. Maybe I will get back to this later and update the post; a rough sketch of what it could look like is below.
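A minimal sketch of the service-account route, assuming a dedicated account bound to a (placeholder) cluster-admin role. This is not something I have wired into AMKO for this post, so treat it as a starting point only:

```yaml
# Sketch, not tested against AMKO here: a dedicated service account in the TKGs guest
# cluster whose token can be used in the gslb-members kubeconfig instead of the
# time-limited admin token. Scope the role down as far as your setup allows.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: amko-sa
  namespace: avi-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: amko-sa-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin        # placeholder - replace with a narrower ClusterRole
subjects:
  - kind: ServiceAccount
    name: amko-sa
    namespace: avi-system
---
# On Kubernetes 1.24+ a token secret has to be created explicitly:
apiVersion: v1
kind: Secret
metadata:
  name: amko-sa-token
  namespace: avi-system
  annotations:
    kubernetes.io/service-account.name: amko-sa
type: kubernetes.io/service-account-token
```

The token from that secret would then go into the TKGs user entry in the gslb-members file instead of the admin credentials.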
Back to the AMKO deployment...
The secret is ready; now we need to get the values.yaml for the AMKO version we will install. I am using AMKO 1.8.1 (same version as AKO). The Helm repo for AMKO is already added if AKO has been installed using Helm (it is the same repo). If not, add it:
```bash
helm repo add ako https://projects.registry.vmware.com/chartrepo/ako
```
Download the values.yaml:
```bash
helm show values ako/amko --version 1.8.1 > values.yaml
```
(There is a typo in the official doc; it references the chart as just amko.)
Now edit the values.yaml:
```yaml
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository: projects.registry.vmware.com/ako/amko
  pullPolicy: IfNotPresent

# Configs related to AMKO Federator
federation:
  # image repository
  image:
    repository: projects.registry.vmware.com/ako/amko-federator
    pullPolicy: IfNotPresent
  # cluster context where AMKO is going to be deployed
  currentCluster: 'k8slab-admin@k8slab' ##### use the context name of your leader/primary cluster
  # Set to true if AMKO on this cluster is the leader
  currentClusterIsLeader: true
  # member clusters to federate the GSLBConfig and GDP objects on, if the
  # current cluster context is part of this list, the federator will ignore it
  memberClusters:
    - 'k8slab-admin@k8slab' ##### use the context name
    - 'tkgs-cluster-1-admin@tkgs-cluster-1' ##### use the context name
# Configs related to AMKO Service discovery
serviceDiscovery:
  # image repository
  # image:
  #   repository: projects.registry.vmware.com/ako/amko-service-discovery
  #   pullPolicy: IfNotPresent

# Configs related to Multi-cluster ingress. Note: MultiClusterIngress is a tech preview.
multiClusterIngress:
  enable: false

configs:
  gslbLeaderController: '172.18.5.51' ##### MGMT IP of the leader/primary Avi controller
  controllerVersion: 22.1.1
  memberClusters:
    - clusterContext: 'k8slab-admin@k8slab' ##### use the context name
    - clusterContext: 'tkgs-cluster-1-admin@tkgs-cluster-1' ##### use the context name
  refreshInterval: 1800
  logLevel: INFO
  # Set the below flag to true if a different GSLB Service fqdn is desired than the ingress/route's
  # local fqdns. Note that, this field will use AKO's HostRule objects' to find out the local to global
  # fqdn mapping. To configure a mapping between the local to global fqdn, configure the hostrule
  # object as:
  # [...]
  # spec:
  #   virtualhost:
  #     fqdn: foo.avi.com
  #     gslb:
  #       fqdn: gs-foo.avi.com
  useCustomGlobalFqdn: true ##### set this to true if you want to define a custom FQDN for GSLB - I use this

gslbLeaderCredentials:
  username: 'admin' ##### username/password for the AVI controller
  password: 'password' ##### username/password for the AVI controller

globalDeploymentPolicy:
  # appSelector takes the form of:
  appSelector:
    label:
      app: 'gslb' #### I am using this selector for services to be used in GSLB
  # Uncomment below and add the required ingress/route/service label
  # appSelector:

  # namespaceSelector takes the form of:
  # namespaceSelector:
  #   label:
  #     ns: gslb <example label key-value for namespace>
  # Uncomment below and add the required namespace label
  # namespaceSelector:

  # list of all clusters that the GDP object will be applied to, can take any/all values
  # from .configs.memberClusters
  matchClusters:
    - cluster: 'k8slab-admin@k8slab' #### use the context name
    - cluster: 'tkgs-cluster-1-admin@tkgs-cluster-1' #### use the context name

  # list of all clusters and their traffic weights, if unspecified, default weights will be
  # given (optional). Uncomment below to add the required trafficSplit.
  # trafficSplit:
  #   - cluster: "cluster1-admin"
  #     weight: 8
  #   - cluster: "cluster2-admin"
  #     weight: 2

  # Uncomment below to specify a ttl value in seconds. By default, the value is inherited from
  # Avi's DNS VS.
  # ttl: 10

  # Uncomment below to specify custom health monitor refs. By default, HTTP/HTTPS path based health
  # monitors are applied on the GSs.
  # healthMonitorRefs:
  #   - hmref1
  #   - hmref2

  # Uncomment below to specify a Site Persistence profile ref. By default, Site Persistence is disabled.
  # Also, note that, Site Persistence is only applicable on secure ingresses/routes and ignored
  # for all other cases. Follow https://avinetworks.com/docs/20.1/gslb-site-cookie-persistence/ to create
  # a Site persistence profile.
  # sitePersistenceRef: gap-1

  # Uncomment below to specify gslb service pool algorithm settings for all gslb services. Applicable
  # values for lbAlgorithm:
  # 1. GSLB_ALGORITHM_CONSISTENT_HASH (needs a hashMask field to be set too)
  # 2. GSLB_ALGORITHM_GEO (needs geoFallback settings to be used for this field)
  # 3. GSLB_ALGORITHM_ROUND_ROBIN (default)
  # 4. GSLB_ALGORITHM_TOPOLOGY
  #
  # poolAlgorithmSettings:
  #   lbAlgorithm:
  #   hashMask:      # required only for lbAlgorithm == GSLB_ALGORITHM_CONSISTENT_HASH
  #   geoFallback:   # fallback settings required only for lbAlgorithm == GSLB_ALGORITHM_GEO
  #     lbAlgorithm: # can only have either GSLB_ALGORITHM_ROUND_ROBIN or GSLB_ALGORITHM_CONSISTENT_HASH
  #     hashMask:    # required only for fallback lbAlgorithm as GSLB_ALGORITHM_CONSISTENT_HASH

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

resources:
  limits:
    cpu: 250m
    memory: 300Mi
  requests:
    cpu: 100m
    memory: 200Mi

service:
  type: ClusterIP
  port: 80

rbac:
  # creates the pod security policy if set to true
  pspEnable: false

persistentVolumeClaim: ''
mountPath: /log
logFile: amko.log

federatorLogFile: amko-federator.log
```
When done, it's time to install AMKO:
```bash
helm install ako/amko --generate-name --version 1.8.1 -f /path/to/values.yaml --set configs.gslbLeaderController=<leader_controller_ip> --namespace=avi-system
```
(There is a typo in the official docs; it references the chart as just amko.)
If everything went well you should see a couple of things in your k8s cluster under the namespace avi-system:
```bash
k get pods -n avi-system
NAME     READY   STATUS    RESTARTS   AGE
ako-0    1/1     Running   0          25h
amko-0   2/2     Running   0          20h

k get amkocluster amkocluster-federation -n avi-system
NAME                     AGE
amkocluster-federation   20h

k get gc -n avi-system gc-1
NAME   AGE
gc-1   20h

k get gdp -n avi-system
NAME         AGE
global-gdp   20h
```
AMKO is up and running. Time to create a GSLB service.
Create GSLB service
You probably already have a bunch of ingress services running, and to make them GSLB "aware" there is not much to be done. As you may have noticed, in the values.yaml for the AMKO Helm chart we defined this:
```yaml
globalDeploymentPolicy:
  # appSelector takes the form of:
  appSelector:
    label:
      app: 'gslb' #### I am using this selector for services to be used in GSLB
```
So what we need to do in our ingress is add that label, plus a new HostRule section where we define our GSLB FQDN.
Here is my sample ingress applied in my primary k8s cluster:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  labels: #### This is added for GSLB
    app: gslb #### This is added for GSLB - using the selector I chose in the values.yaml
  namespace: fruit

spec:
  ingressClassName: avi-lb
  rules:
    - host: fruit-global.guzware.net #### Specific for this site (Home Lab)
      http:
        paths:
          - path: /apple
            pathType: Prefix
            backend:
              service:
                name: apple-service
                port:
                  number: 5678
          - path: /banana
            pathType: Prefix
            backend:
              service:
                name: banana-service
                port:
                  number: 5678
--- #### New section to define a HostRule
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  namespace: fruit
  name: gslb-host-rule-fruit
spec:
  virtualhost:
    fqdn: fruit-global.guzware.net #### Specific for this site (Home Lab)
    enableVirtualHost: true
    gslb:
      fqdn: fruit.gslb.guzware.net #### This is common for both sites
```
As soon as it is applied, and there are no errors in AMKO or AKO, it should be visible in your AVI controller GUI:
If you click on the name it should take you to the next page, which shows the GSLB pool members and their status. The screenshot below is from when both sites have applied their GSLB services:
Next we need to apply the GSLB settings on the secondary site as well:
This is what I have deployed on the secondary site (note the site-specific difference in domain names):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  labels: #### This is added for GSLB
    app: gslb #### This is added for GSLB - using the selector I chose in the values.yaml
  namespace: fruit

spec:
  ingressClassName: avi-lb
  rules:
    - host: fruit-site-2.lab.guzware.net #### Specific for this site (Remote Site)
      http:
        paths:
          - path: /apple
            pathType: Prefix
            backend:
              service:
                name: apple-service
                port:
                  number: 5678
          - path: /banana
            pathType: Prefix
            backend:
              service:
                name: banana-service
                port:
                  number: 5678
--- #### New section to define a HostRule
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  namespace: fruit
  name: gslb-host-rule-fruit
spec:
  virtualhost:
    fqdn: fruit-site-2.lab.guzware.net #### Specific for this site (Remote Site)
    enableVirtualHost: true
    gslb:
      fqdn: fruit.gslb.guzware.net ##### Common for both sites
```
When this is applied, Avi will put it into the same GSLB service as above, and the screenshot above will hold true for both sites.
Now I have the same application deployed in both sites, equally available whether I am sitting in my home-lab or at the remote site. There is a bunch of parameters that can be tuned, which I will not go into now (I may get back to this and update the post with further GSLB possibilities). One of them is the load balancing algorithm, such as Geo Location Source: say I want the application to be accessed from the site closest to the client, and should one of the sites become unavailable, it should still be accessible from a site that is online. Very cool indeed. For the sake of the demo I am about to show, the only thing I change in the default GSLB settings is the TTL; I am setting it to 2 seconds so I can showcase that the application is being load balanced between both sites. The default algorithm is Round-Robin, so it should alternate between them regardless of the latency difference (accessing the application from my home network in my home-lab vs from the remote site, which is several ms away). Here is where I am setting these settings:
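As an aside (this is an assumption on my part, not what the screenshot shows), the TTL and pool algorithm knobs also appear on the GlobalDeploymentPolicy object that AMKO created (global-gdp), so the same tuning could presumably be done declaratively. A minimal sketch, with field names mirrored from the globalDeploymentPolicy section of the values.yaml above; verify against the GDP CRD shipped with your AMKO version:

```yaml
# Sketch only: tune TTL and pool algorithm on AMKO's GDP object instead of the GUI.
# Field names are assumed from the Helm values above; check the GDP CRD in your AMKO version.
apiVersion: amko.vmware.com/v1alpha2
kind: GlobalDeploymentPolicy
metadata:
  name: global-gdp
  namespace: avi-system
spec:
  matchRules:
    appSelector:
      label:
        app: gslb
  matchClusters:
    - cluster: k8slab-admin@k8slab
    - cluster: tkgs-cluster-1-admin@tkgs-cluster-1
  ttl: 2  # low TTL so the alternating DNS answers show up quickly in the demo
  poolAlgorithmSettings:
    lbAlgorithm: GSLB_ALGORITHM_ROUND_ROBIN
```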
With a TTL of 2 seconds it should switch faster so I can see the balancing between the two sites. Let me try to access the application from my browser using the gslb fqdn: fruit.gslb.guzware.net/apple
A refresh of the page and now:
To illustrate it even more I will run a curl command against the GSLB FQDN:
Now a ping against the FQDN to show the IP of the site that answers:
Notice the change in IP address, but also the difference in latency.
Now I can go ahead and disable one of the sites to simulate a failover, and the application is still available on the same FQDN. So many possibilities with GSLB.
That's it then. NSX ALB and AKO with AMKO between two sites: the same application available in two physical locations, with redundancy, scale-out and availability. Stay tuned for more updates on the advanced settings - in the future 😄