TKG Autoscaler

Overview


From the official TKG documentation page:

Cluster Autoscaler is a Kubernetes program that automatically scales Kubernetes clusters depending on the demands on the workload clusters. Use Cluster Autoscaler only for workload clusters deployed by a standalone management cluster.

OK, let's try this out then.

Enable Cluster Autoscaler

So one of the prerequisites is a TKG standalone management cluster, which I already have deployed and running. Then, for a workload cluster to be able to use the Cluster Autoscaler, I need to enable it by adding some parameters to the cluster deployment manifest. The following are the autoscaler-relevant variables; some are required, some are optional, and they are only valid in a workload cluster deployment manifest. According to the official documentation, the only supported way to enable the autoscaler is when provisioning a new workload cluster.

  • ENABLE_AUTOSCALER: "true" #Required if you want to enable the autoscaler

  • AUTOSCALER_MAX_NODES_TOTAL: "0" #Optional

  • AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m" #Optional

  • AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s" #Optional

  • AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m" #Optional

  • AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m" #Optional

  • AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m" #Optional

  • AUTOSCALER_MIN_SIZE_0: "1" #Required (if Autoscaler is enabled as above)

  • AUTOSCALER_MAX_SIZE_0: "2" #Required (if Autoscaler is enabled as above)

  • AUTOSCALER_MIN_SIZE_1: "1" #Required (if Autoscaler is enabled as above, the prod plan is used, and TKG is deployed across multiple availability zones)

  • AUTOSCALER_MAX_SIZE_1: "3" #Required (if Autoscaler is enabled as above, the prod plan is used, and TKG is deployed across multiple availability zones)

  • AUTOSCALER_MIN_SIZE_2: "1" #Required (if Autoscaler is enabled as above, the prod plan is used, and TKG is deployed across multiple availability zones)

  • AUTOSCALER_MAX_SIZE_2: "4" #Required (if Autoscaler is enabled as above, the prod plan is used, and TKG is deployed across multiple availability zones)

Enable Autoscaler upon provisioning of a new workload cluster

Start by preparing a class-based YAML manifest for the workload cluster. This procedure involves adding the AUTOSCALER variables (above) to the TKG bootstrap YAML (the one used to deploy the TKG management cluster) and then generating a class-based YAML manifest for the new workload cluster. I will make a copy of my existing TKG bootstrap YAML file and name it something relevant to autoscaling, for example like this (a sketch; the source filename is just a placeholder for my existing bootstrap file):
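andreasm@tkg-bootstrap:~$ cp tkg-mgmt-bootstrap.yaml tkg-mgmt-bootstrap-tkg-2.3-autoscaler.yaml

Then in this file I will add these variables: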

 1#! ---------------
 2#! Workload Cluster Specific
 3#! -------------
 4ENABLE_AUTOSCALER: "true"
 5AUTOSCALER_MAX_NODES_TOTAL: "0"
 6AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m"
 7AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s"
 8AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m"
 9AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m"
10AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m"
11AUTOSCALER_MIN_SIZE_0: "1"  #This will be used if not using availability zones. If using az this will count as zone 1 - required
 12AUTOSCALER_MAX_SIZE_0: "2"  #This will be used if not using availability zones. If using az this will count as zone 1 - required
13AUTOSCALER_MIN_SIZE_1: "1"  #This will be used for availability zone 2
14AUTOSCALER_MAX_SIZE_1: "3"  #This will be used for availability zone 2
15AUTOSCALER_MIN_SIZE_2: "1"  #This will be used for availability zone 3
16AUTOSCALER_MAX_SIZE_2: "4"  #This will be used for availability zone 3
Tip!

If you are not using TKG in a multi-availability-zone deployment, there is no need to add the lines AUTOSCALER_MIN_SIZE_1, AUTOSCALER_MAX_SIZE_1, AUTOSCALER_MIN_SIZE_2, and AUTOSCALER_MAX_SIZE_2, as these are only used for the additional zones you have configured. For a "no AZ" deployment, AUTOSCALER_MIN/MAX_SIZE_0 is sufficient.
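For reference, a minimal no-AZ variant of the autoscaler variables could look like this (a sketch; the optional timing variables can be left out, in which case the autoscaler defaults apply):

ENABLE_AUTOSCALER: "true"
AUTOSCALER_MIN_SIZE_0: "1"
AUTOSCALER_MAX_SIZE_0: "2"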

After the above has been added I will do a "--dry-run" to create my workload cluster class-based yaml file:

1andreasm@tkg-bootstrap:~$ tanzu cluster create tkg-cluster-3-auto --namespace tkg-ns-3 --file tkg-mgmt-bootstrap-tkg-2.3-autoscaler.yaml --dry-run > tkg-cluster-3-auto.yaml

The above command gives the workload cluster the name tkg-cluster-3-auto in the namespace tkg-ns-3, using the modified TKG bootstrap file containing the autoscaler variables. The output is the class-based YAML I will use to create the cluster (assuming no errors during the dry-run). In my mgmt bootstrap I have defined different autoscaler min/max settings per availability zone just to show that the settings can be differentiated per zone. According to the manual this should only be used on AWS, but in 2.3 multi-AZ is fully supported, so the docs have probably not been updated yet. If I take a look at the class-based YAML:

 1    workers:
 2      machineDeployments:
 3      - class: tkg-worker
 4        failureDomain: wdc-zone-2
 5        metadata:
 6          annotations:
 7            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "2"
 8            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
 9            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
10        name: md-0
11        strategy:
12          type: RollingUpdate
13      - class: tkg-worker
14        failureDomain: wdc-zone-3
15        metadata:
16          annotations:
17            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
18            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
19            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
20        name: md-1
21        strategy:
22          type: RollingUpdate
23      - class: tkg-worker
24        failureDomain: wdc-zone-3
25        metadata:
26          annotations:
27            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "4"
28            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
29            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
30        name: md-2
31        strategy:
32          type: RollingUpdate
33---

I notice that it does take into consideration my different availability zones. Perfect.

Before I deploy my workload cluster, I will edit the manifest to only deploy worker nodes in my AZ zone 2, due to resource constraints in my lab and to make the demo a bit clearer (scaling up from one worker and back down again). A sketch of the trimmed workers section, keeping only md-0 in wdc-zone-2 with the min/max values from AUTOSCALER_MIN/MAX_SIZE_0:
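    workers:
      machineDeployments:
      - class: tkg-worker
        failureDomain: wdc-zone-2
        metadata:
          annotations:
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "2"
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
        name: md-0
        strategy:
          type: RollingUpdate

With that in place, I deploy the workload cluster: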

1andreasm@tkg-bootstrap:~$ tanzu cluster create --file tkg-cluster-3-auto.yaml
2Validating configuration...
3cluster class based input file detected, getting tkr version from input yaml
4input TKR Version: v1.26.5+vmware.2-tkg.1
5TKR Version v1.26.5+vmware.2-tkg.1, Kubernetes Version v1.26.5+vmware.2-tkg.1 configured

Now it is all about waiting... After the waiting period is done, it is time for some testing...
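While waiting, the provisioning can be followed from the TKG management cluster context, and once the cluster is ready I can grab its kubeconfig and switch to it (a sketch; the cluster and namespace names are from my environment, and the context name may differ slightly):

andreasm@tkg-bootstrap:~$ tanzu cluster get tkg-cluster-3-auto --namespace tkg-ns-3
andreasm@tkg-bootstrap:~$ tanzu cluster kubeconfig get tkg-cluster-3-auto --namespace tkg-ns-3 --admin
andreasm@tkg-bootstrap:~$ kubectl config use-context tkg-cluster-3-auto-admin@tkg-cluster-3-auto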

Enable Autoscaler on existing/running workload cluster

I already have a TKG workload cluster up and running, and I want to "post-enable" the autoscaler in this cluster. This cluster has been deployed with ENABLE_AUTOSCALER: "false", and below is the class-based YAML manifest (no autoscaler annotations):

 1    workers:
 2      machineDeployments:
 3      - class: tkg-worker
 4        failureDomain: wdc-zone-2
 5        metadata:
 6          annotations:
 7            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
 8        name: md-0
 9        replicas: 1
10        strategy:
11          type: RollingUpdate

The above class-based YAML has been generated from my mgmt bootstrap YAML with the AUTOSCALER settings like this:

 1#! ---------------
 2#! Workload Cluster Specific
 3#! -------------
 4ENABLE_AUTOSCALER: "false"
 5AUTOSCALER_MAX_NODES_TOTAL: "0"
 6AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m"
 7AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s"
 8AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m"
 9AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m"
10AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m"
11AUTOSCALER_MIN_SIZE_0: "1"
12AUTOSCALER_MAX_SIZE_0: "4"
13AUTOSCALER_MIN_SIZE_1: "1"
14AUTOSCALER_MAX_SIZE_1: "4"
15AUTOSCALER_MIN_SIZE_2: "1"
16AUTOSCALER_MAX_SIZE_2: "4"

If I check the autoscaler status:

1andreasm@linuxvm01:~$ k describe cm -n kube-system cluster-autoscaler-status
2Error from server (NotFound): configmaps "cluster-autoscaler-status" not found

Now, this cluster is in "serious" need of having the autoscaler enabled. So how do I do that? This step is most likely not officially supported. I will go back to the TKG mgmt bootstrap YAML, enable the autoscaler, do a dry run of the config, and apply the new class-based YAML manifest. This is all done in the TKG mgmt cluster context. The only change needed in the bootstrap file is flipping the flag (the other AUTOSCALER_* values above stay as they are):
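ENABLE_AUTOSCALER: "true"

Then the dry run: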

1andreasm@linuxvm01:~$ tanzu cluster create tkg-cluster-3-auto --namespace tkg-ns-3 --file tkg-mgmt-bootstrap-tkg-2.3-autoscaler-wld-1-zone.yaml --dry-run > tkg-cluster-3-auto-az.yaml

Before applying the new class-based manifest, I will edit out the unnecessary objects and keep only the updated settings relevant to the autoscaler; it could probably be reduced even further. See my YAML below:

  1apiVersion: cluster.x-k8s.io/v1beta1
  2kind: Cluster
  3metadata:
  4  annotations:
  5    osInfo: ubuntu,20.04,amd64
  6    tkg/plan: dev
  7  labels:
  8    tkg.tanzu.vmware.com/cluster-name: tkg-cluster-3-auto
  9  name: tkg-cluster-3-auto
 10  namespace: tkg-ns-3
 11spec:
 12  clusterNetwork:
 13    pods:
 14      cidrBlocks:
 15      - 100.96.0.0/11
 16    services:
 17      cidrBlocks:
 18      - 100.64.0.0/13
 19  topology:
 20    class: tkg-vsphere-default-v1.1.0
 21    controlPlane:
 22      metadata:
 23        annotations:
 24          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
 25      replicas: 1
 26    variables:
 27    - name: cni
 28      value: antrea
 29    - name: controlPlaneCertificateRotation
 30      value:
 31        activate: true
 32        daysBefore: 90
 33    - name: auditLogging
 34      value:
 35        enabled: false
 36    - name: podSecurityStandard
 37      value:
 38        audit: restricted
 39        deactivated: false
 40        warn: restricted
 41    - name: apiServerEndpoint
 42      value: ""
 43    - name: aviAPIServerHAProvider
 44      value: true
 45    - name: vcenter
 46      value:
 47        cloneMode: fullClone
 48        datacenter: /cPod-NSXAM-WDC
 49        datastore: /cPod-NSXAM-WDC/datastore/vsanDatastore-wdc-01
 50        folder: /cPod-NSXAM-WDC/vm/TKGm
 51        network: /cPod-NSXAM-WDC/network/ls-tkg-mgmt
 52        resourcePool: /cPod-NSXAM-WDC/host/Cluster-1/Resources
 53        server: vcsa.FQDN
 54        storagePolicyID: ""
 55        tlsThumbprint: F8:----:7D
 56    - name: user
 57      value:
 58        sshAuthorizedKeys:
 59        - ssh-rsa BBAAB3NzaC1yc2EAAAADAQABA------QgPcxDoOhL6kdBHQY3ZRPE5LIh7RWM33SvsoIgic1OxK8LPaiGEPaOfUvP2ki7TNHLxP78bPxAfbkK7llDSmOIWrm7ukwG4DLHnyriBQahLqv1Wpx4kIRj5LM2UEBx235bVDSve==
 60    - name: controlPlane
 61      value:
 62        machine:
 63          diskGiB: 20
 64          memoryMiB: 4096
 65          numCPUs: 2
 66    - name: worker
 67      value:
 68        machine:
 69          diskGiB: 20
 70          memoryMiB: 4096
 71          numCPUs: 2
 72    - name: controlPlaneZoneMatchingLabels
 73      value:
 74        region: k8s-region
 75        tkg-cp: allowed
 76    - name: security
 77      value:
 78        fileIntegrityMonitoring:
 79          enabled: false
 80        imagePolicy:
 81          pullAlways: false
 82          webhook:
 83            enabled: false
 84            spec:
 85              allowTTL: 50
 86              defaultAllow: true
 87              denyTTL: 60
 88              retryBackoff: 500
 89        kubeletOptions:
 90          eventQPS: 50
 91          streamConnectionIdleTimeout: 4h0m0s
 92        systemCryptoPolicy: default
 93    version: v1.26.5+vmware.2-tkg.1
 94    workers:
 95      machineDeployments:
 96      - class: tkg-worker
 97        failureDomain: wdc-zone-2
 98        metadata:
 99          annotations:
100            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "4"
101            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
102            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
103        name: md-0
104        strategy:
105          type: RollingUpdate
106---
107apiVersion: apps/v1
108kind: Deployment
109metadata:
110  labels:
111    app: tkg-cluster-3-auto-cluster-autoscaler
112  name: tkg-cluster-3-auto-cluster-autoscaler
113  namespace: tkg-ns-3
114spec:
115  replicas: 1
116  selector:
117    matchLabels:
118      app: tkg-cluster-3-auto-cluster-autoscaler
119  template:
120    metadata:
121      labels:
122        app: tkg-cluster-3-auto-cluster-autoscaler
123    spec:
124      containers:
125      - args:
126        - --cloud-provider=clusterapi
127        - --v=4
128        - --clusterapi-cloud-config-authoritative
129        - --kubeconfig=/mnt/tkg-cluster-3-auto-kubeconfig/value
130        - --node-group-auto-discovery=clusterapi:clusterName=tkg-cluster-3-auto,namespace=tkg-ns-3
131        - --scale-down-delay-after-add=10m
132        - --scale-down-delay-after-delete=10s
133        - --scale-down-delay-after-failure=3m
134        - --scale-down-unneeded-time=10m
135        - --max-node-provision-time=15m
136        - --max-nodes-total=0
137        command:
138        - /cluster-autoscaler
139        image: projects.registry.vmware.com/tkg/cluster-autoscaler:v1.26.2_vmware.1
140        name: tkg-cluster-3-auto-cluster-autoscaler
141        volumeMounts:
142        - mountPath: /mnt/tkg-cluster-3-auto-kubeconfig
143          name: tkg-cluster-3-auto-cluster-autoscaler-volume
144          readOnly: true
145      serviceAccountName: tkg-cluster-3-auto-autoscaler
146      terminationGracePeriodSeconds: 10
147      tolerations:
148      - effect: NoSchedule
149        key: node-role.kubernetes.io/master
150      - effect: NoSchedule
151        key: node-role.kubernetes.io/control-plane
152      volumes:
153      - name: tkg-cluster-3-auto-cluster-autoscaler-volume
154        secret:
155          secretName: tkg-cluster-3-auto-kubeconfig
156---
157apiVersion: rbac.authorization.k8s.io/v1
158kind: ClusterRoleBinding
159metadata:
160  creationTimestamp: null
161  name: tkg-cluster-3-auto-autoscaler-workload
162roleRef:
163  apiGroup: rbac.authorization.k8s.io
164  kind: ClusterRole
165  name: cluster-autoscaler-workload
166subjects:
167- kind: ServiceAccount
168  name: tkg-cluster-3-auto-autoscaler
169  namespace: tkg-ns-3
170---
171apiVersion: rbac.authorization.k8s.io/v1
172kind: ClusterRoleBinding
173metadata:
174  creationTimestamp: null
175  name: tkg-cluster-3-auto-autoscaler-management
176roleRef:
177  apiGroup: rbac.authorization.k8s.io
178  kind: ClusterRole
179  name: cluster-autoscaler-management
180subjects:
181- kind: ServiceAccount
182  name: tkg-cluster-3-auto-autoscaler
183  namespace: tkg-ns-3
184---
185apiVersion: v1
186kind: ServiceAccount
187metadata:
188  name: tkg-cluster-3-auto-autoscaler
189  namespace: tkg-ns-3
190---
191apiVersion: rbac.authorization.k8s.io/v1
192kind: ClusterRole
193metadata:
194  name: cluster-autoscaler-workload
195rules:
196- apiGroups:
197  - ""
198  resources:
199  - persistentvolumeclaims
200  - persistentvolumes
201  - pods
202  - replicationcontrollers
203  verbs:
204  - get
205  - list
206  - watch
207- apiGroups:
208  - ""
209  resources:
210  - nodes
211  verbs:
212  - get
213  - list
214  - update
215  - watch
216- apiGroups:
217  - ""
218  resources:
219  - pods/eviction
220  verbs:
221  - create
222- apiGroups:
223  - policy
224  resources:
225  - poddisruptionbudgets
226  verbs:
227  - list
228  - watch
229- apiGroups:
230  - storage.k8s.io
231  resources:
232  - csinodes
233  - storageclasses
234  verbs:
235  - get
236  - list
237  - watch
238- apiGroups:
239  - batch
240  resources:
241  - jobs
242  verbs:
243  - list
244  - watch
245- apiGroups:
246  - apps
247  resources:
248  - daemonsets
249  - replicasets
250  - statefulsets
251  verbs:
252  - list
253  - watch
254- apiGroups:
255  - ""
256  resources:
257  - events
258  verbs:
259  - create
260  - patch
261- apiGroups:
262  - ""
263  resources:
264  - configmaps
265  verbs:
266  - create
267  - delete
268  - get
269  - update
270- apiGroups:
271  - coordination.k8s.io
272  resources:
273  - leases
274  verbs:
275  - create
276  - get
277  - update
278---
279apiVersion: rbac.authorization.k8s.io/v1
280kind: ClusterRole
281metadata:
282  name: cluster-autoscaler-management
283rules:
284- apiGroups:
285  - cluster.x-k8s.io
286  resources:
287  - machinedeployments
288  - machines
289  - machinesets
290  verbs:
291  - get
292  - list
293  - update
294  - watch
295  - patch
296- apiGroups:
297  - cluster.x-k8s.io
298  resources:
299  - machinedeployments/scale
300  - machinesets/scale
301  verbs:
302  - get
303  - update
304- apiGroups:
305  - infrastructure.cluster.x-k8s.io
306  resources:
307  - '*'
308  verbs:
309  - get
310  - list

And now I will apply the above YAML for my running TKG workload cluster using kubectl (done from the mgmt cluster context):

1andreasm@linuxvm01:~$ kubectl apply -f tkg-cluster-3-enable-only-auto-az.yaml
2cluster.cluster.x-k8s.io/tkg-cluster-3-auto configured
3Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "tkg-cluster-3-auto-cluster-autoscaler" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "tkg-cluster-3-auto-cluster-autoscaler" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "tkg-cluster-3-auto-cluster-autoscaler" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "tkg-cluster-3-auto-cluster-autoscaler" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
4deployment.apps/tkg-cluster-3-auto-cluster-autoscaler created
5clusterrolebinding.rbac.authorization.k8s.io/tkg-cluster-3-auto-autoscaler-workload created
6clusterrolebinding.rbac.authorization.k8s.io/tkg-cluster-3-auto-autoscaler-management created
7serviceaccount/tkg-cluster-3-auto-autoscaler created
8clusterrole.rbac.authorization.k8s.io/cluster-autoscaler-workload unchanged
9clusterrole.rbac.authorization.k8s.io/cluster-autoscaler-management unchanged
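The PodSecurity message is only a warning here and the deployment is still created, but if you want the autoscaler pod to comply with the restricted profile, a securityContext along these lines could be added under the container in the deployment above (a sketch based on what the warning asks for; the autoscaler image must support running as non-root for this to work):

        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault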

Checking the autoscaler status now shows this:

 1andreasm@linuxvm01:~$ k describe cm -n kube-system cluster-autoscaler-status
 2Name:         cluster-autoscaler-status
 3Namespace:    kube-system
 4Labels:       <none>
 5Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2023-09-11 10:40:02.369535271 +0000 UTC
 6
 7Data
 8====
 9status:
10----
11Cluster-autoscaler status at 2023-09-11 10:40:02.369535271 +0000 UTC:
12Cluster-wide:
13  Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0)
14               LastProbeTime:      2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
15               LastTransitionTime: 2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
16  ScaleUp:     NoActivity (ready=2 registered=2)
17               LastProbeTime:      2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
18               LastTransitionTime: 2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
19  ScaleDown:   NoCandidates (candidates=0)
20               LastProbeTime:      2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
21               LastTransitionTime: 2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
22
23NodeGroups:
24  Name:        MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-s7d7t
25  Health:      Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=4))
26               LastProbeTime:      2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
27               LastTransitionTime: 2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
28  ScaleUp:     NoActivity (ready=1 cloudProviderTarget=1)
29               LastProbeTime:      2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
30               LastTransitionTime: 2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
31  ScaleDown:   NoCandidates (candidates=0)
32               LastProbeTime:      2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
33               LastTransitionTime: 2023-09-11 10:40:01.146686706 +0000 UTC m=+26.613355068
34
35
36
37BinaryData
38====
39
40Events:  <none>

That's great.

Another way to do it is to edit the cluster object directly, following this KB article. The same KB article can also be used to change or modify existing autoscaler settings.
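The gist of it is that the autoscaler min/max values live as annotations on the machineDeployments in the Cluster object's topology, so they can be adjusted in place from the mgmt cluster context, something like this (a sketch; verify against the KB article before using it):

andreasm@linuxvm01:~$ kubectl edit cluster tkg-cluster-3-auto -n tkg-ns-3
# then adjust the values under spec.topology.workers.machineDeployments[].metadata.annotations:
#   cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size
#   cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size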

Test the autoscaler

In the following sections I will test scaling my worker nodes up and down based on the load in the cluster. My initial cluster is up and running:

1NAME                                                   STATUS   ROLES           AGE     VERSION
2tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   Ready    <none>          4m17s   v1.26.5+vmware.2
3tkg-cluster-3-auto-ns4jx-szp69                         Ready    control-plane   8m31s   v1.26.5+vmware.2

One control plane node and one worker node. Now I want to check the status of the cluster autoscaler:

 1andreasm@linuxvm01:~$ k describe cm -n kube-system cluster-autoscaler-status
 2Name:         cluster-autoscaler-status
 3Namespace:    kube-system
 4Labels:       <none>
 5Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2023-09-08 13:30:12.611110965 +0000 UTC
 6
 7Data
 8====
 9status:
10----
11Cluster-autoscaler status at 2023-09-08 13:30:12.611110965 +0000 UTC:
12Cluster-wide:
13  Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0)
14               LastProbeTime:      2023-09-08 13:30:11.394021754 +0000 UTC m=+1356.335230920
15               LastTransitionTime: 2023-09-08 13:07:46.176049718 +0000 UTC m=+11.117258901
16  ScaleUp:     NoActivity (ready=2 registered=2)
17               LastProbeTime:      2023-09-08 13:30:11.394021754 +0000 UTC m=+1356.335230920
18               LastTransitionTime: 2023-09-08 13:07:46.176049718 +0000 UTC m=+11.117258901
19  ScaleDown:   NoCandidates (candidates=0)
20               LastProbeTime:      2023-09-08 13:30:11.394021754 +0000 UTC m=+1356.335230920
21               LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
22
23NodeGroups:
24  Name:        MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws
25  Health:      Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=4))
26               LastProbeTime:      2023-09-08 13:30:11.394021754 +0000 UTC m=+1356.335230920
27               LastTransitionTime: 2023-09-08 13:12:44.585589045 +0000 UTC m=+309.526798282
28  ScaleUp:     NoActivity (ready=1 cloudProviderTarget=1)
29               LastProbeTime:      2023-09-08 13:30:11.394021754 +0000 UTC m=+1356.335230920
30               LastTransitionTime: 2023-09-08 13:12:44.585589045 +0000 UTC m=+309.526798282
31  ScaleDown:   NoCandidates (candidates=0)
32               LastProbeTime:      2023-09-08 13:30:11.394021754 +0000 UTC m=+1356.335230920
33               LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
34
35
36
37BinaryData
38====
39
40Events:  <none>

Scale-up - number of worker nodes (horizontally)

Now I need to generate some load and see if it will do some magic scaling in the background.

I have deployed my Yelb app again; the only missing pod is the UI pod:

1NAME                              READY   STATUS    RESTARTS   AGE
2redis-server-56d97cc8c-4h54n      1/1     Running   0          6m56s
3yelb-appserver-65855b7ffd-j2bjt   1/1     Running   0          6m55s
4yelb-db-6f78dc6f8f-rg68q          1/1     Running   0          6m56s

I still have my one control plane node and one worker node. I will now deploy the UI pod and scale up to an insane number of UI pods for the Yelb application.

1yelb-ui-5c5b8d8887-9598s          1/1     Running   0          2m35s
1andreasm@linuxvm01:~$ k scale deployment -n yelb yelb-ui --replicas 200
2deployment.apps/yelb-ui scaled

Let's check some status after this... A bunch of pods are in Pending state, waiting for a node to be scheduled on.

 1NAME                              READY   STATUS    RESTARTS   AGE     IP             NODE                                                   NOMINATED NODE   READINESS GATES
 2redis-server-56d97cc8c-4h54n      1/1     Running   0          21m     100.96.1.9     tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 3yelb-appserver-65855b7ffd-j2bjt   1/1     Running   0          21m     100.96.1.11    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 4yelb-db-6f78dc6f8f-rg68q          1/1     Running   0          21m     100.96.1.10    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 5yelb-ui-5c5b8d8887-22v8p          1/1     Running   0          6m18s   100.96.1.53    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 6yelb-ui-5c5b8d8887-2587j          0/1     Pending   0          3m49s   <none>         <none>                                                 <none>           <none>
 7yelb-ui-5c5b8d8887-2bzcg          0/1     Pending   0          3m51s   <none>         <none>                                                 <none>           <none>
 8yelb-ui-5c5b8d8887-2gncl          0/1     Pending   0          3m51s   <none>         <none>                                                 <none>           <none>
 9yelb-ui-5c5b8d8887-2gwp8          1/1     Running   0          3m53s   100.96.1.86    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
10yelb-ui-5c5b8d8887-2gz7r          0/1     Pending   0          3m50s   <none>         <none>                                                 <none>           <none>
11yelb-ui-5c5b8d8887-2jlvv          0/1     Pending   0          3m49s   <none>         <none>                                                 <none>           <none>
12yelb-ui-5c5b8d8887-2pfgp          1/1     Running   0          6m18s   100.96.1.36    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
13yelb-ui-5c5b8d8887-2prwf          0/1     Pending   0          3m50s   <none>         <none>                                                 <none>           <none>
14yelb-ui-5c5b8d8887-2vr4f          0/1     Pending   0          3m53s   <none>         <none>                                                 <none>           <none>
15yelb-ui-5c5b8d8887-2w2t8          0/1     Pending   0          3m49s   <none>         <none>                                                 <none>           <none>
16yelb-ui-5c5b8d8887-2x6b7          1/1     Running   0          6m18s   100.96.1.34    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
17yelb-ui-5c5b8d8887-2x726          1/1     Running   0          9m40s   100.96.1.23    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
18yelb-ui-5c5b8d8887-452bx          0/1     Pending   0          3m49s   <none>         <none>                                                 <none>           <none>
19yelb-ui-5c5b8d8887-452dd          1/1     Running   0          6m17s   100.96.1.69    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
20yelb-ui-5c5b8d8887-45nmz          0/1     Pending   0          3m48s   <none>         <none>                                                 <none>           <none>
21yelb-ui-5c5b8d8887-4kj69          1/1     Running   0          3m53s   100.96.1.109   tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
22yelb-ui-5c5b8d8887-4svbf          0/1     Pending   0          3m50s   <none>         <none>                                                 <none>           <none>
23yelb-ui-5c5b8d8887-4t6dm          0/1     Pending   0          3m50s   <none>         <none>                                                 <none>           <none>
24yelb-ui-5c5b8d8887-4zlhw          0/1     Pending   0          3m51s   <none>         <none>                                                 <none>           <none>
25yelb-ui-5c5b8d8887-55qzm          1/1     Running   0          9m40s   100.96.1.15    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
26yelb-ui-5c5b8d8887-5fts4          1/1     Running   0          6m18s   100.96.1.55    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
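A quick way to count how many UI pods are stuck in Pending (a sketch, assuming the yelb namespace):

andreasm@linuxvm01:~$ k get pods -n yelb --field-selector=status.phase=Pending --no-headers | wc -l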

The autoscaler status:

 1andreasm@linuxvm01:~$ k describe cm -n kube-system cluster-autoscaler-status
 2Name:         cluster-autoscaler-status
 3Namespace:    kube-system
 4Labels:       <none>
 5Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2023-09-08 14:01:43.794315378 +0000 UTC
 6
 7Data
 8====
 9status:
10----
11Cluster-autoscaler status at 2023-09-08 14:01:43.794315378 +0000 UTC:
12Cluster-wide:
13  Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0)
14               LastProbeTime:      2023-09-08 14:01:41.380962042 +0000 UTC m=+3246.322171235
15               LastTransitionTime: 2023-09-08 13:07:46.176049718 +0000 UTC m=+11.117258901
16  ScaleUp:     InProgress (ready=2 registered=2)
17               LastProbeTime:      2023-09-08 14:01:41.380962042 +0000 UTC m=+3246.322171235
18               LastTransitionTime: 2023-09-08 14:01:41.380962042 +0000 UTC m=+3246.322171235
19  ScaleDown:   NoCandidates (candidates=0)
20               LastProbeTime:      2023-09-08 14:01:30.091765978 +0000 UTC m=+3235.032975159
21               LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
22
23NodeGroups:
24  Name:        MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws
25  Health:      Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=4))
26               LastProbeTime:      2023-09-08 14:01:41.380962042 +0000 UTC m=+3246.322171235
27               LastTransitionTime: 2023-09-08 13:12:44.585589045 +0000 UTC m=+309.526798282
28  ScaleUp:     InProgress (ready=1 cloudProviderTarget=2)
29               LastProbeTime:      2023-09-08 14:01:41.380962042 +0000 UTC m=+3246.322171235
30               LastTransitionTime: 2023-09-08 14:01:41.380962042 +0000 UTC m=+3246.322171235
31  ScaleDown:   NoCandidates (candidates=0)
32               LastProbeTime:      2023-09-08 14:01:30.091765978 +0000 UTC m=+3235.032975159
33               LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
34
35
36
37BinaryData
38====
39
40Events:
41  Type    Reason         Age   From                Message
42  ----    ------         ----  ----                -------
43  Normal  ScaledUpGroup  12s   cluster-autoscaler  Scale-up: setting group MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws size to 2 instead of 1 (max: 4)
44  Normal  ScaledUpGroup  11s   cluster-autoscaler  Scale-up: group MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws size set to 2 instead of 1 (max: 4)

Oh yes, it has triggered a scale-up. And in vCenter a new worker node is being provisioned:

1NAME                                                   STATUS     ROLES           AGE   VERSION
2tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   Ready      <none>          55m   v1.26.5+vmware.2
3tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   NotReady   <none>          10s   v1.26.5+vmware.2
4tkg-cluster-3-auto-ns4jx-szp69                         Ready      control-plane   59m   v1.26.5+vmware.2

Let's check the pod status when the new node has been provisioned and is ready.

The node is now ready:

1NAME                                                   STATUS   ROLES           AGE    VERSION
2tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   Ready    <none>          56m    v1.26.5+vmware.2
3tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   Ready    <none>          101s   v1.26.5+vmware.2
4tkg-cluster-3-auto-ns4jx-szp69                         Ready    control-plane   60m    v1.26.5+vmware.2

All my 200 UI pods are now scheduled and running across two worker nodes:

 1NAME                              READY   STATUS    RESTARTS   AGE   IP             NODE                                                   NOMINATED NODE   READINESS GATES
 2redis-server-56d97cc8c-4h54n      1/1     Running   0          30m   100.96.1.9     tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 3yelb-appserver-65855b7ffd-j2bjt   1/1     Running   0          30m   100.96.1.11    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 4yelb-db-6f78dc6f8f-rg68q          1/1     Running   0          30m   100.96.1.10    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 5yelb-ui-5c5b8d8887-22v8p          1/1     Running   0          15m   100.96.1.53    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
 6yelb-ui-5c5b8d8887-2587j          1/1     Running   0          12m   100.96.2.82    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
 7yelb-ui-5c5b8d8887-2bzcg          1/1     Running   0          12m   100.96.2.9     tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
 8yelb-ui-5c5b8d8887-2gncl          1/1     Running   0          12m   100.96.2.28    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
 9yelb-ui-5c5b8d8887-2gwp8          1/1     Running   0          12m   100.96.1.86    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
10yelb-ui-5c5b8d8887-2gz7r          1/1     Running   0          12m   100.96.2.38    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
11yelb-ui-5c5b8d8887-2jlvv          1/1     Running   0          12m   100.96.2.58    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
12yelb-ui-5c5b8d8887-2pfgp          1/1     Running   0          15m   100.96.1.36    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
13yelb-ui-5c5b8d8887-2prwf          1/1     Running   0          12m   100.96.2.48    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
14yelb-ui-5c5b8d8887-2vr4f          1/1     Running   0          12m   100.96.2.77    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
15yelb-ui-5c5b8d8887-2w2t8          1/1     Running   0          12m   100.96.2.63    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
16yelb-ui-5c5b8d8887-2x6b7          1/1     Running   0          15m   100.96.1.34    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
17yelb-ui-5c5b8d8887-2x726          1/1     Running   0          18m   100.96.1.23    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
18yelb-ui-5c5b8d8887-452bx          1/1     Running   0          12m   100.96.2.67    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
19yelb-ui-5c5b8d8887-452dd          1/1     Running   0          15m   100.96.1.69    tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   <none>           <none>
20yelb-ui-5c5b8d8887-45nmz          1/1     Running   0          12m   100.96.2.100   tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc   <none>           <none>
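To verify the distribution, the pods can be counted per node (a sketch; $7 is the NODE column in the wide, headerless output):

andreasm@linuxvm01:~$ k get pods -n yelb -o wide --no-headers | awk '{print $7}' | sort | uniq -c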

Scale-down - remove unneeded worker nodes

Now that I have seen that the autoscaler is indeed scaling the number of worker nodes automatically, I would like to test whether it is also capable of scaling down, removing unnecessary worker nodes when the load is no longer there. To test this I will simply scale down the number of UI pods in the Yelb application:

 1andreasm@linuxvm01:~$ k scale deployment -n yelb yelb-ui --replicas 2
 2deployment.apps/yelb-ui scaled
 3andreasm@linuxvm01:~$ k get pods -n yelb
 4NAME                              READY   STATUS        RESTARTS   AGE
 5redis-server-56d97cc8c-4h54n      1/1     Running       0          32m
 6yelb-appserver-65855b7ffd-j2bjt   1/1     Running       0          32m
 7yelb-db-6f78dc6f8f-rg68q          1/1     Running       0          32m
 8yelb-ui-5c5b8d8887-22v8p          1/1     Terminating   0          17m
 9yelb-ui-5c5b8d8887-2587j          1/1     Terminating   0          14m
10yelb-ui-5c5b8d8887-2bzcg          1/1     Terminating   0          14m
11yelb-ui-5c5b8d8887-2gncl          1/1     Terminating   0          14m
12yelb-ui-5c5b8d8887-2gwp8          1/1     Terminating   0          14m
13yelb-ui-5c5b8d8887-2gz7r          1/1     Terminating   0          14m
14yelb-ui-5c5b8d8887-2jlvv          1/1     Terminating   0          14m
15yelb-ui-5c5b8d8887-2pfgp          1/1     Terminating   0          17m
16yelb-ui-5c5b8d8887-2prwf          1/1     Terminating   0          14m
17yelb-ui-5c5b8d8887-2vr4f          1/1     Terminating   0          14m
18yelb-ui-5c5b8d8887-2w2t8          1/1     Terminating   0          14m

When all the unnecessary pods are gone, I need to monitor the removal of the worker nodes. It may take a few minutes.
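A couple of ways to keep an eye on this (a sketch; the first command from the workload cluster context, the second from the mgmt cluster context):

andreasm@linuxvm01:~$ k get nodes -w
andreasm@linuxvm01:~$ kubectl get machinedeployments -n tkg-ns-3 -w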

The Yelb application is back to "normal":

1NAME                              READY   STATUS    RESTARTS   AGE
2redis-server-56d97cc8c-4h54n      1/1     Running   0          33m
3yelb-appserver-65855b7ffd-j2bjt   1/1     Running   0          33m
4yelb-db-6f78dc6f8f-rg68q          1/1     Running   0          33m
5yelb-ui-5c5b8d8887-dxlth          1/1     Running   0          21m
6yelb-ui-5c5b8d8887-gv829          1/1     Running   0          21m

Checking the autoscaler status now, it has identified a candidate to scale down. But with AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD and AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME both set to "10m", I will need to wait 10 minutes after the LastTransitionTime...

 1Name:         cluster-autoscaler-status
 2Namespace:    kube-system
 3Labels:       <none>
 4Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2023-09-08 14:19:46.985695728 +0000 UTC
 5
 6Data
 7====
 8status:
 9----
10Cluster-autoscaler status at 2023-09-08 14:19:46.985695728 +0000 UTC:
11Cluster-wide:
12  Health:      Healthy (ready=3 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=3 longUnregistered=0)
13               LastProbeTime:      2023-09-08 14:19:45.772876369 +0000 UTC m=+4330.714085660
14               LastTransitionTime: 2023-09-08 13:07:46.176049718 +0000 UTC m=+11.117258901
15  ScaleUp:     NoActivity (ready=3 registered=3)
16               LastProbeTime:      2023-09-08 14:19:45.772876369 +0000 UTC m=+4330.714085660
17               LastTransitionTime: 2023-09-08 14:08:21.539629262 +0000 UTC m=+3646.480838810
18  ScaleDown:   CandidatesPresent (candidates=1)
19               LastProbeTime:      2023-09-08 14:19:45.772876369 +0000 UTC m=+4330.714085660
20               LastTransitionTime: 2023-09-08 14:18:26.989571984 +0000 UTC m=+4251.930781291
21
22NodeGroups:
23  Name:        MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws
24  Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=4))
25               LastProbeTime:      2023-09-08 14:19:45.772876369 +0000 UTC m=+4330.714085660
26               LastTransitionTime: 2023-09-08 13:12:44.585589045 +0000 UTC m=+309.526798282
27  ScaleUp:     NoActivity (ready=2 cloudProviderTarget=2)
28               LastProbeTime:      2023-09-08 14:19:45.772876369 +0000 UTC m=+4330.714085660
29               LastTransitionTime: 2023-09-08 14:08:21.539629262 +0000 UTC m=+3646.480838810
30  ScaleDown:   CandidatesPresent (candidates=1)
31               LastProbeTime:      2023-09-08 14:19:45.772876369 +0000 UTC m=+4330.714085660
32               LastTransitionTime: 2023-09-08 14:18:26.989571984 +0000 UTC m=+4251.930781291
33
34
35
36BinaryData
37====
38
39Events:
40  Type    Reason         Age   From                Message
41  ----    ------         ----  ----                -------
42  Normal  ScaledUpGroup  18m   cluster-autoscaler  Scale-up: setting group MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws size to 2 instead of 1 (max: 4)
43  Normal  ScaledUpGroup  18m   cluster-autoscaler  Scale-up: group MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws size set to 2 instead of 1 (max: 4)

After the 10 minutes:

1NAME                                                   STATUS   ROLES           AGE   VERSION
2tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-dcp2q   Ready    <none>          77m   v1.26.5+vmware.2
3tkg-cluster-3-auto-ns4jx-szp69                         Ready    control-plane   81m   v1.26.5+vmware.2

Back to two nodes again, and the VM has been deleted from vCenter.

The autoscaler status:

 1Name:         cluster-autoscaler-status
 2Namespace:    kube-system
 3Labels:       <none>
 4Annotations:  cluster-autoscaler.kubernetes.io/last-updated: 2023-09-08 14:29:32.692769073 +0000 UTC
 5
 6Data
 7====
 8status:
 9----
10Cluster-autoscaler status at 2023-09-08 14:29:32.692769073 +0000 UTC:
11Cluster-wide:
12  Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0)
13               LastProbeTime:      2023-09-08 14:29:31.482497258 +0000 UTC m=+4916.423706440
14               LastTransitionTime: 2023-09-08 13:07:46.176049718 +0000 UTC m=+11.117258901
15  ScaleUp:     NoActivity (ready=2 registered=2)
16               LastProbeTime:      2023-09-08 14:29:31.482497258 +0000 UTC m=+4916.423706440
17               LastTransitionTime: 2023-09-08 14:08:21.539629262 +0000 UTC m=+3646.480838810
18  ScaleDown:   NoCandidates (candidates=0)
19               LastProbeTime:      2023-09-08 14:29:31.482497258 +0000 UTC m=+4916.423706440
20               LastTransitionTime: 2023-09-08 14:28:46.471388976 +0000 UTC m=+4871.412598145
21
22NodeGroups:
23  Name:        MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws
24  Health:      Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=4))
25               LastProbeTime:      2023-09-08 14:29:31.482497258 +0000 UTC m=+4916.423706440
26               LastTransitionTime: 2023-09-08 13:12:44.585589045 +0000 UTC m=+309.526798282
27  ScaleUp:     NoActivity (ready=1 cloudProviderTarget=1)
28               LastProbeTime:      2023-09-08 14:29:31.482497258 +0000 UTC m=+4916.423706440
29               LastTransitionTime: 2023-09-08 14:08:21.539629262 +0000 UTC m=+3646.480838810
30  ScaleDown:   NoCandidates (candidates=0)
31               LastProbeTime:      2023-09-08 14:29:31.482497258 +0000 UTC m=+4916.423706440
32               LastTransitionTime: 2023-09-08 14:28:46.471388976 +0000 UTC m=+4871.412598145
33
34
35
36BinaryData
37====
38
39Events:
40  Type    Reason          Age   From                Message
41  ----    ------          ----  ----                -------
42  Normal  ScaledUpGroup   27m   cluster-autoscaler  Scale-up: setting group MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws size to 2 instead of 1 (max: 4)
43  Normal  ScaledUpGroup   27m   cluster-autoscaler  Scale-up: group MachineDeployment/tkg-ns-3/tkg-cluster-3-auto-md-0-fhrws size set to 2 instead of 1 (max: 4)
44  Normal  ScaleDownEmpty  61s   cluster-autoscaler  Scale-down: removing empty node "tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc"
45  Normal  ScaleDownEmpty  55s   cluster-autoscaler  Scale-down: empty node tkg-cluster-3-auto-md-0-fhrws-757648f59cxq4hlz-q6fqc removed

This works really well. It is quite straightforward to enable and a really nice feature to have. And this concludes this post.