vSphere with Tanzu 8 U2 using NSX AND NSX Advanced Loadbalancer

Overview

NSX and NSX Advanced Loadbalancer with TKG

Before vSphere 8 U2 it was not possible to use the NSX Advanced Loadbalancer loadbalancing the Kubernetes control plane nodes when configuring Tanzu or TKG in vSphere using NSX as the networking stack. I am talking about using NSX as an integrated part of the TKG installation (enabling Workload Management and selecting NSX as the network stack). It was of course possible to use NSX as a pure network "underlay" for TKG, and then use NSX ALB as both L4 and L7 (using AKO) provider. But then one misses out of the "magic" using NSX as the integrated network provider for TKG like automatic network creation, vrf/ip separation support, NSX policies, vSphere pods etc. One could expose services using servicetype loadBalancer (L4) from the workload clusters with NSX ALB in combination with NSX as the integrated networking stack, but one had to specify the loadBalancerClass to specify NSX-ALB for this, and not all services could be configured with this crd. The control plane nodes or the Kubernetes api endpoint was always exposed/managed by the built-in NSX-T loadbalancer. Now with the release of vSphere 8 U2 it seems like we finally can use NSX and NSX-ALB in combination where all loadbalancing needs are being managed by NSX-ALB, including the control plane nodes/kubernetes api endpoint.

So I am very excited to give this a try and see how it works out, the steps needed to be configured for this to work and how it look like during the installation and after. What happens in the NSX-ALB controller, what happens in the NSX environment.

Info

To enable this feature in this release is only for greenfield deployments.

What we should end up with in this release is using NSX for all networking and security features, NSX Advanced Loadbalancer for all loadbalancing needs.

nsx-alb

From the official release notes I have pasted below two features I will discuss in this post. The first one is the major topic of this post of course, but the second one is also a very nice feature that I actually have used a lot in this post. For the full list of new features head over to the official release notes for vSphere 8 U2 here and for all the Tanzu related features here

The major topic of this post:

  • Support of NSX Advanced Load Balancer for a Supervisor configured with NSX networking - You can now enable a Supervisor with NSX Advanced Load Balancer (Avi Networks) for L4 load balancing, as well as load balancing for the control plane nodes of Supervisor and Tanzu Kubernetes Grid clusters with NSX networking. Checkout the documentation page for guidance on configuring the NSX Advanced Load Balancer with NSX.

And the nice feature:

  • Import and export the Supervisor configuration - In previous versions, activating the Supervisor was a manual step-wise process without the ability to save any configurations. In the current release, you can now export and share the Supervisor configuration with peers in a human-readable format or within a source control system, import configurations to a new Supervisor, and replicate a standard configuration across multiple Supervisors. Checkout the documentation for details on how to export and import the Supervisor configuration.

Pre-requisites and assumptions

Before getting started with the installation/enabling TKG in vSphere 8 U2 using NSX-ALB in combination with NSX some requirments need to be met.

  • vSphere 8 U2 - kind of obvious but nice to mention
  • NSX version 4.1.1 or higher
  • NSX-ALB version 22.1.4 or higher (yes it is stated in the release notes of vSphere with Tanzu that version 22.1.3 is the supported release)
  • NSX-ALB Enterprise license

And the usual assumptions (mostly to save some time, as I have covered many of these topics several times before, saving digital ink is saving the environment 😄 I already have my vSphere 8 U2 environment running, I already have my NSX 4.1.1 environment configured and lastly I already have my NSX-ALB controller 22.1.4 deployed and working. My lab is used for many things so I am not deploying NSX and NSX-ALB from scratch just for this post, they are already running a bunch of other stuff including the NSX-LAB controller. In the next chapters I will go through what I had to prepare on the NSX side (if any) and the NSX-ALB side (there are a couple of steps there).

vSphere preparations - requirements

I only needed to upgrade my current vSphere 8 U1 environment to vSphere 8 U2, as always starting with the vCenter server using the VAMI interface, then updated the ESXi image in LCM to do a rolling upgrade of the ESXi hosts to 8 U2.

NSX-T preparations

For information how to install NSX, see my post here

NSX-ALB preparations - requirements

Instead of going through all the steps in configuring the NSX-ALB I will only post the settings that is specific/or needed for the NSX+NSX-ALB feature to work. As I already have a working NSX-ALB environment running and in use for other needs, its not that many changes I had to do. So I will show them here in their own section starting with the cloud. For reference how to install NSX-ALB I have done a post on that here.

NSX cloud

In my NSX-ALB I already have two clouds configured, both of them are NSX clouds. If you start from scratch all the necessary config needs to be done of course. See my posts here. In the NSX cloud I will be using for this I need to enable DHCP. That is done by editing your specific cloud and checking the box DHCP:

Info

If you happen to have multiple clouds configured it will figure out wich cloud it will use. The NSX manager knows which NSX cloud it must use, most likely because it will use the cloud that is the same NSX the API comes from. In my lab I have two NSX clouds configured, and it will select the correct cloud. My two NSX clouds are two unique NSX instances, if having two NSX clouds of same NSX instance I am not sure how it can select the right cloud.

IPAM profile

You need to make sure that you have configured an IPAM profile. Again, I already have that configured. The deployment will use this IPAM profile to configure the new usable VIP networks you define as the ingress cidr in TKG.

Important

In your IPAM profile it is very important to not have the Allocate ip in VRF option selected. This must be de-selected.

vrf-option

Then make sure your NSX cloud has this IPAM profile select:

Default Service-Engine Group

The Default-Service Engine Group in your NSX Cloud will be used as a "template" group. This means you should configure this default se-group how you want your SEs to be provisioned. From the official documentation:

The AKO creates one Service Engine Group for each vSphere with Tanzu cluster. The Service Engine Group configuration is derived from the Default-Group configuration. Once the Default-Group is configured with the required values, any new Service Engine Group created by the AKO will have the same settings. However, changes made to the Default-Group configuration will not reflect in an already created Service Engine Group. You must modify the configuration for an existing Service Engine Group separately.

So in my Default SE-Group I have this configuration:

default-group-1

default-group-settings

default-group-settings-cont

default-group-settings-cont

Under Scope I have confgured the vSphere cluster, and shared datastore placement.

default-group-setting-cont

save

When I am satisified the the settings that suits my need, save and I am done with the Default group configurations.

Custom Certificate

Another requirement is to change the default certificate to a custom one. One can follow the official documentation here. For now it is sufficient to just prepare the certificate, dont change the settings in the UI to use the new certificate yet, that is done after the step where you register the NSX-ALB endpoint to the NSX manager.

Summary of NSX-ALB pre-requisites

  • Need to have configured a NSX Cloud with the DHCP option enabled
  • Created a custom certificate (not configured to be used yet)
  • Created and configured an IPAM profile, updated the NSX cloud to use this profile, in the IPAM profile Allocate IP in VRF i de-selected.

NSX preparations - requirements - adding the NSX-ALB controller endpoint using API

I already have a working NSX 4.1.1 environment. The only thing I need to to here is to add the NSX-ALB controller (alb endpoint) so my NSX environment is aware of my NSX-ALB controller. The reason for this is that some configurations in NSX-ALB will be automatically done by NCP during TKG deployment using api. So the NSX manager will need to know the username and password of the NSX-ALB controller or controller-cluster.

Adding the NSX-ALB to the NSX manager

To add NSX-ALB to the NSX manager I will need to do an API call to the NSX manager using curl (or Postman or whatever preferred tool), the below is the call I will be issuing:

 1curl -k --location --request PUT 'https://172.24.3.10/policy/api/v1/infra/alb-onboarding-workflow' \ #IP NSX manager cluster IP
 2--header 'X-Allow-Overwrite: True' \
 3-u admin:password \  #password and username to the NSX manager in the format username:password
 4--header 'Content-Type: application/json' \
 5--data-raw '{
 6"owned_by": "LCM",
 7"cluster_ip": "172.24.3.50", # IP NSX-ALB controller or controller cluster IP
 8"infra_admin_username" : "admin", #username
 9"infra_admin_password" : "password", #password
10"dns_servers": ["172.24.3.1"], #not sure why I need to add this - my ALB is already configured with this
11"ntp_servers": ["172.24.3.1"] #not sure why I need to add this - my ALB is already configured with this
12}'

The offical documentation is using this example:

 1curl -k --location --request PUT 'https://<nsx-mgr-ip>/policy/api/v1/infra/alb-onboarding-workflow' \
 2--header 'X-Allow-Overwrite: True' \
 3--header 'Authorization: Basic <base64 encoding of username:password of NSX Mgr>' \
 4--header 'Content-Type: application/json' \
 5--data-raw '{
 6"owned_by": "LCM",
 7"cluster_ip": "<nsx-alb-controller-cluster-ip>",
 8"infra_admin_username" : "username",
 9"infra_admin_password" : "password",
10"dns_servers": ["<dns-servers-ips>"],
11"ntp_servers": ["<ntp-servers-ips>"]
12}'

I had issues authenticating using header, and I did not want to use time to troubleshoot why. I think one of the reasons is that I need to genereate some auth token etc...

Anyway, after a successful api call. You should get this output:

 1{
 2  "connection_info" : {
 3    "username" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000",
 4    "tenant" : "admin",
 5    "expires_at" : "2023-09-22T18:22:22.627Z",
 6    "managed_by" : "LCM",
 7    "status" : "DEACTIVATE_PROVIDER",
 8    "certificate" : "-----BEGIN CERTIFICATE-----\nMIIDRTCCAi2gAwIBAgIUCcOSYBrBybt6zHfJKojPnX/fZ8xD8azd9mWp4oksVA1vHXzbWdsY\nw/Tdr3zTOEgEjn9mflE/aBhsahEhhaZfKtZtLO/OnvSZZtaMlHvlsHgfl8nOqhLh\nGBJzNNwIS8sjzi8E1/y3TI3kVshoCclL9A==\n-----END CERTIFICATE-----\n",
 9    "enforcement_point_address" : "172.24.3.50",
10    "resource_type" : "AviConnectionInfo"
11  },
12  "auto_enforce" : true,
13  "resource_type" : "EnforcementPoint",
14  "id" : "alb-endpoint",
15  "display_name" : "alb-endpoint",
16  "path" : "/infra/sites/default/enforcement-points/alb-endpoint",
17  "relative_path" : "alb-endpoint",
18  "parent_path" : "/infra/sites/default",
19  "remote_path" : "",
20  "unique_id" : "8ef2126e-9311-40ff-bd6d-08c51017326c",
21  "realization_id" : "8ef2126e-9311-40ff-bd6d-08c51017326c",
22  "owner_id" : "4b04712e-498d-42d0-ad90-7ab06c398c60",
23  "marked_for_delete" : false,
24  "overridden" : false,
25  "_create_time" : 1695385329242,
26  "_create_user" : "admin",
27  "_last_modified_time" : 1695385329242,
28  "_last_modified_user" : "admin",
29  "_system_owned" : false,
30  "_protection" : "NOT_PROTECTED",
31  "_revision" : 0

You can also do the following API call to verify if the NSX-ALB controller has been added:

 1andreasm@ubuntu02:~$ curl -s -k -u admin:password https://172.24.3.10/policy/api/v1/infra/sites/default/enforcement-points/alb-endpoint
 2{
 3  "connection_info" : {
 4    "username" : "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000",
 5    "tenant" : "admin",
 6    "expires_at" : "2023-09-24T02:00:05.362Z",
 7    "managed_by" : "LCM",
 8    "status" : "DEACTIVATE_PROVIDER",
 9    "certificate" : "-----BEGIN CERTIFICATE-----\nMIIGy2k9O+hp6fX+iG5BGDurG8hP8A\nKI96AUNxV39pXOBIqBr/sL3v/24DVz85ObAvIzoWnTak9ZhZyP4jUfZD/w21xdXz\nKaTJ5ioC+M6RLRKVVJ159lrm3A==\n-----END CERTIFICATE-----\n",
10    "enforcement_point_address" : "172.24.3.50",
11    "resource_type" : "AviConnectionInfo"
12  },
13  "auto_enforce" : true,
14  "resource_type" : "EnforcementPoint",
15  "id" : "alb-endpoint",
16  "display_name" : "alb-endpoint",
17  "path" : "/infra/sites/default/enforcement-points/alb-endpoint",
18  "relative_path" : "alb-endpoint",
19  "parent_path" : "/infra/sites/default",
20  "remote_path" : "",
21  "unique_id" : "8ef2126e-9311-40ff-bd6d-08c51017326c",
22  "realization_id" : "8ef2126e-9311-40ff-bd6d-08c51017326c",
23  "owner_id" : "4b04712e-498d-42d0-ad90-7ab06c398c60",
24  "marked_for_delete" : false,
25  "overridden" : false,
26  "_create_time" : 1695385329242,
27  "_create_user" : "admin",
28  "_last_modified_time" : 1695499205391,
29  "_last_modified_user" : "system",
30  "_system_owned" : false,
31  "_protection" : "NOT_PROTECTED",
32  "_revision" : 7

The "certificate" you get back should be the current certificate on the NSX-ALB controller. More on that later.

Thats it on the NSX manager side.

Info

After you a have done the above operation and you log back into the NSX-ALB controller this welcome wizard pops up. Just click cancel on it.

I figured out a way to disable this initial setup-wizard instead fo going through it via the UI. SSH into the NSX-ALB controller, enter into shell and enter the following commands:

 1### Show the current systemconfiguration
 2[admin:172-24-3-50]: > show systemconfiguration
 3+----------------------------------+------------------------------------+
 4| Field                            | Value                              |
 5+----------------------------------+------------------------------------+
 6| uuid                             | default                            |
 7| dns_configuration                |                                    |
 8|   server_list[1]                 | 172.24.3.1                         |
 9| ntp_configuration                |                                    |
10|   ntp_servers[1]                 |                                    |
11|     server                       | 172.24.3.1                         |
12| portal_configuration             |                                    |
13|   enable_https                   | True                               |
14|   redirect_to_https              | True                               |
15|   enable_http                    | True                               |
16|   sslkeyandcertificate_refs[1]   | tkgm-cert-controller               |
17|   use_uuid_from_input            | False                              |
18|   sslprofile_ref                 | System-Standard-Portal             |
19|   enable_clickjacking_protection | True                               |
20|   allow_basic_authentication     | True                               |
21|   password_strength_check        | False                              |
22|   disable_remote_cli_shell       | False                              |
23|   disable_swagger                | False                              |
24|   api_force_timeout              | 24 hours                           |
25|   minimum_password_length        | 8                                  |
26| global_tenant_config             |                                    |
27|   tenant_vrf                     | False                              |
28|   se_in_provider_context         | True                               |
29|   tenant_access_to_provider_se   | True                               |
30| email_configuration              |                                    |
31|   smtp_type                      | SMTP_LOCAL_HOST                    |
32|   from_email                     | admin@avicontroller.net            |
33|   mail_server_name               | localhost                          |
34|   mail_server_port               | 25                                 |
35|   disable_tls                    | False                              |
36| docker_mode                      | False                              |
37| ssh_ciphers[1]                   | aes128-ctr                         |
38| ssh_ciphers[2]                   | aes256-ctr                         |
39| ssh_hmacs[1]                     | hmac-sha2-512-etm@openssh.com      |
40| ssh_hmacs[2]                     | hmac-sha2-256-etm@openssh.com      |
41| ssh_hmacs[3]                     | hmac-sha2-512                      |
42| default_license_tier             | ENTERPRISE                         |
43| secure_channel_configuration     |                                    |
44|   sslkeyandcertificate_refs[1]   | System-Default-Secure-Channel-Cert |
45| welcome_workflow_complete        | False                               |
46| fips_mode                        | False                              |
47| enable_cors                      | False                              |
48| common_criteria_mode             | False                              |
49| host_key_algorithm_exclude       |                                    |
50| kex_algorithm_exclude            |                                    |
51+----------------------------------+------------------------------------+

Notice the welcome_workflow_complete is set to False. Set this to true using the following command:

1[admin:172-24-3-50]: > configure systemconfiguration welcome_workflow_complete 1

Or

1[admin:172-24-3-50]: > configure systemconfiguration
2[admin:172-24-3-50-configure]: > welcome_workflow_complete (hit henter)

If you show the systemconfiguration again now it should be changed to True... Logout and back in to the NSX-ALB gui and you should not be asked to do the initial setup wizard.

Custom certificate in the NSX-ALB controller

After adding the NSX-ALB endpoint, configure NSX-ALB to use the newly created custom certificate created earlier. Head over to Administration -> System Settings:

Click edit and adjust the settings accordingly, and make sure to select the custom certificate.

Check the option Allow Basic Authentication and update the field SSL/TLS Certificate with the custom certificate. Click save.

Now the pre-requisities has been done, and its time to get started with the WCP installation.

My NSX and NSX-ALB environment before WCP installation

Before I head over and do the actual WCP installation, I will first just take a couple of screenshots from my NSX environment as a before and after.

NSX environment before WCP installation

I have three Tier-0s configured, whereas I will only use the stc-tier-0 (the one with most objects connected to it) in this post.

topology

tier-0s

tier-1

segments

loadbalancers

dhcp

NSX-ALB environment before WCP installation

One Virtual Service (my DNS service).

IPAM and DNS profile (IPAM only needed for WCP)

Content of my current IPAM profile

My clouds configured

Current networks configured in my stc-nsx-cloud

Current running Service Engines

My Service Engine Groups

service-engines

My configured Network Profiles

network-profiles

Then my VRF contexts, only configured for the other services I have been running.

Now that we have all the before screenshots, lets do the vSphere with Tanzu installation and see how it goes.

vSphere with Tanzu installation

There is nothing different in the UI when it comes to this specific feature, I select NSX as the networkl and populate the fields as normal. But I will list the steps here any way.

Info

I will use the new Import and Export feature as I already have done this installation several times. The first time I filled everything in manually then at the end of the wizard I clicked Export configuration and saved my config. So now I just have to import the config and go through all the fiels using next, and change something if needed, then finish.

Here is how to choose the export, select it and a download dialog will pop up. To use it later for import extract the content and select the file called wcp-config.json

image-20230924185550151

Get Started

Import Config

Import, and now all fields are populated.

The wcp-config.json file content:

1{"specVersion":"1.0","supervisorSpec":{"supervisorName":"stc-svc"},"envSpec":{"vcenterDetails":{"vcenterAddress":"vcsa.cpod-nsxam-stc.az-stc.cloud-garage.net","vcenterCluster":"Cluster"}},"tkgsComponentSpec":{"tkgsStoragePolicySpec":{"masterStoragePolicy":"vSAN Default Storage Policy","imageStoragePolicy":"vSAN Default Storage Policy","ephemeralStoragePolicy":"vSAN Default Storage Policy"},"tkgsMgmtNetworkSpec":{"tkgsMgmtNetworkName":"ls-mgmt","tkgsMgmtIpAssignmentMode":"STATICRANGE","tkgsMgmtNetworkStartingIp":"10.13.10.20","tkgsMgmtNetworkGatewayCidr":"10.13.10.1/24","tkgsMgmtNetworkDnsServers":["172.24.3.1"],"tkgsMgmtNetworkSearchDomains":["cpod-nsxam-stc.az-stc.cloud-garage.net"],"tkgsMgmtNetworkNtpServers":["172.24.3.1"]},"tkgsNcpClusterNetworkInfo":{"tkgsClusterDistributedSwitch":"VDSwitch","tkgsNsxEdgeCluster":"SomeEdgeCluster","tkgsNsxTier0Gateway":"stc-tier-0","tkgsNamespaceSubnetPrefix":28,"tkgsRoutedMode":true,"tkgsNamespaceNetworkCidrs":["10.13.80.0/23"],"tkgsIngressCidrs":["10.13.90.0/24"],"tkgsEgressCidrs":[],"tkgsWorkloadDnsServers":["172.24.3.1"],"tkgsWorkloadServiceCidr":"10.96.0.0/23"},"apiServerDnsNames":[],"controlPlaneSize":"SMALL"}}

I am deselecting NAT as I dont need that.

Now, its just clicking finish and monitor the progress.

One will soon see two new Virtual Services being created, two new service engines (according to my default-service-engine group config) being deployed.

When everything is ready and available the virtual services will become yellow/green (fully green after a while as it depends on how long the VS is down before it becomes available/up):

And in vCenter the Workload Management progress:

wcp-status

Can I reach it?

Yes!

This is really really nice.

Now next chapters will go through what has been done in my NSX environment and the NSX-ALB environment.

My NSX and NSX-ALB environment after WCP installation

Now that everything is green and joy. Let us have a check inside and see what has been configured.

NSX environment after WCP installation

Topology view:

Topology

A new Tier-1 I can see, with two new segments, with some VMs in each.

Below I can see the new Tier-1

tier-1

In the new Tier-1 router it has also automatically added the static routes for the VIP to the SEs:

These static routes needs to be either advertised by the Tier-0 using either BGP or OSPF, or manually created static routes in the network infrastructure. I am using BGP.

The two new segments:

Avi-domain-c8:507... This segment is where my NSX-ALB Service Engines Dataplane network is located.

avi-doamain

and the second segment vm-domain-c8:507.. is where the default workload network for my Supervisor Control Plane nodes is placed.

vm-domain

Now, what is under Loadbalancing:

dlb

It has created a distributed loadbalancer where all virtual servers are kubernetes services using the services CIDR:

dlb-vs

The NSX Distributed Loadbalancer is used for the ClusterIP services running inside the Supervisor, the ones that are running on the ESXi hosts. The LoadBalancer services is handled by NSX-ALB.

 1andreasm@ubuntu02:~/avi_nsxt_wcp$ k get svc -A
 2NAMESPACE                                   NAME                                                             TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)                         AGE
 3default                                     kubernetes                                                       ClusterIP      10.96.0.1     <none>        443/TCP                         15h
 4kube-system                                 docker-registry                                                  ClusterIP      10.96.0.37    <none>        5000/TCP                        15h
 5kube-system                                 kube-apiserver-authproxy-svc                                     ClusterIP      10.96.0.243   <none>        8443/TCP                        14h
 6kube-system                                 kube-apiserver-lb-svc                                            LoadBalancer   10.96.0.201   10.13.90.1    443:32163/TCP,6443:31957/TCP    15h
 7kube-system                                 kube-dns                                                         ClusterIP      10.96.0.10    <none>        53/UDP,53/TCP,9153/TCP          15h
 8kube-system                                 snapshot-validation-service                                      ClusterIP      10.96.0.128   <none>        443/TCP                         15h
 9ns-stc-1                                    cluster-1-76996f5b17a254b02e55a                                  LoadBalancer   10.96.0.239   10.13.92.2    80:30165/TCP                    12h
10ns-stc-1                                    cluster-1-control-plane-service                                  LoadBalancer   10.96.1.157   10.13.92.1    6443:30363/TCP                  13h
11vmware-system-appplatform-operator-system   packaging-api                                                    ClusterIP      10.96.0.87    <none>        443/TCP                         14h
12vmware-system-appplatform-operator-system   vmware-system-appplatform-operator-controller-manager-service    ClusterIP      None          <none>        <none>                          15h
13vmware-system-appplatform-operator-system   vmware-system-psp-operator-k8s-cloud-operator-service            ClusterIP      10.96.1.101   <none>        29002/TCP                       15h
14vmware-system-appplatform-operator-system   vmware-system-psp-operator-service                               ClusterIP      None          <none>        <none>                          15h
15vmware-system-appplatform-operator-system   vmware-system-psp-operator-webhook-service                       ClusterIP      10.96.1.84    <none>        443/TCP                         15h
16vmware-system-capw                          capi-controller-manager-metrics-service                          ClusterIP      10.96.1.47    <none>        9844/TCP                        15h
17vmware-system-capw                          capi-kubeadm-bootstrap-controller-manager-metrics-service        ClusterIP      10.96.1.67    <none>        9845/TCP                        15h
18vmware-system-capw                          capi-kubeadm-bootstrap-webhook-service                           ClusterIP      10.96.0.170   <none>        443/TCP                         15h
19vmware-system-capw                          capi-kubeadm-control-plane-controller-manager-metrics-service    ClusterIP      10.96.1.48    <none>        9848/TCP                        15h
20vmware-system-capw                          capi-kubeadm-control-plane-webhook-service                       ClusterIP      10.96.0.102   <none>        443/TCP                         15h
21vmware-system-capw                          capi-webhook-service                                             ClusterIP      10.96.1.248   <none>        443/TCP                         15h
22vmware-system-capw                          capv-webhook-service                                             ClusterIP      10.96.1.124   <none>        443/TCP                         15h
23vmware-system-capw                          capw-controller-manager-metrics-service                          ClusterIP      10.96.1.252   <none>        9846/TCP                        15h
24vmware-system-capw                          capw-webhook-service                                             ClusterIP      10.96.0.101   <none>        443/TCP                         15h
25vmware-system-cert-manager                  cert-manager                                                     ClusterIP      10.96.0.158   <none>        9402/TCP                        15h
26vmware-system-cert-manager                  cert-manager-webhook                                             ClusterIP      10.96.0.134   <none>        443/TCP                         15h
27vmware-system-csi                           vmware-system-csi-webhook-service                                ClusterIP      10.96.1.130   <none>        443/TCP                         15h
28vmware-system-csi                           vsphere-csi-controller                                           LoadBalancer   10.96.0.108   10.13.90.2    2112:30383/TCP,2113:31530/TCP   15h
29vmware-system-imageregistry                 vmware-system-imageregistry-controller-manager-metrics-service   ClusterIP      10.96.0.47    <none>        9857/TCP                        14h
30vmware-system-imageregistry                 vmware-system-imageregistry-webhook-service                      ClusterIP      10.96.1.108   <none>        443/TCP                         14h
31vmware-system-license-operator              vmware-system-license-operator-webhook-service                   ClusterIP      10.96.1.51    <none>        443/TCP                         15h
32vmware-system-netop                         vmware-system-netop-controller-manager-metrics-service           ClusterIP      10.96.0.234   <none>        9851/TCP                        15h
33vmware-system-nsop                          vmware-system-nsop-webhook-service                               ClusterIP      10.96.1.73    <none>        443/TCP                         15h
34vmware-system-nsx                           nsx-operator                                                     ClusterIP      10.96.0.150   <none>        8093/TCP                        15h
35vmware-system-pinniped                      pinniped-concierge-api                                           ClusterIP      10.96.1.204   <none>        443/TCP                         14h
36vmware-system-pinniped                      pinniped-supervisor                                              ClusterIP      10.96.1.139   <none>        12001/TCP                       14h
37vmware-system-pinniped                      pinniped-supervisor-api                                          ClusterIP      10.96.1.94    <none>        443/TCP                         14h
38vmware-system-tkg                           tanzu-addons-manager-webhook-service                             ClusterIP      10.96.1.18    <none>        443/TCP                         14h
39vmware-system-tkg                           tanzu-featuregates-webhook-service                               ClusterIP      10.96.0.160   <none>        443/TCP                         14h
40vmware-system-tkg                           tkgs-plugin-service                                              ClusterIP      10.96.1.31    <none>        8099/TCP                        14h
41vmware-system-tkg                           tkr-conversion-webhook-service                                   ClusterIP      10.96.0.249   <none>        443/TCP                         14h
42vmware-system-tkg                           tkr-resolver-cluster-webhook-service                             ClusterIP      10.96.1.146   <none>        443/TCP                         14h
43vmware-system-tkg                           vmware-system-tkg-controller-manager-metrics-service             ClusterIP      10.96.1.147   <none>        9847/TCP                        14h
44vmware-system-tkg                           vmware-system-tkg-state-metrics-service                          ClusterIP      10.96.1.174   <none>        8443/TCP                        14h
45vmware-system-tkg                           vmware-system-tkg-webhook-service                                ClusterIP      10.96.0.167   <none>        443/TCP                         14h
46vmware-system-vmop                          vmware-system-vmop-controller-manager-metrics-service            ClusterIP      10.96.0.71    <none>        9848/TCP                        15h
47vmware-system-vmop                          vmware-system-vmop-web-console-validator                         ClusterIP      10.96.0.41    <none>        80/TCP                          15h
48vmware-system-vmop                          vmware-system-vmop-webhook-service                               ClusterIP      10.96.0.182   <none>        443/TCP                         15h
49andreasm@ubuntu02:~/avi_nsxt_wcp$ k get nodes
50NAME                                           STATUS   ROLES                  AGE   VERSION
514234784dcf1b9d8d15d541fab8855b55               Ready    control-plane,master   15h   v1.26.4+vmware.wcp.0
524234e1920ede3cad62bcd3ce8bd2f2dc               Ready    control-plane,master   15h   v1.26.4+vmware.wcp.0
534234f25d5bdc9796ce1e247a4190bb58               Ready    control-plane,master   15h   v1.26.4+vmware.wcp.0
54esx01.cpod-nsxam-stc.az-stc.cloud-garage.net   Ready    agent                  15h   v1.26.4-sph-79b2bd9
55esx02.cpod-nsxam-stc.az-stc.cloud-garage.net   Ready    agent                  15h   v1.26.4-sph-79b2bd9
56esx03.cpod-nsxam-stc.az-stc.cloud-garage.net   Ready    agent                  15h   v1.26.4-sph-79b2bd9
57esx04.cpod-nsxam-stc.az-stc.cloud-garage.net   Ready    agent                  15h   v1.26.4-sph-79b2bd9

And lastly it has also created a DHCP server for the Service Engines Dataplane interfaces.

dhcp-server-se-dataplane

NSX-ALB environment after WCP installation

Two new Virtual Services, one for the the Supervisor Kubernetes API and the other for the CSI controller (monitoring?)

The content of my IPAM profile:

It has added a new usable network there.

It has added a new Data Network in my NSX cloud

Two new Service Engines

Dataplane network for these two new SEs

One new Service Engine Group

Two new Network Profiles

(notice the ingress cidr network profile (VIP) is placed in the global vrf)

One new VRF context

Now I can consume my newly provisioned Supervisor cluster through NSX-ALB - In the next chapters I have provisioned a workload cluster in a different network, different from the default workload network of the Supervisor. Then I will provision some L4 services (serviceType loadBalancer) and Ingress (L7) inside this cluster. Lets see what happens

Creating vSphere Namespaces in other networks - override supervisor workload network

I went ahead and created a new vSphere Namespace with these settings:

Immediately after, it also created some new objects in NSX like a new Tier-1, segments etc similar to what it does before vSphere 8 U2. Then in NSX-ALB new network profiles and VRF context, creating the network profile for the SEs in the new VRF t1-domain-xxxxxx-xxxxx-xxx-xxx-x-ns-stc-1.

The ingress cidr (VIP) network profile vcf-ako-net-domain-c8:5071d9d4-373d-49aa-a202-4c4ed81adc3b-ns-stc-1 was created using the global vrf. If you remember I deselected the option in my IPAM profile Allocate ip in VRF, so it will use the global vrf here also (as in the initial deployment of the Supervisor cluster).

new-network-pfofiles

Now I can deploy my workload cluster in the newly created vSphere namespace.

1andreasm@ubuntu02:~/avi_nsxt_wcp$ k apply -f cluster-1-default.yaml
2cluster.cluster.x-k8s.io/cluster-1 created

In NSX-ALB shortly after a new VS is created

The already exisiting Service Engines has been reconfigured adding a second dataplane interface, using same subnet, but in two different VRF contexts. So it should work just fine.

Now I just need to wait for the nodes to be provisioned.

In the meantime I checked the T1 created for this workload cluster and it has been so kind to create the static routes for me there also:

Virtual service is now green, so the control plane vm is up and running.

Cluster ready, and I can test some services from it.

1andreasm@ubuntu02:~/avi_nsxt_wcp$ k get nodes
2NAME                                            STATUS   ROLES           AGE     VERSION
3cluster-1-f82lv-fdvw8                           Ready    control-plane   12m     v1.26.5+vmware.2-fips.1
4cluster-1-node-pool-01-tb4tw-555756bd56-76qv6   Ready    <none>          8m39s   v1.26.5+vmware.2-fips.1
5cluster-1-node-pool-01-tb4tw-555756bd56-klgcs   Ready    <none>          8m39s   v1.26.5+vmware.2-fips.1

L4 services inside the workload clusters

This test is fairly straightforward. I have deployed my test application Yelb. This creates a web-frontend and exposes it via serviceType loadBalancer. As soon as I deploy it NSX-ALB will create the virtual service using the same VIP range or Ingress range defined in the vSphere Namespace:

VIP-range

 1andreasm@ubuntu02:~/examples$ k get pods -n yelb
 2NAME                              READY   STATUS    RESTARTS   AGE
 3redis-server-56d97cc8c-f42fr      1/1     Running   0          7m2s
 4yelb-appserver-65855b7ffd-p74r8   1/1     Running   0          7m2s
 5yelb-db-6f78dc6f8f-qbj2v          1/1     Running   0          7m2s
 6yelb-ui-5c5b8d8887-bbwxl          1/1     Running   0          79s
 7andreasm@ubuntu02:~/examples$ k get svc -n yelb
 8NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
 9redis-server     ClusterIP      20.10.173.143   <none>        6379/TCP       7m10s
10yelb-appserver   ClusterIP      20.10.215.161   <none>        4567/TCP       7m9s
11yelb-db          ClusterIP      20.10.208.242   <none>        5432/TCP       7m9s
12yelb-ui          LoadBalancer   20.10.120.42    10.13.92.2    80:30917/TCP   87s

This works fine, now my diagram can look like this:

kube-api-and-l4

Next step is to deploy an Ingress

L7 services inside the workload clusters

If I check my newly deployed workload cluster, there is no AKO pod running.

 1andreasm@ubuntu02:~/avi_nsxt_wcp$ k get pods -A
 2NAMESPACE                      NAME                                                            READY   STATUS      RESTARTS        AGE
 3kube-system                    antrea-agent-4fzbm                                              2/2     Running     0               7m15s
 4kube-system                    antrea-agent-v94js                                              2/2     Running     0               7m16s
 5kube-system                    antrea-agent-w6fqr                                              2/2     Running     0               8m20s
 6kube-system                    antrea-controller-797ffdc4df-2p4km                              1/1     Running     0               8m20s
 7kube-system                    coredns-794978f977-4pjvl                                        1/1     Running     0               7m8s
 8kube-system                    coredns-794978f977-xz76z                                        1/1     Running     0               10m
 9kube-system                    docker-registry-cluster-1-f82lv-fdvw8                           1/1     Running     0               11m
10kube-system                    docker-registry-cluster-1-node-pool-01-tb4tw-555756bd56-76qv6   1/1     Running     0               5m59s
11kube-system                    docker-registry-cluster-1-node-pool-01-tb4tw-555756bd56-klgcs   1/1     Running     0               5m58s
12kube-system                    etcd-cluster-1-f82lv-fdvw8                                      1/1     Running     0               11m
13kube-system                    kube-apiserver-cluster-1-f82lv-fdvw8                            1/1     Running     0               11m
14kube-system                    kube-controller-manager-cluster-1-f82lv-fdvw8                   1/1     Running     0               11m
15kube-system                    kube-proxy-2mch7                                                1/1     Running     0               7m16s
16kube-system                    kube-proxy-dd2hm                                                1/1     Running     0               7m15s
17kube-system                    kube-proxy-pmc2c                                                1/1     Running     0               11m
18kube-system                    kube-scheduler-cluster-1-f82lv-fdvw8                            1/1     Running     0               11m
19kube-system                    metrics-server-6cdbbbf775-kqbfm                                 1/1     Running     0               8m15s
20kube-system                    snapshot-controller-59d996bd4c-m7dqv                            1/1     Running     0               8m23s
21secretgen-controller           secretgen-controller-b68787489-js8dj                            1/1     Running     0               8m5s
22tkg-system                     kapp-controller-b4dfc4659-kggg7                                 2/2     Running     0               9m
23tkg-system                     tanzu-capabilities-controller-manager-7c8dc68b84-7mv9v          1/1     Running     0               7m38s
24vmware-system-antrea           register-placeholder-9n4kk                                      0/1     Completed   0               8m20s
25vmware-system-auth             guest-cluster-auth-svc-rxt9v                                    1/1     Running     0               7m18s
26vmware-system-cloud-provider   guest-cluster-cloud-provider-844cdc6ffc-54957                   1/1     Running     0               8m25s
27vmware-system-csi              vsphere-csi-controller-7db59f4569-lxzkc                         7/7     Running     0               8m23s
28vmware-system-csi              vsphere-csi-node-2pmlj                                          3/3     Running     3 (6m2s ago)    7m15s
29vmware-system-csi              vsphere-csi-node-9kr82                                          3/3     Running     4 (6m10s ago)   8m23s
30vmware-system-csi              vsphere-csi-node-9pxc8                                          3/3     Running     3 (6m2s ago)    7m16s

So what will happen if I try to deploy an Ingress?

Nothing, as there is no IngressClass available.

1andreasm@ubuntu02:~/examples$ k get ingressclasses.networking.k8s.io
2No resources found

So this means we still need to deploy AKO to be able to use Ingress. I have a post of how that is done here where NSX is providing the IP addresses for the Ingress, but I expect now it will be NSX-ALB providing the IP addresses using the ingress cidr configured in the namespace. So the only difference is NSX-ALB is providing it instead of NSX.

This concludes this post.

Credits

A big thanks goes to Tom Schwaller, Container Networking TPM @VMware, (Twitter/X handle @tom_schwaller), for getting me started on this post, and for providing useful insights on the requirements for getting this to work.