vRealize Network Insight 5.1 Released

VMware has released vRealize Network Insight 5.1. vRealize Network Insight (vRNI) supports a large number of switch and router vendors, including Dell, Cisco (ACI, Nexus), Arista, and Juniper, as well as firewalls such as Palo Alto Networks, Check Point, Cisco ASA, and Fortinet.

vRNI also supports Azure, giving you visibility into application dependency mappings and flow analysis (inter-, intra-, and hybrid-VNet flows, NSGs, ASGs, VMs, subnets, and so on).

vRNI accelerates micro-segmentation deployment and helps troubleshoot security for SDDC, native AWS, and hybrid environments.

There are three editions of vRNI: Advanced, Enterprise, and Cloud Service. The Advanced edition lacks security planning for AWS, AWS visibility and troubleshooting, the PCI compliance dashboard, NetFlow from physical devices, a configurable and extended data retention period, and Infoblox integration.

 

From the Release Notes:

VMware SD-WAN by VeloCloud®

  • Analytics support for VMware SD-WAN: Support Threshold-based Analytics for various metrics for VMware SD-WAN entities including Edge, Link, and Edge-Application.
  • Pre-SD-WAN assessment report:
    • WAN Uplink/downlink assessment of non-SD-WAN (Cisco ASR/ISR) deployment.
    • Generate a report (as PDF) that includes ROI computation, savings, and recommendations if a customer decides to deploy the VeloCloud SD-WAN solution.
    • The report also includes traffic visibility of the current WAN deployments.

Application Discovery and Troubleshooting

  • Discover applications using a new ‘Advanced’ mode that supports NSX Security Tags and Security Groups.
  • Summary Panel: View a summary of key application-related information (events, flow count, health, incoming/outgoing traffic, countries accessing the application, member count, etc.) at the top of the application dashboard.
  • Troubleshoot Application: Filter and troubleshoot
    • Degraded Flows: Flows experiencing abnormal latencies.
    • Unprotected Flows: Flows that are not protected by any firewall rules.

VMware NSX-T™

  • NSX-T Manager topology and dashboard to give you quick insights into your NSX-T deployment
  • Support for new out of the box NSX-T events
  • Monitor BGP status on your NSX-T deployments

VMware Cloud™ on AWS

  • VMware Cloud on AWS dashboard enhancements
    • VMware Cloud on AWS SDDC object introduced as a part of vRealize Network Insight search and VMware Cloud on AWS dashboard
    • VMware Cloud on AWS SDDC list view with enriched metadata
    • New entities added to VMware Cloud on AWS SDDC dashboard – VMware Cloud on AWS SDDC overview, Network Traffic and Events, Top Talkers, VM, and Host limit-based alerts
  • VMware Cloud on AWS Edge Gateway Firewall rule visibility
  • Proactive alerting in VMware Cloud on AWS
    • Maximum number of VMs in SDDC
    • Maximum number of Hosts in SDDC

Containers

  • Kubernetes Service topology and dashboard to give you quick insights into your Kubernetes Services
  • New out of the box Kubernetes events

Other Enhancements

  • 3rd Party: Support for Arista HW VTEP in VM-VM path
  • 3rd Party: Support for VM-VM Path topologies using L3 NAT (with Fortinet)
  • Support for up to 4 SNMP targets.
  • Operate in an air-gapped network without Internet connectivity
  • Patch vRealize Network Insight from UI
  • Usage of FoundationDB, a distributed database that removes the disk requirements previously imposed on the first platform node. FoundationDB adds resiliency to vRealize Network Insight deployments.
  • Simplified default passwords for the support and consoleuser accounts.

Links:

VMware vRealize Network Insight 5.1.0 Download

NSX-T Data Center 2.5 – What’s in the Release Notes

What’s New

NSX-T Data Center 2.5 provides a variety of new features for virtualized networking and security for private, public, and hybrid clouds. Highlights include enhancements to the intent-based networking user interface, context-aware firewall, guest and network introspection features, IPv6 support, highly available cluster management, profile-based NSX installation for vSphere compute clusters, and enhancements to the migration coordinator for migrating from NSX Data Center for vSphere to NSX-T Data Center.

NSX Intelligence

NSX-T Data Center 2.5 introduces NSX Intelligence v1.0, a new NSX analytics component. NSX Intelligence provides a user interface via a single management pane within NSX Manager and offers the following features:

  • Close to real-time flow information for workloads in your environment.
  • NSX Intelligence correlates live or historic flows, user configurations, and workload inventory.
  • Ability to view past information about flows, user configurations, and workload inventory.
  • Automated micro-segmentation planning by recommending firewall rules, groups, and services.

Container API Support

New API support is available for container inventory. See the API documentation.
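As a quick illustration of consuming the new container inventory API from automation, the sketch below lists container clusters with Python. The /api/v1/fabric/container-clusters path, credentials, and field names are assumptions for illustration only; check the NSX-T 2.5 API Reference for the exact resource names.

    # Minimal sketch: list container clusters reported in the NSX-T inventory.
    # Endpoint path and JSON fields are assumptions; verify against the API Reference.
    import requests

    NSX_MANAGER = "nsx-manager.example.com"       # hypothetical manager address
    session = requests.Session()
    session.auth = ("admin", "password")          # replace with real credentials
    session.verify = False                        # lab only; use CA-signed certs in production

    resp = session.get(f"https://{NSX_MANAGER}/api/v1/fabric/container-clusters")
    resp.raise_for_status()
    for cluster in resp.json().get("results", []):
        print(cluster.get("display_name"), cluster.get("cluster_type"))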

L2 Networking

  • Enhancements for the Edge Bridge – The Edge bridge now allows attaching the same segment to multiple bridge profiles, thus providing the ability to bridge a segment multiple times to VLANs in the physical infrastructure. This new functionality supersedes and deprecates the original ESXi bridge available in previous versions of NSX-T Data Center. Caution: Use this feature at your own risk. It introduces the risk of creating a bridging loop by bridging the same segment twice to the same L2 domain in the physical network. There is no loop mitigation mechanism.
  • MTU/VLAN Health Check – From an operations point of view, network connectivity issues caused by configuration errors are often difficult to identify. A common scenario is one in which virtual network admins manage the N-VDS through NSX Manager while physical network admins own the configuration of the physical network switches.
    • VLAN Health Check – Checks whether N-VDS VLAN settings match trunk port configuration on the adjacent physical switch ports.
    • MTU Health Check – Checks whether the per-VLAN MTU setting on the physical access switch port matches the N-VDS MTU setting.
  • Guest Inter-VLAN Tagging – The Enhanced Datapath N-VDS enables users to map guest VLAN Tag to a segment. This capability overcomes the limitation of 10 vNICs per VM and allows guest VLAN tagged traffic (mapped to different segments) to be routed by the NSX infrastructure.

L3 Networking

  • Tier-1 Placement Inside Edge Cluster Based on Failure Domain – Enables NSX-T to automatically place Tier-1 gateways based on failure domains defined by the user. This increases the reliability of Tier-1 gateways across availability zones, racks, or hosts, even when using automatic Tier-1 gateway placement.
  • Asymmetric Load Sharing After Router Failure in ECMP Topology – On an active/active Tier-0 gateway, when one faulty service router went down, another router took over the faulty router's traffic, doubling the traffic going through that service router. Now, 30 minutes after a router failure, the faulty router's IP address is removed from the list of next hops, avoiding the additional traffic to a specific router.
  • Get BGP Advertised and Received Routes Per Peer through API and UI – Simplifies BGP operations by avoiding CLI usage to verify the routes received and sent to BGP peers (see the sketch after this list).
  • BGP Large Community Support – Offers the option to use communities in conjunction with 4-byte ASN as defined in RFC8092.
  • BGP Graceful Restart Helper Mode Option Per Peer – Offers the option for the Tier-0 gateway to help maintain routes for northbound physical routers with a redundant control plane, without compromising the failover time across Tier-0 routers.
  • DHCP relay on CSP – Extends the support of DHCP relay to CSP port, offering DHCP relay to endpoints connected to NSX-T through a VLAN.
  • Bulk API to Create Multiple NAT Rules – Enhances the existing NAT API to bundle the creation of a large number of NAT rules into a single API call.
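To illustrate the per-peer BGP route retrieval mentioned above, here is a minimal Python sketch. It assumes the Policy API path used in later NSX-T documentation (/policy/api/v1/infra/tier-0s/<t0>/locale-services/<ls>/bgp/neighbors/<id>/advertised-routes); the path, object IDs, and credentials are assumptions, so verify them against the NSX-T 2.5 API Reference.

    # Sketch: fetch the routes a Tier-0 BGP neighbor has been advertised.
    # Path segments and object IDs below are placeholders/assumptions.
    import requests

    NSX = "https://nsx-manager.example.com"           # hypothetical manager URL
    T0, LS, NEIGHBOR = "t0-gw", "default", "peer-1"   # hypothetical object IDs

    url = (f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}"
           f"/bgp/neighbors/{NEIGHBOR}/advertised-routes")
    resp = requests.get(url, auth=("admin", "password"), verify=False)  # lab only
    resp.raise_for_status()
    print(resp.json())  # per-Edge-node list of advertised prefixes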

Edge Platform

  • Support Mellanox ConnectX-4 and ConnectX-4 LX on Bare Metal Edge Node – Bare Metal Edge nodes now support Mellanox ConnectX-4 and ConnectX-4 LX physical NICs in 10/25/40/50/100 Gbps.
  • Bare Metal Edge PNIC Management – Provides the option to select the physical NICs to be used as dataplane NICs (fastpath). It also increases the number of physical NICs supported on the Bare Metal Edge node from 8 to 16 PNICs.

Enhanced IPv6 Support

NSX-T 2.5 continues to enhance the IPv6 routing/forwarding feature-set. This includes the support for:

  • IPv6 SLAAC (Stateless Address Autoconfiguration), automatically providing IPv6 addresses to virtual machines.
  • IPv6 Router Advertisement, NSX-T gateway provides IPv6 parameters through Router Advertisement.
  • IPv6 DAD, NSX-T gateways detect duplicate IPv6 address allocation.

Firewall Improvements

Layer-7 AppID Support

NSX-T 2.5 adds more Layer-7 capabilities for distributed and gateway firewall. This includes the support for:

  • Layer-7 AppID support for distributed firewall on KVM.
  • Layer-7 AppID support for gateway firewall.
  • Multiple Layer-7 AppID configuration in a single firewall rule.

FQDN/URL Filtering Enhancements

NSX-T 2.5 has minor enhancements to FQDN filtering support, including:

  • Configuring TTL timers for DNS entries.
  • Support for workloads running on KVM hypervisor.

Firewall Operations have been enhanced with the following features:

  • Autosave Configuration & Rollback Feature – The system creates a copy of the configuration when published. This configuration can be re-deployed to roll back to an existing state.
  • Manual Drafts – Users can now save drafts of their rules before publishing those rulesets for enforcement. Users can stage the rules in manual drafts. The system allows multiple users to work on the same draft, with a locking mechanism that prevents users from overriding each other’s rules.
  • Session Timers – Users can configure session timers for TCP, UDP and ICMP sessions.
  • Flood Protection – Both the distributed firewall and the gateway firewall support SYN flood protection. Users can set thresholds for alerting, logging, and dropping traffic to build custom workflows.
  • The system auto-generates two groups when an NSX Load Balancer is created and virtual servers are deployed. One group contains the server pool, while the other group contains the virtual server IP. Firewall admins can use these groups on the distributed firewall or gateway firewall to allow or deny traffic. These groups track NSX load balancer configuration changes.
  • The number of IP addresses detected per VM vNIC has been increased from 128 to 256 IP addresses.

Identity Firewall

  • With NSX-T 2.5, we support Active Directory Servers deployed on Windows 2016.
  • We support the Identity Firewall for Windows Server workloads without Terminal Services enabled. This will allow customers to strictly control the lateral movement of administrators from one server to another.

Service Insertion

  • Packet Copy Support – In addition to redirecting traffic through a service, NSX-T now supports the Network Monitoring use case, in which a copy of packets is forwarded to a partner Service Virtual Machine (SVM), allowing inspection, monitoring or collection of statistics while the original packet does not pass through the network monitoring service.
  • Automatic Host-based Partner SVM Deployment – As of NSX-T 2.5, two modes of Partner SVM deployment are supported; clustered deployment in which Service Virtual Machines are deployed on a dedicated vSphere (Service) Cluster and Host-Based in which one Service Virtual Machine per service is deployed on each Compute Host in a particular cluster. In this mode, when a new compute host is added to a cluster, the appropriate SVMs are automatically deployed.
  • Notification Support for North-South Service Insertion – NSX-T 2.4 introduced the notification framework for East-West Service Insertion, allowing partner services to automatically receive notifications upon relevant changes such as dynamic group updates. With NSX-T 2.5, this notification framework has also been extended to N-S Service Insertion. Partners can leverage this mechanism in order to allow customers to use dynamic NSX groups (i.e. based on Tags, OS, VM Name) in the partner policy.
  • Additional Troubleshooting and Visualization Features – With NSX-T 2.5, several serviceability enhancements have been made to allow for better troubleshooting of Service Insertion related issues. This includes the ability to verify the runtime status of a Service Instance, the ability to fetch available Service Paths through the API and the inclusion of Service Insertion related logs in the support bundle.

Endpoint Protection (Guest Introspection)

  • Linux Support – Support for Linux-based operating systems with Endpoint Protection. Please see the NSX-T Administration Guide for supported Linux operating systems for Guest Introspection.
  • Endpoint Protection Dashboard – Endpoint Protection dashboard for visibility and monitoring the configuration status of protected and unprotected VMs, issues with Host agent and service VMs, and VMs configured with the file introspection driver that was installed as part of the VMware Tools installation.
  • Monitoring Dashboard – To monitor the partner service deployment status across clusters in the system .

Load Balancing

  • API to Retrieve the Status on Edge Capacity for Load Balancers – New API calls have been added to allow the admin to monitor the Edge capacity in terms of load balancing instances.
  • Intelligent Selection of Health Check IP Address – When SNAT IP-list is configured, the first IP address in the list is going to be used for health monitoring instead of the uplink IP address of a Tier-1 Gateway. The IP address can be the same as the Virtual Server IP address. This enhancement allows the load balancer to use a single IP address for both source-nat and health monitoring.
  • Load Balancer Logging Enhancement – With this enhancement, the load balancer can generate a rich log message per Virtual Server for monitoring. For example, the Virtual Server access log includes not only the client IP address but also a pool member IP address.
  • Persistent Enhancement in LB Rules – A new action called “Persist” is introduced in LB rules. The Persist action enables the load balancer to provide application persistency based on a cookie set by a pool member.
  • LB Fits – A small LB instance can fit into a small Edge VM. A medium LB instance can fit into a medium Edge VM. Previously, the small Edge VM did not support load balancing services because the size of an Edge VM had to be bigger than the size of an LB instance.
  • VS/Pool/Member Statistics – All LB-related statistics are available in the Simplified interface. Previously, the information was only available in the Advanced Networking and Security interface.
  • ECC (Elliptical Curve Certificate) Support for SSL Termination – EC certificates can be used for increased SSL performance.
  • FIPS Knob – There is a global setting via API for FIPS compliance for load balancers. By default, the setting is turned off to improve performance.

VPN

  • IPsec VPN Support on Tier-1 Gateway – IPsec VPN can be deployed and terminated on Tier-1 gateway for better tenant isolation and scalability. Previously, it was supported on only Tier-0 gateway.
  • VLAN Support for Layer-2 VPN on NSX-managed Edge – With this enhancement, VLAN-backed segments can be extended. Previously, only logical segments were supported for Layer-2 extension. This includes VLAN Trunking support enabling multiple VLANs to be extended on one Edge Interface and Layer-2 VPN session.
  • TCP MSS Clamping for IPsec VPN – TCP MSS Clamping allows the admin to enforce the MSS value of all TCP connections to avoid packet fragmentation.
  • ECC (Elliptical Curve Certificate) Support for IPsec VPN – The EC certificate is required to enable various IPsec compliance suites, such as CNSA, UK Prime, etc.
  • Easy Button for Compliance Suite Configuration – CNSA, Suite-B-GCM, Suite-B-GMAC, Prime, Foundation, and FIPS can be configured with a single click in the UI or a single API call.

Automation, OpenStack and other CMP

  • Expanded OpenStack Release Support – Now includes the Stein and Rocky releases.
  • OpenStack Neutron Plugin supporting Policy API – In addition to existing plugin supporting management API, we now offer an OpenStack Neutron plugin consuming the new NSX-T Policy API. This plugin supports IPv6 for Layer-2, L3, firewall and SLAAC.
  • OpenStack Neutron Router Optimization – The plugin now optimizes the OpenStack Neutron Router by managing the creation/deletion of the service router dynamically. This allows a customer to have only a distributed router when no services are configured, and a service router as soon as services are added, all managed by the plugin.
  • OpenStack Neutron Plugin Layer-2 Bridge – The Layer-2 bridge configured from OpenStack is now configured on the Edge Cluster and not on the ESXi cluster.
  • OpenStack Octavia Support – In addition to LBaaSv2, the OpenStack Neutron Plugin supports Octavia as a way to support Load Balancing.
    For more details please see the VMware NSX-T Data Center 2.5 Plugin for OpenStack Neutron Release Notes.

NSX Cloud

  • Addition of a New Mode of Operation – NSX Cloud now has two modes of operation, which officially makes NSX Cloud the only hybrid cloud solution in the market to support both agented and agentless modes of operation.
    • NSX Enforced Mode (Agented) – Provides a “Consistent” policy framework between on-premises and any public cloud. NSX Policy enforcement is done with NSX tools which are installed in every workload. This provides VM level granularity and all tagged VMs will be managed by NSX. This mode will overcome the differences/limitations of individual public cloud providers and provide a consistent policy framework between on-premises and public cloud workload.
    • Native Cloud Enforced Mode (Agentless) – Provides a “Common” policy framework between on-premises and any public cloud. This mode does not require the installation of NSX tools in the workloads. NSX Security Policies are converted into the native cloud provider’s security constructs. Hence, all the scale and feature limitations of the chosen public cloud are applicable. The granularity of control is at the VPC/VNET level, and every workload inside a managed VPC/VNET will be managed by NSX unless it is whitelisted.
      Both modes provide Dynamic Group membership and a rich set of abstractions for NSX group membership criteria.
  • Support for Visibility and Security of Public Cloud Native Services from NSX Cloud – From this release, it will be possible to program the security groups of native SaaS services in Azure and AWS that have a local VPC/VNET endpoint and a security group associated with them. The primary idea is to discover and secure cloud native service endpoints with user-specified rules on NSX policy. The following services are supported in this release: AWS (ELB, RDS, and DynamoDB) and Azure (Azure Storage, Azure LB, Azure SQL Server, and Cosmos DB). Future NSX-T releases will add support for more services.
  • New OS support:
    • Support for Windows Server 2019
    • Windows 10 v1809
    • Support for Ubuntu 18.04
  • Enhanced Quarantine Policy and VM White-listing – Starting with NSX 2.5, NSX Cloud allows users to whitelist VMs from the CSM interface. Once whitelisted, the cloud security groups of such VMs are no longer managed by NSX, and users can put the VMs in whatever cloud security groups they want.
  • Enhanced Error Reporting on CSM Interface – Enables quicker troubleshooting.

Operations

  • Support of vSphere HA for the NSX Manager(s) – The NSX management cluster can now be protected by vSphere HA. This allows one node of the NSX management cluster to be recovered if the host running it fails. It also allows for the entire NSX management cluster to be recovered to an alternate site if there is a site-level failure. Please see the NSX-T Installation Guide for details on supported scenarios.
  • Capacity Dashboard Improvements – New and improved metrics to the capacity dashboard show the number of objects a customer has configured relative to the maximum supported in the product. For a complete list of configuration maximums for NSX-T Data Center, see the VMware Configuration Maximums Tool.
  • Support for vSphere Lockdown Mode – Enable more deployment options for customers by providing the ability to install, upgrade and operate NSX-T in a vSphere lockdown mode environment.
  • Logging Enhancement – Reduce service impact during troubleshooting by enabling dynamic change of log levels via the NSX command line interface for NSX user space agents.
  • SNMPv3 Support – Enhanced security compliance by adding support for configuring SNMPv3 for NSX Edge and Manager appliance.
  • New Traceflow Capability for Troubleshooting VM Address Resolution Issues – Added support for injecting ARP/NDP packets via Traceflow to detect connectivity issues while doing address resolution for an IP destination.
  • Upgrade Order Change – When upgrading to NSX-T 2.5, the new upgrade order is Edge-component upgrade before Host component upgrade. This enhancement provides significant benefits when upgrading the cloud infrastructure by allowing optimizations to reduce the overall maintenance window.
  • Log Insight Content Pack Enhancement – Added support for out-of-box log alerts with the new NSX-T Content Pack compatible with NSX-T 2.5.

Platform Security

  • FIPS – Users can now generate FIPS compliance reports and can configure and manage their NSX deployments in FIPS-compliant mode. Cryptographic modules are validated per the FIPS standards, offering security assurance for customers who want to be compliant per federal regulations or operate NSX in a secure manner that adheres to prescribed FIPS standards. With noted exceptions, all cryptographic modules in NSX-T 2.5 are FIPS certified. To view granted certifications for FIPS-validated modules, see https://www.vmware.com/security/certifications/fips.html.
  • Enhancements to Password Management – Users can now extend the password expiry duration (day-count) since the last password change even after upgrade. Thirty-day expiry warnings and password expiry notifications now appear in the interface, CLI, and syslogs.

Support for Single Cluster Design

Support of single cluster designs with fully collapsed Edge+Management+Compute VMs, powered by a single N-VDS, in a cluster with a minimum of four hosts. The typical reference designs for VxRail and other cloud provider host solutions prescribe 4x10G pNICs with two host switches. One switch is dedicated to Edge+Management (VDS), whereas the other is dedicated to compute VMs (N-VDS). Two host switches effectively separate the management traffic from the compute traffic. However, with the trending economics of 10 and 25G, many small data center and cloud provider customers are standardizing on two-pNIC hosts. Using this form factor, small data centers and cloud provider customers can build an NSX-T based solution with a single N-VDS, powering all the components with two pNICs.

NSX Data Center for vSphere to NSX-T Data Center Migration

  • Migration Coordinator Enhancements – The Migration Coordinator has several usability enhancements that improve the workflow of the process required to migrate from NSX Data Center for vSphere to NSX-T Data Center, including improvements to providing user feedback during the migration.

Compatibility and System Requirements

For compatibility and system requirements information, see the NSX-T Data Center Installation Guide.

General Behavior Changes

NSX-T Data Center System Communication Port Changes

Starting with NSX-T Data Center 2.5, the NSX messaging channel TCP port from all Transport and Edge nodes to NSX Managers has changed from port 5671 to TCP port 1234. With this change, make sure all NSX-T Transport and Edge nodes can communicate on TCP port 1234 to the NSX Managers and on TCP port 1235 to the NSX Controllers before you upgrade to NSX-T Data Center 2.5. Also make sure to keep port 5671 open during the upgrade process.
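Before starting the upgrade, you can verify reachability of the ports named above from a transport or Edge node's network segment. The sketch below is a generic TCP connect test written in Python; the manager and controller addresses are placeholders, and a successful connect only proves the path is open, not that the service behind it is healthy.

    # Sketch: confirm TCP ports 1234 (NSX Managers), 1235 (NSX Controllers)
    # and 5671 (legacy messaging port, keep open during upgrade) are reachable.
    import socket

    TARGETS = {
        "nsx-manager.example.com": [1234, 5671],      # placeholder manager FQDN
        "nsx-controller.example.com": [1235],         # placeholder controller FQDN
    }

    for host, ports in TARGETS.items():
        for port in ports:
            try:
                with socket.create_connection((host, port), timeout=5):
                    print(f"{host}:{port} reachable")
            except OSError as err:
                print(f"{host}:{port} NOT reachable ({err})")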

L2 Networking

As a result of the enhancements for Layer-2 bridges, the ESXi bridge is deprecated. NSX-T was initially introduced with the capability of dedicating an ESXi host as a bridge to extend an overlay segment to a VLAN. This model is deprecated as of this release because the new Edge bridge supersedes it in terms of features, does not require a dedicated ESXi host, and benefits from the optimized data path of the Edge node. See “What’s New” for more information.

API Deprecations and Behavior Changes

Transport Node Template APIs are deprecated in this release. It is recommended that you use Transport Node Profiles APIs instead. See the API Guide for the list of deprecated types and methods.
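If you have automation built on the deprecated Transport Node Template APIs, a hedged starting point for the replacement is below. It assumes the /api/v1/transport-node-profiles endpoint name; confirm the exact resource and payload schema in the NSX-T 2.5 API Guide before relying on it.

    # Sketch: list existing Transport Node Profiles (replacement for template APIs).
    # Endpoint name is an assumption; check the API Guide for the exact path/schema.
    import requests

    NSX = "https://nsx-manager.example.com"   # hypothetical manager URL
    resp = requests.get(f"{NSX}/api/v1/transport-node-profiles",
                        auth=("admin", "password"), verify=False)  # lab only
    resp.raise_for_status()
    for profile in resp.json().get("results", []):
        print(profile.get("id"), profile.get("display_name"))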

API and CLI Resources

See code.vmware.com to use the NSX-T Data Center APIs or CLIs for automation.

The API documentation is available from the API Reference tab. The CLI documentation is available from the Documentation tab.

Available Languages

NSX-T Data Center has been localized into multiple languages: English, German, French, Japanese, Simplified Chinese, Korean, Traditional Chinese, and Spanish. Because NSX-T Data Center localization utilizes the browser language settings, ensure that your settings match the desired language.

Document Revision History

19 September 2019. First edition.
23 September 2019. Added Known Issues 2424818 and 2419246. Added Resolved Issues 2364756, 2406018, and 2383328.
24 September 2019. Updated What’s New items.
03 October 2019. Added Resolved Issue 2313673.
12 November 2019. Added Known Issues 2362688 and 2436302. Corrected Issue 2282798 by moving it to Resolved.

Resolved Issues

  • Fixed Issue 2288774 – Segment port gives realization error due to tags exceeding 30 (erroneously). User input incorrectly tries to apply more than 30 tags. However, the Policy workflow does not properly validate/reject the user input and allows the configuration. Policy then shows an alarm with the proper error message that the user should not use more than 30 tags. At that point the user can correct this issue.
  • Fixed Issue 2334442 – User does not have permission to edit or delete created objects after admin user is renamed. Unable to rename admin/auditor users.
  • Fixed Issue 2256709 – Instant clone VM or VM reverted from a snapshot loses AV protection briefly during vMotion. A snapshot of a VM is reverted and the VM migrates to another host. The partner console doesn’t show AV protection for the migrated instant clone VM. There is a brief loss of AV protection.
  • Fixed Issue 2261431 – Filtered list of datastores is required depending upon the other deployment parameters. An appropriate error is seen on the UI if an incorrect option was selected. The customer can delete this deployment and create a new one to recover from the error.
  • Fixed Issue 2274988 – Service chains do not support consecutive service profiles from the same service. Traffic does not traverse a service chain and gets dropped whenever the chain has two consecutive service profiles belonging to the same service.
  • Fixed Issue 2279249 – Instant clone VM loses AV protection briefly during vMotion. An instant clone VM is migrated from one host to another. Immediately after migration, an eicar file is left behind on the VM. Brief loss of AV protection.
  • Fixed Issue 2292116 – IPFIX L2 applied to with CIDR-based group of IP addresses not listed on UI when group is created via the IPFIX L2 page. If you try to create a group of IP addresses from the Applied to dialog and enter a wrong IP address or CIDR in the Set Members dialog box, those members are not listed under groups. You have to edit that group again to enter valid IP addresses.
  • Fixed Issue 2268406 – Tag Anchor dialog box doesn’t show all tags when the maximum number of tags are added, and cannot be resized or scrolled through. However, the user can still view all tags in the Summary page. No data is lost.
  • Fixed Issue 2282798 – Host registration may fail when too many requests/hosts try to register with the NSX Manager simultaneously. This issue causes the fabric node to be in a FAILED state. The Fabric node status API call shows “Client has not responded to heartbeats yet”. The /etc/vmware/nsx-mpa/mpaconfig.json file on the host is also empty.
  • Fixed Issue 2383867 – Log bundle collection fails for one of the Management Plane nodes. The log collection process experiences a failure when copying the support bundle to a remote server.
  • Fixed Issue 2332397 – API allows creation of DFW policies in nonexistent domain. After creating such a policy on a nonexistent domain, the interface becomes unresponsive when the user opens the DFW security tab. The relevant log is /var/log/policy/policy.log.
  • Fixed Issue 2410818 – After upgrading to 2.4.2, virtual servers created in NSX-T 2.3.x may stop working after more virtual servers are created. In some deployments, virtual servers created in version 2.3.x stop working after upgrading to version 2.4.2 and after more virtual servers were created.
  • Fixed Issue 2310650 – Interface shows “Request timed out” error message. Multiple pages in the interface show the following message: “Request timed out. This may occur when system is under load or running low on resources”.
  • Fixed Issue 2314537 – Connection status is down after vCenter certificate and thumbprint update. No new updates from vCenter sync with NSX, and all on-demand queries to fetch data from vCenter will fail. Users cannot deploy new Edge/Service VMs. Users cannot prepare new clusters or hosts added in the vCenter. Log locations: /var/log/cm-inventory/cm-inventory.log and /var/log/proton/nsxapi.log on the NSX Manager node.
  • Fixed Issue 2316943 – Workload unprotected briefly during vMotion. VMware Tools takes a few seconds to report the correct computer name for a VM after vMotion. As a result, VMs added to NSGroups using computer name are unprotected for a few seconds after vMotion.
  • Fixed Issue 2318525 – The next hop of IPv6 routes advertised to an eBGP peer gets changed to the sender's own IP address. For eBGP IPv4 sessions, when advertised IPv4 routes have the eBGP peer as the next hop, the next hop of the route is NOT changed at the sender side to its own IP address. This works for IPv4, but for IPv6 sessions the next hop of the route is changed at the sender side to its own IP address. This behavior can result in route loops.
  • Fixed Issue 2320147 – VTEP missing on the affected host. If a LogSwitchStateMsg is removed and added in the same transaction, and this operation is processed by the central control plane before the management plane has sent the Logical Switch, the Logical Switch state will not be updated. As a result, traffic cannot flow to or from the missing VTEP.
  • Fixed Issue 2320855 – New VM security tag is not created if the user doesn’t click the Add/Check button. Interface issue. If a user adds a new security tag to a policy object or inventory and clicks Save without first clicking the Add/Check button next to the tag-scope pair field, the new tag pair is not created.
  • Fixed Issue 2331683 – Add-Load-balancer form on the Advanced UI not showing updated capacity of version 2.4. When the add-load-balancer form is opened, the form-factor capacity shown on the Advanced UI is not updated as per version 2.4. The capacity shown is from the previous version.
  • Fixed Issue 2295819 – L2 bridge stuck in “Stopped” state even though Edge VM is Active and PNIC is UP. The L2 bridge may be stuck in the “Stopped” state even though the Edge VM is Active and the PNIC that backs the L2 bridge port is UP. This is because the Edge LCP fails to update the PNIC status in its local cache, thereby assuming that the PNIC is down.
  • Fixed Issue 2243415 – Customer unable to deploy EPP service using Logical Switch (as a management network). On the EPP deployment screen, the user cannot see a logical switch in the network selection control. If the API is used directly with a logical switch specified as the management network, the user will see the following error: “Specified Network not accessible for service deployment.”
  • Fixed Issue 2364756 – Profile realization fails due to duplicate priority. On scale setups, when the user associated vRNI with NSX IPFIX, the profile would not realize on the management plane and would throw realization errors.
  • Fixed Issue 2392093 – Traffic drops due to RPF-Check. RPF-Check may result in dropped traffic if traffic is hair-pinned through a T0 downlink and the Tier-0 and Tier-1 routers are on the same Edge Node.
  • Fixed Issue 2307551 – NSX-T Host may lose management network connectivity when migrating all pNICs to N-VDS. The issue results from the host migration retry removing all pNICs in the N-VDS that has vmk0 configured. The first host migration migrated all pNICs and vmk0 into the N-VDS but failed afterward. When you retry migration, all pNICs are removed from the N-VDS. As a result, users cannot access the host through the network; all VMs in the host also lose network connectivity, rendering their services unreachable.
  • Fixed Issue 2369792 – CBM process repeatedly crashes due to CBM process memory bloat. CSM and CBM processes on the Cloud Service Manager appliance fail database compaction. As a result, CBM process memory bloat causes the CBM process to repeatedly crash.
  • Fixed Issue 2361892 – The NSX Edge appliance experiences a memory leak, leading to process crash/restart. Over an extended period of time, the NSX Edge appliance may experience a memory leak due to repeated rule lookup, leading to process crashes/restarts. A memory leak was detected every time a rule lookup was executed. When the flow cache is cleared, the VIF interface is not removed, causing a buildup in memory.
  • Fixed Issue 2364529 – Load balancer memory leak after reconfiguration. The NSX Load balancer might leak memory upon consecutive/repetitive configuration events, resulting in an nginx process core dump.
  • Fixed Issue 2378876 – PSOD on ESXi hosts with errors: “Usage error in dlmalloc” and “PF Exception 14 in world 3916803:VSIP PF Purg IP”. ESXi crashed (PSOD) after running traffic for a few days. No other symptoms were observed prior to the crash. The issue was ultimately identified in ALG traffic (FTP, Sunrpc, Oracle, Dcerpc, tftp) where a non-atomized increment counter led to race conditions, corrupting the ALG tree structure.
  • Fixed Issue 2384922 – BGPD consumes 100% CPU usage on Edge node. The BGPD process on NSX-T Edge may consume 100% CPU when it has several open sessions with VTYSH.
  • Fixed Issue 2386738 – NAT rules ignored on traffic over LINKED port. NAT services are not enabled on the LINKED router port type connecting Tier-0 and Tier-1 logical routers.
  • Fixed Issue 2363618 – VMware Identity Manager users unable to access Policy pages in NSX Manager dashboard. Users with roles assigned to Group permissions in VMware Identity Manager are unable to access Policy pages in the NSX Manager dashboard. Permissions from group assignment are ignored.
  • Fixed Issue 2298274 – Policy Group can be created/updated with invalid or partial domain name through REST API. The interface permitted creation of groups with identity expressions containing invalid Active Directory groups or individual group members alongside a single valid one. A member is valid only if it has exactly one LDAP group associated with the domain name. For such groups created in a previous version of NSX-T, this error is not flagged during the upgrade process, allowing the invalid groups to persist in subsequent releases. Fixed in 2.5.
  • Fixed Issue 2317147 – Users cannot see effective VMs for a group whose membership is based on IP or MAC addresses. If a user creates a group with only IP or MAC addresses in the group, no VMs are listed when effective membership for that group is called from the API. There is no functional impact. Policy properly creates an NSGroup on the management plane, and the list of IP and MAC addresses is sent directly to the central control plane.
  • Fixed Issue 2327201 – Updates of VMs on KVM hypervisors not immediately synchronized. VM updates on KVM hypervisors may take a couple of hours to synchronize on NSX-T. As a result, new VMs created on KVM hypervisors cannot be added to NSGroups, no firewall rules can be applied on those VMs, and upgrade of the KVM hypervisor is not possible because the VM power status is not updated.
  • Fixed Issue 2329443 – Control cluster is not getting initialized due to forcesync timeout. The control cluster is not initializing due to a forcesync timeout when the IPv4 range in an IPSet starts at 0.0.0.0, for example 0.0.0.0-1.1.1.20. This is caused by an issue in the IPSetFullSyncMessageProvider, which becomes stuck in an infinite loop. Since the central control plane is not getting initialized, users can’t deploy new workloads.
  • Fixed Issue 2337839 – NSX-T backup widgets display incorrect field names. Specifically, the NSX-T backup widgets are not displaying the correct number of backup errors. As a result, the customer needs to review the NSX Manager backup tab to see the accurate count of backup errors.
  • Fixed Issue 2341552 – Edge fails to boot when system has too many supported NICs present. No datapath service or connectivity can be seen, the datapath service is down, and the Edge node is in a degraded state. This results in partial or total connectivity loss if the edge is required.
  • Fixed Issue 2390374 – NSX Manager becomes very slow or unresponsive, and logs show many corfu exceptions. NSX may also fail to start up. The corfu exceptions indicate that the scale of Active Directory members is too large and above the tested limits.
  • Fixed Issue 2371150 – Unable to configure Layer-7 firewall rules on Bare Metal Edge nodes. Layer-7 firewall rules on Bare Metal Edge nodes are not supported in NSX-T 2.5. There is an internal command that enables this support, but it is only available for proofs of concept.
  • Fixed Issue 2361238 – Downlink router doesn’t pair with services router. NAT rules do not take effect on the downlink router after a services router, which had been paired with a downlink router, is recreated after being deleted.
  • Fixed Issue 2363248 – Service Instance Health Status on interface appears down, though API call shows connected. This inconsistent reporting may cause a false alarm. This issue and its solution are described in greater detail in Knowledge Base article 67165 – Service Instance status displays as “Down” when there are no VMs up to be protected in NSX-T.
  • Fixed Issue 2359936 – Frequent cfgAgent log rolling on ESX host. Frequent log rolling may cause loss of useful information in cfgAgent.log for debugging and troubleshooting on the host.
  • Fixed Issue 2332938 – When the SYN Cache is enabled in the Flood Protection Security Profile, the actual TCP half-open connection limit can be larger than is configured on the NSX Manager. NSX-T auto-calculates an optimal TCP half-open connection limit, based on the configured limit. This calculated limit can be greater than the configured limit and is based on the formula Limit = (PwrOf2 * Depth), where PwrOf2 is a power of 2 not less than 64, and Depth is an integer <= 32 (see the sketch after this list).
  • Fixed Issue 2376336 – Address Family in route redistribution not supported by Policy and Edge. Address Family in Redistribution is not working or used in the application.
  • Fixed Issue 2412842 – Limit metrics logs to 40 MB on ESX to support hosts with ramdisk. This issue is addressed in detail by Knowledge Base article 74574.
  • Fixed Issue 2385070 – IP discovery and DFW have opposite behaviors regarding IPv6 subnet. IP discovery considers 2001::1/64 as a host IP, while DFW considers it an IPv6 subnet.
  • Fixed Issue 2394896 – Host fails to upgrade from NSX-T Data Center 2.4.x to 2.5. The host fails to upgrade from NSX-T Data Center 2.4.0, 2.4.1, and 2.4.2 to 2.5. This may be due to a KCP module unloading failure. This issue is discussed in greater detail in Knowledge Base article 74674.
  • Fixed Issue 2406018 – An event/alarm is triggered if password expiry is within 30 days. An event/alarm regarding password expiration is triggered if the password expiry is within 30 days, even if password expiration is disabled.
  • Fixed Issue 2383328 – Feature request to provide utility that renders metrics data into human readable form. NSX-T Data Center collects and saves metrics data in a binary format; users have requested the ability to view this data in a human-readable format. This issue tracks that request.
  • Fixed Issue 2248345 – After installation of the NSX-T Edge, the machine boots up with a blank black screen. Unable to install NSX-T Edge on an HPE ProLiant DL380 Gen9 machine.
  • Fixed Issue 2313673 – VM-based Edge transport nodes: users unable to connect uplinks to the NSX-T logical switches/segments.

    For VM-based Edge transport nodes, users are unable to connect the Edge transport node uplinks to the NSX-T logical switches/segments. They can connect them only to the vCenter’s DVPGs. On the Configure NSX screen for VM-based Edge transport node’s add/edit flows, the users are presented with the option to map the uplinks only with vCenter’s DVPGs. The option to map the uplinks to the NSX-T logical switches/segments is missing.
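For the SYN cache limit formula in issue 2332938 above, the short sketch below shows one way the auto-calculated limit could exceed the configured value. How NSX-T actually picks PwrOf2 and Depth is not documented here, so the selection logic below (smallest product that is at least the configured limit) is purely an illustrative assumption.

    # Illustrative only: find a (PwrOf2, Depth) pair whose product is >= the configured
    # TCP half-open limit, per Limit = PwrOf2 * Depth with PwrOf2 >= 64 (power of 2)
    # and Depth <= 32. The real selection algorithm in NSX-T may differ.
    def effective_limit(configured: int) -> int:
        best = None
        pwr = 64
        while pwr <= 1 << 20:                      # arbitrary search ceiling
            for depth in range(1, 33):
                limit = pwr * depth
                if limit >= configured and (best is None or limit < best):
                    best = limit
            pwr *= 2
        return best

    print(effective_limit(1000))   # e.g. 1024 (64 * 16), larger than the configured 1000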

Known Issues

The known issues are grouped as follows.

General Known Issues

  • Issue 2261818 – Routes learned from eBGP neighbor are advertised back to the same neighbor. Enabling BGP debug logs will show packets being received back and packets getting dropped with an error message. The BGP process will consume additional CPU resources in discarding the update messages sent to peers. If there are a large number of routes and peers, this can impact route convergence. Workaround: None.
  • Issue 2390624 – Anti-affinity rule prevents service VM from vMotion when host is in maintenance mode. If a service VM is deployed in a cluster with exactly two hosts, the HA pair with an anti-affinity rule will prevent the VMs from vMotioning to the other host during any maintenance mode tasks. This may prevent the host from entering Maintenance Mode automatically. Workaround: Power off the service VM on the host before the Maintenance Mode task is started on vCenter.
  • Issue 2329273 – No connectivity between VLANs bridged to the same segment by the same edge node. Bridging a segment twice on the same edge node is not supported. However, it is possible to bridge two VLANs to the same segment on two different edge nodes. Workaround: None.
  • Issue 2239365 – “Unauthorized” error is thrown. This error may result because the user attempts to open multiple authentication sessions on the same browser type. As a result, login will fail with the above error and cannot authenticate. Log locations: /var/log/proxy/reverse-proxy.log and /var/log/syslog. Workaround: Close all open authentication windows/tabs and retry authentication.
  • Issue 2252487 – Transport Node Status is not saved for BM edge transport node when multiple TNs are added in parallel. The transport node status is not shown correctly in the MP UI. Workaround:
    1. Restart the proton service; all transport node statuses can then be updated correctly.
    2. Or, use the API https://<nsx-manager>/api/v1/transport-nodes/<node-id>/status?source=realtime to query the transport node status (see the sketch after this list).
  • Issue 2275285 – A node makes a second request to join the same cluster before the first request is complete and the cluster has stabilized. The cluster may not function properly, and the CLI commands "get cluster status" and "get cluster config" could return an error. Workaround: Do not issue a new join command to the same cluster within 10 minutes of the first join request.
  • Issue 2275388 – Loopback interface/connected interface routes could get redistributed before filters are added to deny the routes. Unnecessary route updates could cause a diversion of traffic for a few seconds to minutes. Workaround: None.
  • Issue 2275708 – Unable to import a certificate with its private key when the private key has a passphrase. The message returned is, “Invalid PEM data received for certificate. (Error code: 2002)”. Unable to import a new certificate with a private key. Workaround:
    1. Create a certificate with private key. Do not enter a new passphrase when prompted; press Enter instead.
    2. Select “Import Certificate” and select the certificate file and the private key file.

    Verify by opening the key-file. If a passphrase was entered when generating the key, the second line in the file will show something like “Proc-Type: 4,ENCRYPTED”.

    This line is missing if the key-file was generated without passphrase.

  • Issue 2277742 – Invoking PUT https://<nsx-manager>/api/v1/configs/management with a request body that sets publish_fqdns to true can fail if the NSX-T Manager appliance is configured with a fully qualified domain name (FQDN) instead of just a hostname. PUT https://<nsx-manager>/api/v1/configs/management cannot be invoked if an FQDN is configured. Workaround: Deploy the NSX Manager using a hostname instead of an FQDN.
  • Issue 1957072 – Uplink profile for bridge node should always use LAG for more than one uplink. When using multiple uplinks that are not formed into a LAG, the traffic is not load balanced and might not work well. Workaround: Use LAG for multiple uplinks on bridge nodes.
  • Issue 1970750 – Transport node N-VDS profile using LACP with fast timers is not applied to vSphere ESXi hosts. When an LACP uplink profile with fast rates is configured and applied to a vSphere ESXi transport node on NSX Manager, the NSX Manager shows that the profile is applied successfully, but the vSphere ESXi host is using the default LACP slow timer. On the vSphere hypervisor, you cannot see the effect of the lacp-timeout value (SLOW/FAST) when the LACP NSX managed distributed switch (N-VDS) profile is used on the transport node from the NSX Manager. Workaround: None.
  • Issue 2320529 – “Storage not accessible for service deployment” error thrown after adding third-party VMs for newly added datastores. The “Storage not accessible for service deployment” error is thrown after adding third-party VMs for newly added datastores, even though the storage is accessible from all hosts in the cluster. This error state persists for up to thirty minutes. Workaround: Retry after thirty minutes. As an alternative, make the following API call to update the cache entry of the datastore:
    https://<nsx-manager>/api/v1/fabric/compute-collections/<CC Ext ID>/storage-resources?uniform_cluster_access=true&source=realtime
    where <nsx-manager> is the IP address of the NSX manager where the service deployment API has failed, and CC Ext ID is the identifier in NSX of the cluster where the deployment is being attempted.
  • Issue 2328126 – Bare Metal issue: Linux OS bond interface when used in NSX uplink profile returns error. When you create a bond interface in the Linux OS and then use this interface in the NSX uplink profile, you see this error message: “Transport Node creation may fail.” This issue occurs because VMware does not support Linux OS bonding. However, VMware does support Open vSwitch (OVS) bonding for Bare Metal Server Transport Nodes. Workaround: If you encounter this issue, see Knowledge Base article 67835 Bare Metal Server supports OVS bonding for Transport Node configuration in NSX-T.
  • Issue 2370555 – User can delete certain objects in the Advanced interface, but deletions are not reflected in the Simplified interface. Specifically, groups added as part of a distributed firewall exclude list can be deleted in the Advanced interface Distributed Firewall Exclusion List settings. This leads to inconsistent behavior in the interface. Workaround: Use the following procedure to resolve this issue:
    • Add an object to an exclusion list in the Simplified interface.
    • Verify that it appears in the Distributed Firewall exclusion list in the Advanced interface.
    • Delete the object from the Distributed Firewall exclusion list in the Advanced interface.
    • Return to the Simplified interface, add a second object to the exclusion list, and apply it.
    • Verify that the new object appears in the Advanced interface.
  • Issue 2377217 – After KVM host reboot, traffic flows between VMs may not work as expected. Rebooting the KVM host may result in reachability issues between VMs. Workaround: After host reboot, restart the nsx-agent service with the following command:
    # systemctl restart nsx-agent.service
  • Issue 2371251 – Dashboard interface blinks when navigating to Backup & Restore page. This has been observed only in the Firefox browser and only in some deployments. Workaround: Manually refresh the page or use another supported browser.
  • Issue 2408453 – VMware Tools 10.3.5 crashes when NSX Guest Introspection driver is installed. VMware Tools 10.3.5 crashes irregularly on Windows VMs, most noticeably when the remote session is disconnected or the guest VM is shutting down. Workaround: See Knowledge Base article 70543 for details.
  • Issue 2267964 – If vCenter is removed, user is not warned about loss of services running on vCenter. If a user removes the compute manager (vCenter) where services like Guest Introspection are deployed, the user is not notified about the potential loss of these services. Workaround: This issue can be avoided if the user follows the correct procedure for adding a new vCenter as compute manager.
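For the realtime transport node status query referenced in issue 2252487 above, a minimal Python version of the documented API call is sketched below; the manager address, credentials, and node UUID are placeholders.

    # Sketch: query realtime status of a transport node (see issue 2252487 above).
    import requests

    NSX = "https://nsx-manager.example.com"                  # placeholder manager URL
    NODE_ID = "00000000-0000-0000-0000-000000000000"          # placeholder transport node UUID

    resp = requests.get(f"{NSX}/api/v1/transport-nodes/{NODE_ID}/status",
                        params={"source": "realtime"},
                        auth=("admin", "password"), verify=False)  # lab only
    resp.raise_for_status()
    print(resp.json())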

Installation Known Issues

  • Issue 1957059 – Host unprep fails if a host with existing VIBs is added to the cluster when trying to unprep. If VIBs are not removed completely before adding the hosts to the cluster, the host unprep operation fails. Workaround: Make sure that VIBs on the hosts are removed completely and restart the host.

NSX Manager Known Issues

  • Issue 2378970 – Cluster-level Enable/Disable setting for distributed firewall incorrectly shown as Disabled. The cluster-level Enable/Disable setting for IDFW on the Simplified UI may show as Disabled even though it is Enabled on the management plane. After upgrading from 2.4.x to 2.5, this inaccuracy will persist until explicitly changed. Workaround: Manually modify the Enable/Disable setting for IDFW on the Simplified UI to match the setting on the management plane.

NSX Edge Known Issues

  • Issue 2283559 – https://<nsx-manager>/api/v1/routing-table and https://<nsx-manager>/api/v1/forwarding-table MP APIs return an error if the edge has 65k+ routes for RIB and 100k+ routes for FIB. If the edge has 65k+ routes for RIB and 100k+ routes for FIB, the request from MP to Edge takes more than 10 seconds and results in a timeout. This is a read-only API and has an impact only if the 65k+ RIB routes or 100k+ FIB routes need to be downloaded using the API/UI. Workaround: There are two options to fetch the RIB/FIB.
    • These APIs support filtering options based on network prefixes or type of route. Use these options to download the routes of interest.
    • CLI support in case the entire RIB/FIB table is needed and there is no timeout for the same.
  • Issue 2204932 – Configuring BGP Peering can delay HA failover recovery. When Dynamic-BGP-Peering is configured on the routers that peer with the T0 Edges and a failover event occurs on the Edges (active-standby mode), BGP neighborship may take up to 120 seconds to re-establish. Workaround: Configure specific BGP peers to prevent the delay.
  • Issue 2285650 – BGP route tables populated with unwanted routes. When the allowas-in option is enabled as part of the BGP configuration, routes advertised by Edge nodes are received back and installed in the BGP route table. This results in excess memory consumption and routing calculation processing. If a higher local preference is configured for the excess routes, this forwarding loop may result in the route table on some routers being populated with redundant routes. For example, route X originates on router D, which is advertised to routers A and B. Router C, on which allowas-in is enabled, is peered with B, so it learns route X and installs it in its route table. As a result, there are now two paths for route X to be advertised to router C, resulting in the problem.

    Workaround: You can prevent forwarding loops by configuring the problematic router (or its peer) to block routes being advertised back to it.

  • Issue 2343954 – Edge L2 bridge endpoint interface permits configuration of unsupported VLAN ranges. The Edge L2 Bridge End Point configuration interface permits you to configure a VLAN range, and multiple VLAN ranges, even though these are not supported. Workaround: Do not configure such VLAN ranges for Edge L2 Bridge End Point configuration.

Logical Networking Known Issues

  • Issue 2389993 – Route map removed after redistribution rule is modified using the Policy page or API. A route map added to a redistribution rule from the Management Plane interface or API may be removed if the same redistribution rule is subsequently modified through the Policy page interface or API. This is because the Policy page interface and API do not support adding route maps. This can result in advertisement of unwanted prefixes to the BGP peer. Workaround: You can restore the route map by returning to the management plane interface or API and re-adding it to the same rule. If you wish to include a route map in a redistribution rule, it is recommended you always use the management plane interface or API to create and modify it.
  • Issue 2275412 – Port connection doesn’t work across multiple TZs. Port connection can be used only in a single TZ. Workaround: None.
  • Issue 2327904 – After using a pre-created Linux bond interface as an uplink, traffic is unstable or fails. NSX-T does not support pre-created Linux bond interfaces as uplinks. Workaround: For the uplink, use OVS native bond configuration from the uplink profile.
  • Issue 2304571 – Critical error (PSOD) may occur when running L3 traffic using VDR. A pending ARP (ND) entry is not properly protected in some cases, which may cause a critical error (PSOD). Workaround: None.
  • Issue 2388158 – User unable to edit transit subnet settings in Tier-0 logical router configuration. After creating the Tier-0 logical router, the transit subnet configuration cannot be modified in the NSX Manager interface. Workaround: None. The best option is to delete the logical router and re-create it with the desired transit subnet configuration.

Security Services Known Issues

  • Issue 2294410 – Some Application IDs are detected by the L7 firewall. The following L7 Application IDs are detected based on port, not application: SAP, SUNRPC, and SVN. The following L7 Application IDs are unsupported: AD_BKUP, SKIP, and AD_NSP. Workaround: None. There is no customer impact.
  • Issue 2395334 – (Windows) Packets wrongly dropped due to stateless firewall rule conntrack entry. Stateless firewall rules are not well supported on Windows VMs. Workaround: Add a stateful firewall rule instead.
  • Issue 2366599 – Rules for VMs with IPv6 addresses not enforced. If a VM uses an IPv6 address but IPv6 snooping has not been enabled for that VIF via the IP discovery profile, the IPv6 address is not populated in the rule for that VM in the data path. As a result, that rule is never enforced. Workaround: Verify that the IPv6 option in the IP Discovery profile is enabled at either the VIF or the logical switch whenever IPv6 addresses are used.
  • Issue 2296430 – NSX-T Manager API does not provide subject alternative names during certificate generation. NSX-T Manager API does not provide subject alternative names when issuing certificates, specifically during CSR generation. Workaround: Create the CSR using an external tool that supports the extensions (an openssl sketch follows this list). After the signed certificate is received from the Certificate Authority, import it into NSX-T Manager with the key from the CSR.
  • Issue 2379632 – Multiple packets are logged when hitting a Layer-7 rule in the classified stage. Multiple (2-3) packets are logged (dfwpktlogs) when hitting a Layer-7 rule in the classified stage. Workaround: None.
  • Issue 2368948 – Distributed firewall rules: Realized status for individual sections may not be current. Refreshing the DFW rule view doesn’t update the realized status of individual sections in that view. As a result, the information may not be current. Workaround: This affects only manual refreshing. Polling for realized status is periodic and will provide accurate updates. Users can also refresh individual sections for accurate status.
  • Issue 2380833 – Publishing a policy draft with 8,000 or more rules requires a lot of time. A policy draft containing 8,000 or more rules can take a considerable amount of time to publish. For example, a policy draft with 8,000 rules can take 25 minutes to publish. Workaround: None.
  • Issue 2424818 – Layer-2 and distributed firewall statuses not updated on the NSX Manager interface. The status information produced by the logical exporter on workload VMs may not be forwarded to the management plane. As a result, the statuses displayed for these components are not correctly updated. Workaround: None. The correct status information can be accessed via CLI on the corresponding VMs.
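
The CSR workaround for issue 2296430 above needs an external tool that supports the subject alternative name extension. As a minimal sketch (the config file name, CN, and SAN values below are hypothetical), an OpenSSL request config can carry the SAN entries and openssl req can generate the key and CSR from it:

  # san.cnf – request config carrying a SAN extension (hypothetical values)
  [ req ]
  prompt             = no
  distinguished_name = dn
  req_extensions     = v3_req

  [ dn ]
  CN = nsx-mgr.example.com

  [ v3_req ]
  subjectAltName = DNS:nsx-mgr.example.com, IP:192.0.2.10

  # generate the private key and the CSR, then submit the CSR to your CA
  openssl req -new -newkey rsa:2048 -nodes -keyout nsx-mgr.key -out nsx-mgr.csr -config san.cnf

Once the CA returns the signed certificate, import it into NSX-T Manager together with the key from the CSR, as the workaround describes.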

Load Balancer Known Issues

  • Issue 2290899 – IPSec VPN does not work, control plane realization for IPSec fails. IPSec VPN (or L2VPN) fails to come up if more than 62 LbServers are enabled along with the IPSec service on Tier-0 on the same Edge node. Workaround: Reduce the number of LbServers to fewer than 62.
  • Issue 2362688 – If some pool members are DOWN in a load balancer service, the UI shows the consolidated status as UP. When a pool member is down, there is no indication on the Policy UI, where the pool status remains green and Up. Workaround: None.

Solution Interoperability Known Issues

  • Issue 2289150 – PCM calls to AWS start to fail. If you update the PCG role for an AWS account on CSM from old-pcg-role to new-pcg-role, CSM updates the role for the PCG instance on AWS to new-pcg-role. However, the PCM does not know that the PCG role has been updated and, as a result, continues to use the old AWS clients it had created using old-pcg-role. This causes the PCM AWS cloud inventory scan and other AWS cloud calls to fail. Workaround: If you encounter this issue, do not modify or delete the old PCG role for at least 6.5 hours after changing to the new role. Restarting the PCG will re-initialize all AWS clients with the new role credentials.
  • Issue 2401715 – Error while updating the compute manager stating that the thumbprint is invalid, even if the correct thumbprint is provided. Observed when a vCenter v6.7U3 is added as a compute manager in NSX-T Manager. vSphere 6.7 supports changing the PNID, where the FQDN or IP address can be changed. NSX-T 2.5 does not support this feature, hence the thumbprint issue. Workaround: Delete the previously added vCenter and add the vCenter with the newly changed FQDN. Registration may fail because the previous extension already exists on vCenter; resolve the registration errors to get it successfully registered.

NSX Intelligence Known Issues

  • Issue 2410806 – Publishing a generated recommendation fails with an exception citing a 500-object limitation. If the total number of members (IP addresses or VMs) in a recommended group exceeds 500, the publication of the generated recommendation into a policy configuration will fail with an exception message such as “The total of IPAdressExpressions, MACAddressExpressions, paths in a PathExpression and external IDs in ExternalIDExpression should not exceed 500.” Workaround: If 500-plus clients are connecting to the application VM or load balancer, you can create a rule to micro-segment access to the application load balancer, then select the application VMs to start recommendation discovery. Alternatively, you can subdivide the 500-plus member group into multiple, smaller groups.
  • Issue 2362865 – Filter by Rule Name not available for the default rule. Observed in the Plan & Troubleshoot > Discover and Take Action page and affects only rules created by a connectivity strategy. This issue is caused by the absence of a default policy based on the connectivity strategy specified. A default rule may be created on the management plane, but with no corresponding default policy, the user cannot filter based on that default rule. (The filter for flow visualization uses the rule name to filter flows that hit that rule.) Workaround: Do not apply a rule name filter. Instead, check the Unprotected flag. This configuration will include flows hitting the default rule as well as any rule that has “any” source and “any” destination specified.
  • Issue 2368926 – Recommendations job fails if the user reboots the appliance while the job is in progress. If the user reboots the NSX Intelligence appliance while a recommendations job is in progress, the job goes into a failed state. A user can start a recommendation job for a set of context VMs; the reboot deletes the context and the job fails as a result. Workaround: After the reboot, repeat the recommendations job for the same set of VMs.
  • Issue 2385599 – Groups of static IPs not supported in NSX-T Intelligence recommendations. VMs and workloads that are not recognized in the NSX-T inventory, if they have intranet IP addresses, may still be subject to recommendation as a group of static IPs, including recommendation-defined rules containing these groups. However, NSX Intelligence does not support such groups and, as a result, visualization shows traffic sent to them as sent to “Unknown” instead of the recommended group. Workaround: None. Recommendation is functioning correctly; this is a display issue.
  • Issue 2374231 – For SCTP, GRE and ESP protocol flows, Service is shown as UNKNOWN and Port as 0.

    NSX Intelligence does not support source or destination port parsing for GRE, ESP, and SCTP protocol flows. NSX Intelligence provides full header parsing for TCP and UDP flows along with flow related statistics. For other supported protocols (such as GRE, ESP, and SCTP) NSX Intelligence can only provide IP information without protocol specific source or destination ports. For these protocols, the source or destination port will be zero.

    Workaround: None.

  • Issue 2374229 – NSX Intelligence appliance runs out of disk space. The NSX Intelligence appliance has a default data retention period of 30 days. If the amount of flow data within 30 days is larger than anticipated, the appliance might run out of disk space prematurely and become partially or completely non-operational. Workaround: This can be prevented or mitigated by monitoring the disk usage of the NSX Intelligence appliance. If disk usage is growing at a rate that indicates space might run out, you can reduce the data retention period to a smaller number of days (a condensed sketch follows the steps below):
    1. SSH into the NSX Intelligence appliance and access the /opt/vmware/pace/druid-config/druid_data_retention.properties file.
    2. Locate and change the correlated_flow setting to a value lower than 30 days. For example: correlated_flow=P14D
    3. Save the file and apply the changes by running the following command:
      /opt/vmware/pace/druid-config/druid-config-data-retention.sh
      NOTE: It may require up to two hours for the data to be physically deleted.
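
    As a minimal sketch of the steps above (using the file path and property name from this workaround, and a hypothetical 14-day retention target):

      # on the NSX Intelligence appliance, shrink the correlated flow retention to 14 days
      sed -i 's/^correlated_flow=.*/correlated_flow=P14D/' /opt/vmware/pace/druid-config/druid_data_retention.properties
      # apply the change (physical deletion of old data may take up to two hours)
      /opt/vmware/pace/druid-config/druid-config-data-retention.sh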
  • Issue 2389691 – Publish recommendation job fails with the error “request payload size exceeds the permitted limit, max 2,000 objects are allowed per request.” If you try to publish a single recommendation job that contains more than 2,000 objects, it will fail with this error. Workaround: Reduce the number of objects in the recommendation job to fewer than 2,000 and retry the publication.
  • Issue 2376389 – VMs are incorrectly marked as deleted in the ‘Last 24 hours’ view on a mid-scale setup. After a transport node is disconnected or removed from the compute manager, NSX Intelligence shows the previous VMs as deleted, with new VMs in their place. This results from NSX Intelligence tracking inventory updates in the NSX database, and the behavior reflects how the inventory handles transport node disconnection from the compute manager. This does not affect the total count of live VMs in NSX Intelligence, although you may see duplicate VMs. Workaround: No action required. Duplicate VMs are eventually removed from the interface depending on the selected time interval.
  • Issue 2393240 – Additional flows are observed from a VM to an IP address. Customers see additional flows from a VM to IP-xxxx. This occurs because the configuration data (Groups, VMs, and services) from the NSX Policy Manager reaches the NSX Intelligence appliance after the flow is created. The earlier flow therefore cannot be correlated with the configuration, because the configuration does not yet exist from the flow’s perspective, and the flow defaults to IP-xxxx for its VM during flow lookup. After the configuration is synchronized, the actual VM flow appears. Workaround: Adjust the time window so that it excludes the flows created before the configuration was synchronized.
  • Issue 2370660 – NSX Intelligence shows inconsistent data for specific VMs. This is likely caused by those VMs having the same IP address in the datacenter, which is not supported by NSX Intelligence in NSX-T 2.5. Workaround: None. Avoid assigning the same IP address to two VMs in the datacenter.
  • Issue 2372657 – VM-GROUP relationship and GROUP-GROUP flow correlation temporarily display incorrectly. This occurs if the NSX Intelligence appliance is deployed while there are ongoing flows in the datacenter. Specifically, the following elements may display incorrectly during this temporary period:
    • VMs wrongly belong to the Uncategorized group.
    • VMs wrongly belong to the Unknown group.
    • Correlated flows between two groups may be shown incorrectly.

    These errors will self-correct after the NSX Intelligence appliance has been deployed longer than the user-selected visualization period.

    Workaround: None. If the user moves out of the Visualization period during which the NSX Intelligence appliance was deployed, the issue will not appear.

  • Issue 2366630 – Delete transport node operation may fail when the NSX Intelligence appliance is deployed. If a transport node is being deleted while the NSX Intelligence appliance is being deployed, the deletion can fail because the transport node is referenced by the NSX-INTELLIGENCE-GROUP NSGroup. To delete a transport node while the NSX Intelligence appliance is deployed, the force delete option is required. Workaround: Use the force option to delete the transport node.
  • Issue 2357296 – Flows may not be reported to NSX Intelligence by some ESX hosts under certain scale and stress conditions. The NSX Intelligence interface may not show flows from certain VMs on certain hosts, and fails to provide firewall rule recommendations for those VMs. As a result, firewall security could be compromised on some hosts. This is observed in deployments with vSphere versions below 6.7U2 and 6.5U3. The root cause is out-of-order VM filter creation and deletion in the core ESX hypervisor. Workaround: Upgrade hosts to vSphere 6.7U2 or later, or vSphere 6.5U3 or later.
  • Issue 2393142 – Logging in to NSX Manager with vIDM credentials may return a 403 unauthorized user error. This only affects users logging in as vIDM users, as opposed to local users, on NSX Manager. vIDM login and integration are not supported in NSX-T 2.5 when interacting with the NSX Intelligence appliance. Workaround: Log in as a local user by appending the string ‘login.jsp?local=true’ to the NSX Manager IP/FQDN.
  • Issue 2369802 – NSX Intelligence appliance backup excludes the event datastore backup. This functionality is not supported in NSX 2.5. Workaround: None.
  • Issue 2346545 – NSX Intelligence appliance: certificate replacement affects new flow information reporting. If the user replaces the principal identity certificate for the NSX Intelligence appliance with a self-signed certificate, processing of new flows is affected and the appliance will not show updated information from that point forward. Workaround: None.
  • Issue 2407198 – VMs incorrectly appear in the Uncategorized VMs group in the NSX Intelligence security posture. When ESXi hosts are disconnected from vCenter, VMs on those hosts can be shown in the “Uncategorized VMs” group even if they belong to other groups. When the ESXi hosts are reconnected to vCenter, the VMs appear in their correct groups. Workaround: Reconnect the hosts to vCenter.
  • Issue 2410224 – After completing NSX Intelligence appliance registration, refreshing the view may return a 403 Forbidden error. After completing NSX Intelligence appliance registration, if you click Refresh to View, the system may return a 403 Forbidden error. This is a temporary condition caused by the time the NSX Intelligence appliance requires to become accessible. Workaround: If you receive this error, wait a few moments and try again.
  • Issue 2410096 – After rebooting the NSX Intelligence appliance, flows collected in the last 10 minutes prior to the reboot may not be displayed. This is caused by an indexing issue. Workaround: None.
  • Issue 2436302 – After replacing the NSX-T unified appliance cluster certificate, NSX Intelligence cannot be accessed via the API or the Manager interface. In the NSX-T Manager interface, go to the Plan & Troubleshoot tab and click Discover & Take Action or Recommendations. The interface will not load and will eventually return an error like: Failed to load requested application. Please try again or contact support if the problem persists. Workaround (a condensed sketch of steps 3-4 follows the procedure):
    1. Replace the NSX-T unified appliance cluster certificate.
    2. Obtain the certificate ID value from the new certificate:
      1. Go to System > Certificates and click on the ID column for the newly added certificate.
      2. Copy the certificate ID from the pop-up window.
    3. Obtain the pem_encoded field from the new certificate through the API:
      1. Use the following API GET:
        GET https://{{nsx_ua_server}}/api/v1/trust-management/certificates/{{certificate ID from previous step}}
      2. From the resulting JSON, copy the value for field pem_encoded, excluding the double quotes.
    4. Add the new certificate to the client truststore on the NSX Intelligence appliance.
      1. Using SSH, log in to the appliance VM.
        $ ssh root@nsx-pace
        $ export NEW_CERT_FILE=/root/new_cert.pem
        $ export HTTP_CERT_PWD_FILE=/config/http/.http_cert_pw
         $ export HTTP_CERT_PW=$(cat $HTTP_CERT_PWD_FILE)
        $ export CLIENT_TRUSTSTORE_FILE="/home/secureall/secureall/.store/.client_truststore"
      2. Paste in the new pem_encoded field from the JSON:
        $ cat > $NEW_CERT_FILE
        -----BEGIN CERTIFICATE-----
        <pem_encoded field contents>
        -----END CERTIFICATE-----
      3. Execute sed to remove new line expressions from the text string:
        $ sed 's/\\n/\
        /g' -i $NEW_CERT_FILE
      4. Insert the new cert in client truststore, using a user-defined alias:
        $ keytool -import -alias new_nsx_cluster_key -file $NEW_CERT_FILE -keystore \
        $CLIENT_TRUSTSTORE_FILE -storepass $HTTP_CERT_PW -noprompt
      5. Verify the certificate was successfully added:
         $ keytool -list -v -keystore $CLIENT_TRUSTSTORE_FILE -storepass \
        $HTTP_CERT_PW -noprompt
    5. Restart the proxy.
      $ systemctl restart proxy

    You should now be able to refresh the Plan & Troubleshoot page and view the flow information as before.
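
    For reference, a condensed sketch of steps 3 and 4 above, run on the NSX Intelligence appliance (it assumes curl and python3 are available there; the admin user, server name, and certificate ID are placeholders; if those tools are not available, fetch the JSON elsewhere and copy the resulting PEM file over):

      # fetch the certificate object from the NSX Manager API (prompts for the admin password)
      curl -sk -u admin 'https://<nsx_ua_server>/api/v1/trust-management/certificates/<certificate-id>' -o cert.json
      # extract the pem_encoded field into a PEM file (JSON decoding turns the \n escapes into real newlines)
      python3 -c "import json; print(json.load(open('cert.json'))['pem_encoded'])" > new_cert.pem
      # import it into the client truststore and restart the proxy, as in steps 4 and 5
      keytool -import -alias new_nsx_cluster_key -file new_cert.pem \
        -keystore /home/secureall/secureall/.store/.client_truststore \
        -storepass "$(cat /config/http/.http_cert_pw)" -noprompt
      systemctl restart proxy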

Operations and Monitoring Services Known Issues

  • Issue 2401164 – Backups incorrectly reported as successful despite an SFTP server error. If the password for the SFTP server used for backups expires, NSX-T reports the generic error “backup operation unknown error”. Workaround: Verify that the credentials for accessing the SFTP server are up to date.

Upgrade Known Issues

  • Issue 2288549 – RepoSync fails with a checksum failure on the manifest file. Observed in deployments recently upgraded to 2.4. When an upgraded setup is backed up and restored on a freshly deployed manager, the repository manifest checksum present in the database and the checksum of the actual manifest file do not match. This causes RepoSync to be marked as failed after the backup restore. Workaround: To recover from this failure, perform the following steps:
    1. Run CLI command get service install-upgrade
      Note the IP of “Enabled on” in the results.
    2. Log in to the NSX manager IP shown in “Enabled on” return of the above command.
    3. Navigate to System > Overview, and locate the node with the same IP as “Enabled on” return.
    4. Click Resolve on that node.
    5. After the above resolve operation succeeds, click Resolve on all nodes from the same interface.
      All three nodes will now show RepoSync status as Complete.
  • Issue 2277543 – Host VIB update fails during in-place upgrade with an ‘Install of offline bundle failed on host’ error. This error may occur when storage vMotion was performed on the host before doing an in-place upgrade from NSX-T 2.3.x to 2.4 on hosts running ESXi 6.5 P03 (build 10884925). The switch security module from 2.3.x is not removed if storage vMotion was performed just before the host upgrade; the storage vMotion triggers a memory leak that causes the switch security module unload to fail. Workaround: See Knowledge Base article 67444, “Host VIB update may fail when upgrading from NSX-T 2.3.x to NSX-T 2.4.0 if VMs are storage vMotioned before host upgrade”.
  • Issue 2276398 – When an AV Partner Service VM is upgraded using NSX, there may be up to twenty minutes of protection loss. When a Partner SVM is upgraded, the new SVM is deployed and the old SVM is deleted. SolutionHandler connection errors may appear in the host syslog. Workaround: Delete the ARP cache entry on the host after the upgrade and then ping the Partner Control IP on the host to resolve this issue.
  • Issue 2330417 – Unable to proceed with the upgrade for non-upgraded transport nodes. When upgrading, the upgrade is marked as successful even though some transport nodes are not upgraded. Log location: /var/log/upgrade-coordinator/upgrade-coordinator.log. Workaround: Restart the upgrade-coordinator service.
  • Issue 2348994 – Intermittent failure during the upgrade of NSX VIBs on an ESXi 6.5 P03 transport node. Observed in some 2.4.x to 2.5 upgrades. When the NSX VIBs on an ESXi 6.5 P03 transport node are upgraded, the upgrade operation sometimes fails with the following error: “VI SDK invoke exception: Got no data from process: LANG=en_US.UTF-8”. Workaround: Upgrade to ESXi 6.5 P04. Alternatively, put the host in maintenance mode, reboot it, retry the upgrade, and exit maintenance mode.
  • Issue 2372653 – Post-upgrade to 2.5, the user is unable to locate LogicalPort- and LogicalSwitch-based groups created in earlier NSX-T versions. After upgrading to 2.5, the LogicalPort- and LogicalSwitch-based groups created from Policy in previous NSX-T versions do not appear in the dashboard interface. However, they can still be located via the API. This is due to a name change caused by the upgrade process: in 2.5, LogicalPort- and LogicalSwitch-based groups appear as Segment- and SegmentPort-based groups. Workaround: Use the API to access these Policy groups post upgrade.
  • Issue 2408972 – During upgrade, vSphere Update Manager fails while remediating the last host. During the upgrade, vSphere Update Manager remediation fails for the last host that has workloads backed by an NSX-T logical switch. Workaround: Manually migrate all NSX-T backed workload VMs to an already upgraded host, then retry the upgrade for the failed host.
  • Issue 2400379 – Context Profile page shows an unsupported APP_ID error message. The Context Profile page shows the following error message: “This context profile uses an unsupported APP_ID – [<APP_ID>]. Please delete this context profile manually after making sure it is not being used in any rule.” This is caused by the post-upgrade presence of six deprecated APP_IDs (AD_BKUP, SKIP, AD_NSP, SAP, SUNRPC, SVN) that no longer work on the data path. Workaround: After ensuring that they are no longer consumed, manually delete the six APP_ID context profiles.
  • Issue 2419246 – Ubuntu KVM upgrade fails. The upgrade of Ubuntu KVM nodes may fail because the nsx-vdpi service is not running. The nsx-vdpi service depends on the nsx-agent, but at this point in the upgrade the nsx-agent is not yet configured; the nsx-agent fails because the vm-command-relay component is not correctly started. Workaround: Configure the incompletely installed nsx-agent. The following command reconfigures all unpacked or partially configured packages:
    dpkg --configure -a
    Or you can use the below commands to reconfigure only the nsx-agent and nsx-vdpi:
    dpkg --configure nsx-agent
    dpkg --configure nsx-vdpi

API Known Issues

  • Issue 2260435 – Stateless redirection policies/rules are created by default by the API, which is not supported for east-west connections. As a result, traffic does not get redirected to partners. Workaround: When creating redirection policies using the Policy API, create a stateful section.
  • Issue 2200856 – cloud-service-manager service restart fails. The cloud-service-manager service restart can fail if the user tries it without waiting for the API service to come up for the first time. Workaround: Wait a few minutes, then retry.
  • Issue 2378752 – API allows creation of multiple binding maps under segments or ports. Observed only via the API. When a user creates multiple binding maps under a segment or port, no error is reported. The issue is seen when the user tries to bind multiple profiles to a segment or port simultaneously. Workaround: Use the NSX Manager interface instead to perform this operation.

NSX Cloud Known Issues

  • Issue 2275232 – DHCP does not work for VMs in the cloud if the DFW Connectivity_strategy is changed from BLACKLIST to WHITELIST. All VMs requesting new DHCP leases lose their IPs; DHCP needs to be explicitly allowed for cloud VMs in the DFW. Workaround: Explicitly allow DHCP for cloud VMs in the DFW.
  • Issue 2277814 – VM gets moved to vm-overlay-sg on an invalid value for the nsx.network tag. A VM tagged with an invalid nsx.network tag is moved to vm-overlay-sg. Workaround: Remove the invalid tag.
  • Issue 2355113 – Unable to install NSX Tools on RedHat and CentOS workload VMs with accelerated networking enabled in Microsoft Azure. In Microsoft Azure, when accelerated networking is enabled on a RedHat (7.4 or later) or CentOS (7.4 or later) based OS with the NSX Agent installed, the ethernet interface does not obtain an IP address. Workaround: After booting up the RedHat or CentOS based VM in Microsoft Azure, install the latest Linux Integration Services driver available at https://www.microsoft.com/en-us/download/details.aspx?id=55106 before installing NSX Tools.
  • Issue 2391231 – Detection of changes to Azure VMs might be delayed. Intermittently, changes to Azure VMs in the cloud are detected with a slight delay. As a result, a corresponding delay might affect onboarding the VMs and creating logical entities for the VMs in NSX-T. The maximum delay observed was approximately eight minutes. Workaround: None. After the delay period passes, the issue self-corrects.
  • Issue 2424818 – L2 and DFW stats are not updated in the NSX Manager UI. Statistics produced by the logical exporter on workload VMs are not forwarded to the management plane, so they cannot be displayed in the NSX Manager UI. There is no visibility of DFW statistics from the NSX Manager UI, and logical switch port operational status shows as DOWN with non-working stats. This is only applicable to Cloud VMs.

New Capabilities in vSAN 6.7 Update 3

    • Intelligent Operations
      • Get enhanced estimates of when re-builds will be complete, as well as a near-continuous refresh of the estimate. Admins can assign high-priority objects to be re-built first and receive an ETA for every object in the queue. The information can also be filtered for easy searching.
      • For upgrades that require a full data evacuation, vSAN performs a simulation to estimate success or failure so that admins can have more confidence in the upgrade process.
      • In 6.7 Update 1, vSAN established intelligent guardrails, alerting you to data availability impacts when putting a host in maintenance mode. With Update 3, we’ve expanded your ability to test the impact of maintenance mode with a new UI for more detailed checks, as well as a customizable UI for testing.
      • vSAN now handles difficult resync operations with ease. In severely capacity-strained scenarios, vSAN will take steps to mitigate resync impacts by rebalancing capacity as needed. If an issue does occur, new health checks surface alerts and proactively guide admins through a graceful recovery workflow to get the cluster back up and running quickly.
    • Proactive Rebalancing
      Admins can now enable vSAN to proactively rebalance capacity distributed across the cluster, which aids in consistent performance for workloads. Set a policy for when rebalancing needs to occur and vSAN will actively monitor and remediate the cluster as necessary.

 

    • Smart Policy Implementation
      One of the most-loved benefits of vSAN is the ability to change policies on the fly. For instance, if you need to update the fault tolerance for a particular VM or VMDK, it takes only a few clicks to update the policy, and then the software implements, monitors, and remediates it. Policy implementation is now more thoughtful, with availability in mind. In the event of large policy changes, vSAN will minimize resources used during implementation so as not to impact ongoing operations.

 

    • Increased Visibility for Storage Consumption
      View raw vs. usable capacity, see how containers are consuming storage, and get alerts as data is consumed by VMs and containers.

 

    • Workflow-based Proactive Support Activation
      Easily enable and configure Support Insight in several workflows – QuickStart, vSAN Services, online health checks – as well as on a new Customer Experience Improvement Program (CEIP) page.

 

  • Enhanced Performance
    Higher throughput and more consistent performance using all-flash clusters, as well as faster resyncs, improves SLAs. Update 3 enables more consistent performance with small I/O writes, increased throughput of sequential I/O writes when dedup and compression is enabled, and faster resyncs due to increased task parallelization.

What’s new vSAN 6.7

As most of you have seen, vSAN 6.7 just released together with vSphere 6.7. As such I figured it was time to write a “what’s new” article. There are a whole bunch of cool enhancements and new features, so let’s create a list of the new features first, and then look at them individually in more detail.

  • HTML-5 User Interface support
  • Native vRealize Operations dashboards in the HTML-5 client
  • Support for Microsoft WSFC using vSAN iSCSI
  • Fast Network Failovers
  • Optimization: Adaptive Resync
  • Optimization: Witness Traffic Separation for Stretched Clusters
  • Optimization: Preferred Site Override for Stretched Clusters
  • Optimization: Efficient Resync for Stretched Clusters
  • New Health Checks
  • Optimization: Enhanced Diagnostic Partition
  • Optimization: Efficient Decommissioning
  • Optimization: Efficient and consistent storage policies
  • 4K Native Device Support
  • FIPS 140-2 Level 1 validation

Yes, that is a relatively long list indeed. Let’s take a look at each of the features. First of all, HTML-5 support. I think this is something that everyone has been waiting for. The Web Client was not the most loved user interface that VMware produced, and hopefully the HTML-5 interface will be viewed as a huge step forward. I have played with it extensively over the past 6 months and I must say that it is very snappy. I like how we not just ported over all functionality, but also looked at whether workflows could be improved and whether the presented information/data made sense in each and every screen. This does mean, however, that new functionality from now on will only be available in the HTML-5 client, so use this going forward. Unless of course the functionality you are trying to access isn’t available yet, but most of it should be! For those who haven’t seen it yet, here’s a couple of screenshots… ain’t it pretty? 😉

For those who didn’t notice: in the above screenshot you can actually see the swap file, and the policy associated with the swap file, which is a nice improvement!

The next feature is native vROps dashboards for vSAN in the H5 client. I found this very useful in particular. I don’t like context switching and this feature allows me to see all of the data I need to do my job in a single user interface. No need to switch to the VROps UI, but instead vSphere and vSAN dashboards are now made available in the H5 client. Note that it needs the VROps Client Plugin for the vCenter H5 UI to be installed, but that is fairly straight forward.

Next up is support for Microsoft Windows Server Failover Clustering for the vSAN iSCSI service. This is very useful for those running a Microsoft cluster. Create an iSCSI target and expose it to the WSFC virtual machines. (Normally people used RDMs for this.) Of course this is also supported with physical machines. Such a small enhancement, but for customers using Microsoft clustering a big thing, as it now allows you to run those clusters on vSAN without any issues.

Next are a whole bunch of enhancements that have been added based on customer feedback over the past 6-12 months. Fast Network Failovers was one of those. The majority of our customers have a single vmkernel interface with multiple NICs associated with it; some of our customers have a setup where they create two vmkernel interfaces on different subnets, each with a single NIC. What that last group of customers noticed is that in the previous release we waited 90 seconds before failing over to the other vmkernel interface (TCP timeout) when a network/interface had failed. In the 6.7 release we introduce a mechanism that allows us to fail over fast, literally within seconds. So a big improvement for customers who have this kind of network configuration (which is very similar to the traditional A/B storage fabric design).

Adaptive Resync is an optimization to the resync function that is part of vSAN. If a failure has occurred (host, disk, flash failure) then data will need to be resynced to ensure that the impacted objects (VMs, disks, etc.) are brought back into compliance with the configured policy. Over the past 12 months the engineering team has worked hard to optimize the resync mechanism as much as possible. In vSAN 6.6.1 a big jump was already made by taking VM latency into account when it came to resync bandwidth allocation, and this has been further enhanced in 6.7. In 6.7 vSAN can calculate the total available bandwidth, and ensures quality of service for the guest VMs prevails by allocating those VMs 80% of the available bandwidth and limiting the resync traffic to 20%. Of course, this only applies when congestion is detected. Expect more enhancements in this space in the future.

A couple of releases ago we introduced Witness Traffic Separation for 2-node configurations, and in 6.7 we introduce support for this feature for stretched clusters as well. This is something many stretched vSAN customers have asked for. It can be configured through the CLI only at this point (esxcli), but that shouldn’t be a huge problem. As mentioned previously, what you end up doing is tagging a vmknic for “witness traffic” only. Pretty straightforward, but very useful:

esxcli vsan network ip set -i vmk<X> -T=witness
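
A quick way to confirm the tagging is to list the vSAN network configuration on the host afterwards; the entry for the tagged vmknic should report witness as its traffic type:

esxcli vsan network list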

Another enhancement for stretched clusters is Preferred Site Override. It is a small enhancement, but in the past when the preferred site failed and then returned for duty while only being connected to the witness, it could happen that the witness would bind itself directly to the preferred site. This by itself would result in VMs becoming unavailable. The Preferred Site Override functionality prevents this from happening: it ensures that VMs (and all data) remain available in the secondary site. I guess one could also argue that this is not an enhancement, but much more a bug fix. And then there is the Efficient Resync for Stretched Clusters feature. This is getting a bit too much into the weeds, but essentially it is a smarter way of bringing components up to the same level within a site after the network between locations has failed. As you can imagine, one location is allowed to progress, which means that the other location needs to catch up when the network returns. With this enhancement we limit the bandwidth / resync traffic.

And as with every new release, the 6.7 release of course also has a whole new set of health checks. I think the Health Check has quickly become the favorite feature of all vSAN admins, and for good reason: it makes life much easier if you ask me. In the 6.7 release, for instance, we validate consistency in terms of host settings and report any inconsistency that is found. Also, when downloading the HCL details, we now only download the differences between the current and previous version (where in the past we would simply pull the full JSON file). There are many other small improvements around performance etc. Just give it a spin and you will see.

Something that my team has been pushing hard for (thanks Paudie) is the Enhanced Diagnostic Partition. As most of you know, when you install / run ESXi there’s a diagnostic partition. This diagnostic partition unfortunately was a fixed size; with the current release, when upgrading (or installing greenfield), ESXi will automatically resize the diagnostic partition. This is especially useful for large-memory host configurations, and actually useful for vSAN in general. No longer do you need to run a script to resize the partition, it will happen automatically for you!

Another optimization released in vSAN 6.7 is called “Efficient Decommissioning”. This is all about being smarter in terms of consolidating replicas across hosts/fault domains to free up a host/fault domain and allow maintenance mode to proceed. This means that if a component is striped for reasons other than policy, it may be consolidated. And the last optimization is what they refer to as Efficient and Consistent Storage Policies. I am not sure I understand the name, as this is all about the swap object. As of vSAN 6.7 it is thin provisioned by default (instead of 100% reserved), and the swap object now inherits the policy assigned to the VM. So if you have FTT=2 assigned to the VM, then you will have not two but three components for the swap object, still thin provisioned, so it shouldn’t really change the consumed space in most cases.

Then there are the two last items on the list: 4K Native Device Support and FIPS 140-2 Level 1 validation. I think those speak for themselves. 4K Native Device Support has been asked for by many customers, but we had to wait for vSphere to support it. vSphere supports it as of 6.7, so that means vSAN also supports it day 0. The VMware VMkernel Cryptographic Module v1.0 has achieved FIPS 140-2 validation, and vSAN Encryption leverages the same module. Nice collaboration by the teams, which is now showing its big benefit.

Upgrading vCenter Server Appliance (VCSA) with External PSC from 6.0 U3 to 6.5 U1 and then to 6.5 U2

In this post we will talk about upgrading a vCenter Server Appliance with an external PSC from 6.0 U3b to 6.5 U1e. We performed this upgrade shortly before 6.5 U2 was released (see the release notes), so we then also updated to 6.5 U2.

Before upgrading to 6.5, you must prepare as follows:

Source System Prerequisites
• Verify that the appliance that you want to upgrade does not run on an ESXi host that is part of a fully automated DRS cluster.
• Verify that port 22 is open on the appliance that you want to upgrade. The upgrade process establishes an inbound SSH connection to download the exported data from source appliance.
• If you are upgrading a vCenter Server Appliance that is configured with Update Manager, run the Migration Assistant on the source Update Manager machine.
• Verify that port 443 is open on the source ESXi host.
• Create a snapshot of the appliance that you want to upgrade as a precaution in case of failure during the upgrade process.
• If you use an external database, back up the vCenter Server Appliance database.

Target System Prerequisites
• If you plan to deploy the new appliance on an ESXi host, verify that the target ESXi host is not part of a fully automated DRS cluster.
• If you plan to deploy the new appliance on a DRS cluster of the inventory of a vCenter Server instance, verify that the cluster is not fully automated.

In this case we have two Platform Services Controllers behind a load balancer and two vCenter Servers linked to each other.

The first step is to upgrade the Platform Services Controller (PSC) and its partner.

The upgrade takes place in two stages.

Stage 1: a new PSC VM is created with a temporary IP address.


After stage 1 completes, stage 2 starts. The new PSC VM copies all configuration from the old PSC VM, takes over the same DNS name and IP address, and then shuts down the old one.


After this step completes, you can verify from the VAMI interface. To do that, navigate to https://<Platform Services Controller address>:5480 and log in with your root account.


After upgrading the PSC and its partner, we are now ready to upgrade both vCenter Servers concurrently.

Before upgrading vCenter, we need to run the Migration Assistant on the Update Manager VM.


Now we are ready to start the vCenter upgrade, which is also performed in two stages.

Stage 1: a new vCenter VM is created.


Stage 2 transfers data from the old vCenter to the new vCenter, including the DNS name, IP address, configuration, and Update Manager data.


You can also check from the VAMI interface. To do that, navigate to https://<vCenter Server Appliance address>:5480 and log in with your root account.


We have now successfully upgraded the Platform Services Controllers and the vCenter Servers to 6.5 U1e.

Next, from the VAMI interface, we will check for updates and install the latest update, which is 6.5 U2.

We will follow the same sequence: first update the PSC and its partner, then both vCenter Servers concurrently.

First we update the PSC and its partner.


After the packages are upgraded, we need to reboot.


Then we update the vCenter Servers in the same way.


In this blog post we showed how to upgrade a vCenter Server Appliance with an external Platform Services Controller from 6.0 U3b to 6.5 U1e and then to 6.5 U2.

You can also upgrade directly from 6.0 U3b to 6.5 U2 using the same steps as above.

Adding Host to vCenter giving rpc_s_auth_method error

We have vCenter 6.5 U2; when trying to add an ESXi 6.0 U3 host, we got the error message below:

Add standalone host A general system error occurred: Unable to get signed certificate for host: Error: Access denied, reason = rpc_s_auth_method (0x16c9a0f6). (382312694).

 

To solve this error, follow the steps below:

Connect to the vCenter Server using vSphere Client and an Administrative account.

Go to Configure > vCenter Server settings > Advanced Settings.

Change the value of vpxd.certmgmt.mode to thumbprint and click OK.


Add the ESXi host again.

ESXTOP

I am a huge fan of esxtop! I used to read a couple of pages of the esxtop bible every day before I went to bed, not anymore as the doc is unfortunately outdated (yes I have requested an update various times.). Something I, however, am always struggling with is the “thresholds” of specific metrics. I fully understand that it is not black/white, performance is the perception of a user in the end.

There must be a certain threshold, however. For instance, it must be safe to say that when %RDY constantly exceeds the value of 20, it is very likely that the VM responds sluggishly. I want to use this article to “define” these thresholds, but I need your help. There are many people reading these articles; together we must know at least a dozen metrics. Let’s collect and document them with possible causes, if known.

Please keep in mind that these should only be used as a guideline when doing performance troubleshooting! Also be aware that some metrics are not part of the default view. You can add fields to an esxtop view by pressing “f” followed by the corresponding character.

I used VMworld presentations, VMware whitepapers, VMware documentation, VMTN Topics and of course my own experience as a source and these are the metrics and thresholds I came up with so far. Please comment and help build the main source for esxtop thresholds.

vSphere 6.5

For vSphere 6.5, various new metrics have been added. For instance, in the “power management” section %A/MPERF has been added, which indicates whether Turbo Boost is being used. Also on CPU, in 6.5 the %RUN will be lower, as the system thread which is doing work for a particular VM will now be “charged” to %RUN instead of %SYS. That way %RUN actually represents everything and is more intuitive.

Metrics and Thresholds

Display Metric Threshold Explanation
CPU %RDY 10 Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. Note that you will need to expand the VM Group to see how this is distributed across vCPUs. If you have many vCPUs then the per-vCPU value may be low and this may not be an issue. The 10% threshold is per world: for example, a 4-vCPU VM showing 24% %RDY at the group level averages roughly 6% per vCPU world.
CPU %CSTP 3 Excessive usage of vSMP. Decrease amount of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU %MLMTD 0 The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the “CPU limit” settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU %SWPWT 5 VM waiting on swapped pages to be read from disk. Possible cause: Memory overcommitment.
MEM MCTLSZ 1 If larger than 0, the host is forcing VMs to inflate the balloon driver to reclaim memory as the host is overcommitted.
MEM SWCUR 1 If larger than 0, the host has swapped memory pages in the past. Possible cause: Overcommitment.
MEM SWR/s 1 If larger than 0, the host is actively reading from swap (vswp). Possible cause: Excessive memory overcommitment.
MEM SWW/s 1 If larger than 0, the host is actively writing to swap (vswp). Possible cause: Excessive memory overcommitment.
MEM CACHEUSD 0 If larger than 0, the host has compressed memory. Possible cause: Memory overcommitment.
MEM ZIP/s 0 If larger than 0, the host is actively compressing memory. Possible cause: Memory overcommitment.
MEM UNZIP/s 0 If larger than 0, the host is accessing compressed memory. Possible cause: Previously the host was overcommitted on memory.
MEM N%L 80 If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and “remotely” uses memory via “interconnect”. Check “GST_ND(X)” to find out which NUMA nodes are used.
NETWORK %DRPTX 1 Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization
NETWORK %DRPRX 1 Dropped packets received, hardware overworked. Possible cause: very high network utilization
DISK GAVG 25 Look at “DAVG” and “KAVG” as the sum of both is GAVG.
DISK DAVG 25 Disk latency most likely to be caused by the array.
DISK KAVG 2 Disk latency caused by the VMkernel, high KAVG usually means queuing. This is the ESXi storage stack, the vSCSI layer and the VMM. Check “QUED”.
DISK QUED 1 Queue maxed out. Possibly the queue depth is set too low, or the controller is overloaded. Check with the array vendor for the optimal queue depth value. (Enable this via option “F”, aka QSTATS.)
DISK ABRTS/s 1 Aborts issued by guest(VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused for instance when paths failed or array is not accepting any IO for whatever reason.
DISK RESETS/s 1 The number of commands resets per second.
DISK ATSF 1 The number of failed ATS commands, this value should be 0
DISK ATS 1 The number of successful ATS commands, this value should go up over time when the array supports ATS
DISK DELETE 1 The number of successful UNMAP commands, this value should go up over time when the array supports UNMAP!
DISK DELETE_F 1 The number of failed UNMAP commands, this value should be 0
DISK CONS/s 20 SCSI Reservation Conflicts per second. If many SCSI Reservation Conflicts occur performance could be degraded due to the lock on the VMFS.
VSAN SDLAT 5 Standard deviation of latency, when above 10ms latency contact support to analyze vSAN Observer details to find out what is causing the delay

Running esxtop

Although understanding all the metrics esxtop provides may seem impossible, using esxtop is fairly simple. When you get the hang of it you will notice yourself staring at the metrics/thresholds more often than ever. The following keys are the ones I use the most.

Open console session or ssh to ESX(i) and type:

esxtop

By default the screen will be refreshed every 5 seconds, change this by typing:

s 2

Changing views is easy, type the following keys for the associated views:

c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device
v = disk VM
p = power mgmt
x = vsan

V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limiting the number of entities, for instance the top 5

2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world

Add/Remove fields:

f
<type appropriate character>

Changing the order:

o
<move field by typing appropriate character uppercase = left, lowercase = right>

Saving all the settings you’ve changed:

W

Keep in mind that when you don’t change the file-name it will be saved and used as default settings.
Help:

?

In very large environments esxtop can cause high CPU utilization due to the amount of data that needs to be gathered and the calculations that need to be done. If CPU appears to be highly utilized due to the number of entities (VMs / LUNs etc.), a command line option can be used which locks specific entities and keeps esxtop from gathering specific info, to limit the amount of CPU power needed:

esxtop -l

More info about this command line option can be found here.

Capturing esxtop results

First things first. Make sure you only capture relevant info. Ditch the metrics you don’t need. In other words, run esxtop and remove/add (f) the fields you don’t actually need or do need! When you are finished, make sure to write (W) the configuration to disk. You can either write it to the default config file (esxtop4rc) or write the configuration to a new file.

Now that you have configured esxtop as needed run it in batch mode and save the results to a .csv file:

esxtop -b -d 2 -n 100 > esxtopcapture.csv

Where “-b” stands for batch mode, “-d 2” is a delay of 2 seconds and “-n 100” is 100 iterations. In this specific case, esxtop will log all metrics for 200 seconds. If you want to record all metrics make sure to add “-a” to your string.

Or what about directly zipping the output as well? These .csv can grow fast and by zipping it a lot of precious disk space can be saved!

esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz
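
If you compressed the output, you don’t need to fully extract it just to peek at the data. For instance (zcat ships with gzip):

zcat esxtopoutput.csv.gz | head -1
gunzip -c esxtopoutput.csv.gz > esxtopcapture.csv

The first command prints the CSV header row with all captured counter names; the second writes an uncompressed copy for import into perfmon, Excel, or esxplot.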

Please note that when a new VM is powered on, a VM is vMotioned to the host, or a new world is created, it will not show up within esxtop when “-b” is used, as the entities are locked! This behavior is similar to starting esxtop with “-l”.

Analyzing results

You can use multiple tools to analyze the captured data.

  1. VisualEsxtop
  2. perfmon
  3. excel
  4. esxplot

What is VisualEsxtop? It is a relatively new tool (published on the 1st of July 2013).

VisualEsxtop is an enhanced version of resxtop and esxtop. VisualEsxtop can connect to VMware vCenter Server or ESX hosts, and display ESX server stats with a better user interface and more advanced features.

That sounds nice, right? Let us have a look at how it works; this is what I did to get it up and running:

  • Go to “http://labs.vmware.com/flings/visualesxtop” and click “download”
  • Unzip “VisualEsxtop.zip” into a folder you want to store the tool
  • Go to the folder
  • Double click “visualesxtop.bat” when running Windows (Or follow William’s tip for the Mac)
  • Click “File” and “Connect to Live Server”
  • Enter the “Hostname”, “Username” and “Password” and hit “Connect”
  • That is it…

Now some simple tips:

  • By default, the refresh interval is set to 5 seconds. You can change this by hitting “Configuration” and then “Change Interval”
  • You can also load Batch Output, this might come in handy when you are a consultant for instance and a customer sends you captured data, you can do this under: File -> Load Batch Output
  • You can filter output, very useful if you are looking for info on a specific virtual machine/world! See the filter section.
  • When you click “Charts”  and double click “Object Types” you will see a list of metrics that you can create a chart with. Just unfold the ones you need and double click them to add them to the right pane

There are a bunch of other cool features in there, like color-coding of important metrics for instance. Also, the fact that you can show multiple windows at the same time is useful if you ask me, and of course the tooltips that provide a description of each counter! If you ask me, a tool everyone should download and check out.

Let’s continue with my second favorite tool, perfmon. I’ve used perfmon (part of Windows, also known as “Performance Monitor”) multiple times and it’s probably the easiest, as many people are already familiar with it. You can import a CSV as follows:

  1. Run: perfmon
  2. Right click on the graph and select “Properties”.
  3. Select the “Source” tab.
  4. Select the “Log files:” radio button from the “Data source” section.
  5. Click the “Add” button.
  6. Select the CSV file created by esxtop and click “OK”.
  7. Click the “Apply” button.
  8. Optionally: reduce the range of time over which the data will be displayed by using the sliders under the “Time Range” button.
  9. Select the “Data” tab.
  10. Remove all Counters.
  11. Click “Add” and select appropriate counters.
  12. Click “OK”.
  13. Click “OK”.

The result of the above would be:
Imported ESXTOP data

With MS Excel it is also possible to import the data as a CSV. Keep in mind though that the amount of captured data is insane so you might want to limit it by first importing it into perfmon and then select the correct timeframe and counters and export this to a CSV. When you have done so you can import the CSV as follows:

  1. Run: excel
  2. Click on “Data”
  3. Click “Import External Data” and click “Import Data”
  4. Select “Text files” as “Files of Type”
  5. Select File and click “Open”
  6. Make sure “Delimited” is selected and click “Next”
  7. Deselect “Tab” and select “Comma”
  8. Click “Next” and “Finish”

All data should be imported and can be shaped / modelled / diagrammed as needed.

Another option is to use a tool called “esxplot”. It hasn’t been updated in a while, and I am not sure what the state of the tool is. You can download the latest version here, though personally I would recommend using VisualEsxtop instead of esxplot, just because it is more recent.

  1. Run: esxplot
  2. Click File -> Import -> Dataset
  3. Select file and click “Open”
  4. Double click hostname and click on metric

Using ESXPLOT for ESXTOP data
As you can clearly see in the screenshot above, the legend (right of the graph) is too long. You can modify that as follows:

  1. Click on “File” -> preferences
  2. Select “Abbreviated legends”
  3. Enter appropriate value

For those using a Mac, esxplot uses specific libraries which are only available on the 32Bit version of Python. In order for esxplot to function correctly set the following environment variable:

export VERSIONER_PYTHON_PREFER_32_BIT=yes

Limiting your view

In environments with a very high consolidation ratio (high number of VMs per host), it could occur that the VM you need performance counters for isn’t shown on your screen. This happens purely because the height of the screen limits what can be displayed. Unfortunately, there is currently no command line option for esxtop to specify which VMs need to be displayed. However, you can export the current list of worlds and import it again to limit the number of VMs shown.

esxtop -export-entity filename

Now you should be able to edit your file and comment out specific worlds that are not needed to be displayed.

esxtop -import-entity filename
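
A minimal sketch of that flow (the file name and the VM name to drop are just examples):

esxtop -export-entity /tmp/entities.txt
# drop the worlds you do not want displayed, e.g. everything matching a particular VM name
grep -v "noisy-vm-01" /tmp/entities.txt > /tmp/entities-trimmed.txt
esxtop -import-entity /tmp/entities-trimmed.txt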

I figured that there should be a way to get the info through the command line, and this is what I came up with. Please note that <virtualmachinename> needs to be replaced with the name of the virtual machine that you need the GID for.

# look up the world ID (wid) for the VM
VMWID=`vm-support -x | grep <virtualmachinename> |awk '{gsub("wid=", "");print $1}'`
# translate the world ID into the VMX cartel ID
VMXCARTEL=`vsish -e cat /vm/$VMWID/vmxCartelID`
# print the scheduler group ID (GID) for that cartel
vsish -e cat /sched/memClients/$VMXCARTEL/SchedGroupID

Now you can use the outcome within esxtop to limit (l) your view to that single GID. William Lam wrote an article a couple of days after I added the GID section with an approach that is a lot simpler than what I came up with.

VMware NSX-V vs NSX-T Comparison

The software-defined data center is made possible by virtualizing the key components and functionalities of the datacenter. This started, of course, with virtualizing compute in the initial virtualization wave.

Next, was the virtualization of the network components. VMware’s NSX-V platform has made tremendous waves in the software-defined data center and has allowed organizations to be truly freed from the underlying hardware network components for data center communication.

VMware’s NSX product has certainly matured over the last several releases, with the latest release from VMware being NSX-T 2.1. In this blog, we will walk through:

  1. How does NSX-“T” compare to NSX-“V”?
  2. What are the use cases for each version of NSX?
  3. Architecture changes between NSX-V and NSX-T

VMware NSX-V vs NSX-T Comparison

To understand the use case for NSX-T let’s think about the requirements for NSX-V which would help us to see where NSX-T fits into the VMware SDN ecosystem.

Requirements of NSX-V

NSX-V (NSX for “vSphere”) is designed for vSphere deployments only and is architected so that a single NSX-V manager platform is tied to a single VMware vCenter Server instance. The NSX-V platform is the original NSX platform that has been around for a few years now.

It is specifically designed with VMware virtual machines in mind as that is the legacy virtualization mechanism for workloads that has been around since the onset of server virtualization.

With NSX-V, organizations are able to mobilize network connectivity between virtual machines and allow those workloads to be connected in ways that were otherwise unable to be delivered efficiently by physical networking hardware.

For the most part, if you are wanting to run a software-defined networking infrastructure within the realm of VMware vSphere, NSX-V is the platform that you are most likely to be using.

What is NSX-T and what use case does it serve?

NSX-T (NSX “Transformers”) is designed to address many of the use cases that NSX-V was not designed for, such as multi-hypervisor environments. NSX-T is a multi-hypervisor aware SDN stack brought to the likes of vSphere, KVM, OpenStack, Kubernetes, and Docker.

It is designed to address emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks. One of the major use cases for NSX-T is containers. In today’s virtualized environments, we are seeing more and more applications running outside of virtual machines.

Important as well when considering the multi-hypervisor support is the fact that NSX-T has been decoupled from VMware vCenter Server. NSX-T is a standalone solution for vCenter and vSphere environments, but it can also support KVM, public cloud, and containers, and can be integrated into frameworks like Red Hat OpenShift, Pivotal, and others.

One of the major shifts in focus you will see when comparing the two products is that NSX-T is more cloud-focused, with forward-looking functionality.
It also allows organizations more flexibility in choosing the solution that best fits their use case, whether that includes hypervisors, containers, bare metal, or public clouds.

VMware NSX-T is integrated with the VMware Photon Platform, the cloud-centric operating system that VMware developed from the ground up, with the likes of the current vCenter Server appliance running atop this platform. NSX-T also contains the NSX-T Container Network Interface (CNI) plugin, which allows developers to configure network connectivity for container applications and helps deliver Infrastructure as a Service.

Architecture Changes

Interestingly, with NSX-T VMware has moved away from the VXLAN-based encapsulation that is utilized by NSX-V and has adopted the newer “Geneve” encapsulation. This architectural difference makes NSX-T and NSX-V incompatible at the moment.

What is the Geneve encapsulation standard as opposed to the more prevalent VXLAN, especially when there are a lot of hardware devices on the market that support VXLAN?

Geneve is a newly minted encapsulation co-authored by VMware, Microsoft, Red Hat and Intel. Geneve combines the best of the current encapsulation protocols such as VXLAN, STT, and NVGRE into a single protocol. Much has been learned from current network virtualization protocols and as NSX has matured, the need for an even more extensible encapsulation protocol has come to light. Geneve allows inserting metadata as TLV fields which can be used for new features as needed.

[Image: NSX-V vs NSX-T]

Other NSX-T Architecture changes to note:

  • Decoupled from vCenter
  • NSX-T Manager and NSX-T controllers can be deployed as VMs on either ESXi or KVM
  • There is a new “hostswitch” that is utilized for multi-hypervisor support. This is a variant of the VMware vSwitch on ESXi and of Open vSwitch on KVM
  • Utilizes Geneve encapsulation – MTU of 1600 is still recommended for the encapsulation header
  • Routing changes – NSX-T utilizes next-generation optimized routing that is multi-tiered with logical separation between the provider router (Tier0 router) and the tenant router function (Tier1 router)
  • Standard HTML5 interface for configuration and management

Thoughts

VMware NSX is certainly evolving, especially with the introduction of VMware NSX-T. VMware is showing commitment to the NSX platform moving beyond simply vSphere environments and including KVM, OpenStack, and multiple public clouds. This decoupling from vSphere will certainly attract others to VMware’s ever-popular network virtualization platform. There are several key points to note with NSX-T, including the following.

Key Points of Interest with NSX-T

  • Designed to work with multiple hypervisors (ESXi and KVM currently)
  • Does not rely on vCenter
  • Tiered routing
  • HTML5 interface is standard
  • OpenStack plugin allows developers to build and interact with Infrastructure as Code
  • Geneve encapsulation protocol

It will be interesting to see how VMware handles the two product lines, NSX-V and NSX-T, and whether the two products will remain separate or VMware will attempt to bring them together at some point in the future.

VMware NSX 6.4.1 Released New Features With vSphere 6.7 Support

One of the downsides of a new version of vSphere is that customers have to wait until ALL the integrations and products that interact with vSphere itself are compatible with the new version, which unfortunately takes time. However, one of the roadblocks to upgrading to vSphere 6.7 has been removed with the introduction of NSX 6.4.1, which now officially supports vSphere 6.7! The vSphere 6.7 support is only one of the many new features in NSX 6.4.1; another major one is the addition of much more functionality inside the HTML5 interface. Let’s take a look at VMware NSX 6.4.1 Released New Features With vSphere 6.7 Support.

VMware NSX 6.4.1 Released New Features With vSphere 6.7 Support

So this is another really great release from the VMware NSX team. It includes many new features that extend what was introduced in the NSX 6.4 release, such as the context-aware firewall and identity firewall features.

  • Additional Layer 7 Application Context Support: SYMUPD (Symantec LiveUpdate traffic, which includes spyware definitions, firewall rules, antivirus signature files, and software updates), MAXDB (SQL connections and queries made to a MaxDB SQL server), and GITHUB (web-based Git or version control repository and Internet hosting service).
  • Expanded OS support for Identity Firewall: Identity Firewall was one of the great new features of the 6.4 release that allowed filtering individual, unique TCP/IP streams from clients connecting to RDSH servers. With 6.4, this functionality was only supported on Windows Server 2016. However, running the latest VMware Tools with 6.4.1 extends the supported platforms to include Windows Server 2012 with VMware Tools 10.2.5 and Windows Server 2012 R2 with VMware Tools 10.2.5.

[Image: VMware NSX Manager 6.4.1]

Perhaps some of the more exciting features even are the enhancements to the HTML5 interface, extending what can be done via the brand new ultra fast HTML5 UI.  This includes the following:

  • Installation
  • Groups and Tags
  • Firewall
  • Service Composer
  • Application Rule Manager
  • SpoofGuard
  • IPFIX
  • Flow Monitoring

[Image: New HTML5 functionality in VMware NSX 6.4.1]

VMware is certainly making a strong push to have the HTML5 client fully functional by the Fall of 2018, and they are giving everyone confidence that this will be accomplished with the blistering pace at which they are adding functionality to both the core vSphere products and the ancillary products that extend the core vSphere feature set. There is still work to be done. Below is the table found in the Functionality Updates for VMware NSX for vSphere documentation that shows functionality that is still not supported in the HTML5 client.

Functional Area – Unsupported Functionality in the vSphere Client

  • Service Definitions – All functionality, including: Guest Introspection Services, Network Introspection Services, Hardware Devices
  • Logical Switches – All functionality
  • NSX Edges – All functionality, including: Edge lifecycle management, Routing, NAT, DHCP, Bridging, Firewall, Load Balancer, VPN
  • Service Composer – Service Composer Canvas
  • Tools: Endpoint Monitoring – All functionality
  • Tools: Flow Monitoring – Flow Monitoring Dashboard, Details by Service, Configuration
  • Tools: Traceflow – All functionality
  • System: Users and Domains – All functionality
  • System: Events – All functionality, including: SNMP Events, NSX Ticket Logger
  • Cross-VC NSX – Universal Logical Switch, Universal Logical Router
  • NSX Home – Getting Started, License Information, Customer Experience Improvement Program

The other big feature with VMware NSX 6.4.1 is of course the compatibility with vSphere 6.7.  Again, many were perhaps holding out on upgrading to vSphere 6.7 because of VMware NSX.  This can now be removed as a roadblock and can allow organizations to get the latest functionality across the board when upgrading both vSphere and NSX.  We are still waiting however on supportability of vSphere 6.7 from the major backup vendors, but hopefully these announcements will be made soon!

It certainly is cool to see the NSX functionality being brought into the new HTML5 interface with the speed and efficiency of operations being greatly improved with the performance of the new interface.  It is great to be able to install and configure NSX 6.4.1 in vSphere 6.7 using the new workflows of HTML5.

[Image: Installing NSX 6.4.1 in vSphere 6.7]

General compatibility with VMware NSX 6.4.1 is as follows. Note there is no compatibility documented as of yet for vSphere 6.5 Update 2. Additionally, organizations still on vSphere 5.5 need to upgrade as soon as possible, since it is not supported with NSX 6.4.

  • For vSphere 6.0:
    Supported: 6.0 Update 2, 6.0 Update 3
    Recommended: 6.0 Update 3. vSphere 6.0 Update 3 resolves the issue of duplicate VTEPs in ESXi hosts after rebooting vCenter server. See VMware Knowledge Base article 2144605 for more information.
  • For vSphere 6.5:
    Supported: 6.5a, 6.5 Update 1
    Recommended: 6.5 Update 1. vSphere 6.5 Update 1 resolves the issue of EAM failing with OutOfMemory. See VMware Knowledge Base Article 2135378 for more information.
  • For vSphere 6.7
    Supported: 6.7
    Recommended: 6.7

Note: vSphere 5.5 is not supported with NSX 6.4.

Official VMware NSX for vSphere 6.4.1 Release Notes (upgrade notes):  https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.4/rn/releasenotes_nsx_vsphere_641.html#upgradenotes

Takeaways

The news of VMware NSX 6.4.1, its new features, and its vSphere 6.7 support was certainly music to the ears of many vSphere admins last week. It is awesome to see the progress being made with migrating to the HTML5 interface, and how VMware has revisited workflows along the way rather than simply porting things over. The support for vSphere 6.7 removes a major roadblock for organizations running NSX that were unable to upgrade until that compatibility was released, and now it has been. Look for more NSX 6.4.1 coverage here with a series of articles to come. Stay tuned!

VMware NSX for vSphere 6.4.1 Release Notes

What’s New in NSX 6.4.1

NSX for vSphere 6.4.1 adds usability and serviceability enhancements, and addresses a number of specific customer bugs. See Resolved Issues for more information.

Changes introduced in NSX for vSphere 6.4.1:

Security Services

  • Context-Aware Firewall:
    • Additional Layer 7 Application Context Support: SYMUPD (Symantec LiveUpdate traffic, which includes spyware definitions, firewall rules, antivirus signature files, and software updates), MAXDB (SQL connections and queries made to a MaxDB SQL server), and GITHUB (web-based Git or version control repository and Internet hosting service).
    • Expanded OS support for Identity Firewall: Identity Firewall support for user sessions on remote desktop and application servers (RDSH) is now expanded to include Windows Server 2012 with VMware Tools 10.2.5 and Windows 2012 R2 with VMware Tools 10.2.5.

NSX User Interface

  • VMware NSX – Functionality Updates for vSphere Client (HTML):
  • Firewall – UI Enhancements:
    • Improved visibility: status summary, action toolbar, view of group membership details from firewall table
    • Efficient rule creation: in-line editing, clone rules, multi-selection and bulk action support, simplified rule configuration
    • Efficient section management: drag-and-drop, positional insert of sections and rules, section anchors when scrolling
    • Undo operations: revert unpublished rule and section changes on UI client side
    • Firewall Timeout Settings: Protocol values are displayed at-a-glance, without requiring popup dialogs.
  • Application Rule Manager – UI Enhancements:
    • Session Management: View a list of sessions, and their corresponding status (collecting data, analysis complete) and duration.
    • Rule Planning: View summary counts of grouping objects and firewall rules; View recommendations for Universal Firewall Rules
  • Grouping Objects Enhancements:
    • Improved visibility of where the Grouping Objects are used
    • View list of effective group members in terms of VMs, IP, MAC, and vNIC
  • SpoofGuard – UI Enhancements:
    • Bulk action support: Approve or clear multiple IPs at a time

NSX Edge Enhancements

  • Load Balancer Scale: Increased support of LB pool members from 32 to 256

Operations and Troubleshooting

  • Installation – UI Enhancements:
    • Filter list of clusters by status: All, Installed: Healthy, Installed: Needs Attention, Not Installed
    • Cluster Summary View: shows communication channel health status
  • Alarms on Certificate Expiration: System events and SNMP alerts are generated before and upon certificate expiration. Time interval is configurable, with default of 7 days before expiration.
  • Automatic Backup before Upgrades: When you upgrade NSX Manager to NSX 6.4.1, a backup is automatically taken and saved locally as part of the upgrade process. See Upgrade NSX Manager and Managing NSX Manager Backups Created During Upgrade for more information.

Solution Interoperability

  • vSphere 6.7 support: When upgrading to vSphere 6.7, you must first install or upgrade to NSX for vSphere 6.4.1 or later. See Upgrading vSphere in an NSX Environment in the NSX Upgrade Guide and Knowledge Base article 53710 (Update sequence for vSphere 6.7 and its compatible VMware products).

NSX License Editions

  • VMware NSX Data Center Licenses: Adds support for new VMware NSX Data Center licenses (Standard, Professional, Advanced, Enterprise Plus, Remote Office Branch Office) introduced in June 2018, and continues to support previous VMware NSX for vSphere license keys. See VMware knowledge base article 2145269 for more information about NSX licenses.

Versions, System Requirements and Installation

Note:

  • The table below lists recommended versions of VMware software. These recommendations are general and should not replace or override environment-specific recommendations.
  • This information is current as of the publication date of this document.
  • For the minimum supported version of NSX and other VMware products, see the VMware Product Interoperability Matrix. VMware declares minimum supported versions based on internal testing.

Product or Component – Version

NSX for vSphere
  VMware recommends the latest NSX release for new deployments.

  When upgrading existing deployments, please review the NSX Release Notes or contact your VMware technical support representative for more information on specific issues before planning an upgrade.

vSphere
  • For vSphere 6.0:
    Supported: 6.0 Update 2, 6.0 Update 3
    Recommended: 6.0 Update 3. vSphere 6.0 Update 3 resolves the issue of duplicate VTEPs in ESXi hosts after rebooting vCenter server. See VMware Knowledge Base article 2144605 for more information.
  • For vSphere 6.5:
    Supported: 6.5a, 6.5 Update 1
    Recommended: 6.5 Update 1. vSphere 6.5 Update 1 resolves the issue of EAM failing with OutOfMemory. See VMware Knowledge Base Article 2135378 for more information.
  • For vSphere 6.7:
    Supported: 6.7
    Recommended: 6.7

  Note: vSphere 5.5 is not supported with NSX 6.4.

Guest Introspection for Windows
  All versions of VMware Tools are supported. Some Guest Introspection-based features require newer VMware Tools versions:
  • Use VMware Tools 10.0.9 and 10.0.12 to enable the optional Thin Agent Network Introspection Driver component packaged with VMware Tools.
  • Upgrade to VMware Tools 10.0.8 and later to resolve slow VMs after upgrading VMware Tools in NSX / vCloud Networking and Security (see VMware knowledge base article 2144236).
  • Use VMware Tools 10.1.0 and later for Windows 10 support.
  • Use VMware Tools 10.1.10 and later for Windows Server 2016 support.

Guest Introspection for Linux
  This NSX version supports the following Linux versions:
  • RHEL 7 GA (64 bit)
  • SLES 12 GA (64 bit)
  • Ubuntu 14.04 LTS (64 bit)

System Requirements and Installation

For the complete list of NSX installation prerequisites, see the System Requirements for NSX section in the NSX Installation Guide.

For installation instructions, see the NSX Installation Guide or the NSX Cross-vCenter Installation Guide.

Deprecated and Discontinued Functionality

End of Life and End of Support Warnings

For information about NSX and other VMware products that must be upgraded soon, please consult the VMware Lifecycle Product Matrix.

  • NSX for vSphere 6.1.x reached End of Availability (EOA) and End of General Support (EOGS) on January 15, 2017. (See also VMware knowledge base article 2144769.)
  • vCNS Edges no longer supported. You must upgrade to an NSX Edge first before upgrading to NSX 6.3 or later.
  • NSX for vSphere 6.2.x will reach End of General Support (EOGS) on August 20 2018.
  • Based on security recommendations, 3DES as an encryption algorithm in NSX Edge IPsec VPN service is no longer supported.
    It is recommended that you switch to one of the secure ciphers available in IPsec service. This change regarding encryption algorithm is applicable to IKE SA (phase1) as well as IPsec SA (phase2) negotiation for an IPsec site.

    If the 3DES encryption algorithm is in use by the NSX Edge IPsec service at the time of upgrade to the release in which its support is removed, it will be replaced by another recommended cipher, and therefore the IPsec sites that were using 3DES will not come up unless the configuration on the remote peer is modified to match the encryption algorithm used in NSX Edge.

    If using 3DES encryption, modify the encryption algorithm in the IPsec site configuration to replace 3DES with one of the supported AES variants (AES / AES256 / AES-GCM). For example, for each IPsec site configuration with the encryption algorithm as 3DES, replace it with AES. Accordingly, update the IPsec configuration at the peer endpoint.

General Behavior Changes

If you have more than one vSphere Distributed Switch, and if VXLAN is configured on one of them, you must connect any Distributed Logical Router interfaces to port groups on that vSphere Distributed Switch. Starting in NSX 6.4.1, this configuration is enforced in the UI and API. In earlier releases, you were not prevented from creating an invalid configuration.

User Interface Removals and Changes

In NSX 6.4.1, Service Composer Canvas is removed.

API Removals and Behavior Changes

Behavior Changes in NSX 6.4.1

When you create a new IP pool with POST /api/2.0/services/ipam/pools/scope/globalroot-0, or modify an existing IP pool with PUT /api/2.0/services/ipam/pools/<pool-id>, and the pool has multiple IP ranges defined, validation is done to ensure that the ranges do not overlap. This validation was not previously done.
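
As a minimal sketch of what such a request looks like (host name, credentials, and the ip-pool.xml body are placeholders; the body format is described under IP pool management in the NSX API Guide), a create call whose body defines two overlapping ranges would now be rejected by this validation:

  # Hypothetical example: create an IP pool; overlapping ranges in ip-pool.xml fail validation in 6.4.1
  curl -k -u 'admin:<password>' \
    -H "Content-Type: application/xml" \
    -X POST "https://<nsxmanager>/api/2.0/services/ipam/pools/scope/globalroot-0" \
    -d @ip-pool.xml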

Deprecations in NSX 6.4.0
The following items are deprecated, and might be removed in a future release.

  • The systemStatus parameter in GET /api/4.0/edges/edgeID/status is deprecated.
  • GET /api/2.0/services/policy/serviceprovider/firewall/ is deprecated. Use GET /api/2.0/services/policy/serviceprovider/firewall/info instead.
  • Setting tcpStrict in the global configuration section of Distributed Firewall is deprecated. Starting in NSX 6.4.0, tcpStrict is defined at the section level. Note: If you upgrade to NSX 6.4.0 or later, the global configuration setting for tcpStrict is used to configure tcpStrict in each existing layer 3 section. tcpStrict is set to false in layer 2 sections and layer 3 redirect sections. See “Working with Distributed Firewall Configuration” in the NSX API Guide for more information.

Behavior Changes in NSX 6.4.0
In NSX 6.4.0, the <name> parameter is required when you create a controller with POST /api/2.0/vdn/controller.

NSX 6.4.0 introduces these changes in error handling:

  • Previously POST /api/2.0/vdn/controller responded with 201 Created to indicate the controller creation job is created. However, the creation of the controller might still fail. Starting in NSX 6.4.0 the response is 202 Accepted.
  • Previously if you sent an API request which is not allowed in transit or standalone mode, the response status was 400 Bad Request. Starting in 6.4.0 the response status is 403 Forbidden.
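
As a rough illustration of the first change (host name, credentials, and controller-spec.xml are placeholders; the request body is described under NSX Controller management in the NSX API Guide), you can print the HTTP status code returned by the controller creation call, which is now 202 Accepted rather than 201 Created:

  # Sketch: create a controller and print only the HTTP status code (expect 202 in NSX 6.4.0 and later)
  curl -k -u 'admin:<password>' -H "Content-Type: application/xml" \
    -s -o /dev/null -w "%{http_code}\n" \
    -X POST "https://<nsxmanager>/api/2.0/vdn/controller" \
    -d @controller-spec.xml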

CLI Removals and Behavior Changes

Do not use unsupported commands on NSX Controller nodes
There are undocumented commands to configure NTP and DNS on NSX Controller nodes. These commands are not supported, and should not be used on NSX Controller nodes. You should only use commands which are documented in the NSX CLI Guide.

Upgrade Notes

Note: For a list of known issues affecting installation and upgrades see the section, Installation and Upgrade Known Issues.

General Upgrade Notes

  • To upgrade NSX, you must perform a full NSX upgrade including host cluster upgrade (which upgrades the host VIBs). For instructions, see the NSX Upgrade Guide including the Upgrade Host Clusters section.
  • Upgrading NSX VIBs on host clusters using VUM is not supported. Use Upgrade Coordinator, Host Preparation, or the associated REST APIs to upgrade NSX VIBs on host clusters.
  • System Requirements: For information on system requirements while installing and upgrading NSX, see the System Requirements for NSX section in NSX documentation.
  • Upgrade path for NSX: The VMware Product Interoperability Matrix provides details about the upgrade paths from VMware NSX.
  • Cross-vCenter NSX upgrade is covered in the NSX Upgrade Guide.
  • Downgrades are not supported:
    • Always capture a backup of NSX Manager before proceeding with an upgrade.
    • Once NSX has been upgraded successfully, NSX cannot be downgraded.
  • To validate that your upgrade to NSX 6.4.x was successful see knowledge base article 2134525.
  • There is no support for upgrades from vCloud Networking and Security to NSX 6.4.x. You must upgrade to a supported 6.2.x release first.
  • Interoperability: Check the VMware Product Interoperability Matrix for all relevant VMware products before upgrading.
    • Upgrading to NSX 6.4: NSX 6.4 is not compatible with vSphere 5.5.
    • Upgrading to vSphere 6.5: When upgrading to vSphere 6.5a or later 6.5 versions, you must first upgrade to NSX 6.3.0 or later. NSX 6.2.x is not compatible with vSphere 6.5. See Upgrading vSphere in an NSX Environment in the NSX Upgrade Guide.
    • Upgrading to vSphere 6.7: When upgrading to vSphere 6.7 you must first upgrade to NSX 6.4.1 or later. Earlier versions of NSX are not compatible with vSphere 6.7. See Upgrading vSphere in an NSX Environment in the NSX Upgrade Guide.
  • Partner services compatibility: If your site uses VMware partner services for Guest Introspection or Network Introspection, you must review the  VMware Compatibility Guide before you upgrade, to verify that your vendor’s service is compatible with this release of NSX.
  • Networking and Security plug-in: After upgrading NSX Manager, you must log out and log back in to the vSphere Web Client. If the NSX plug-in does not display correctly, clear your browser cache and history. If the Networking and Security plug-in does not appear in the vSphere Web Client, reset the vSphere Web Client server as explained in the NSX Upgrade Guide.
  • Stateless environments: In NSX upgrades in a stateless host environment, the new VIBs are pre-added to the Host Image profile during the NSX upgrade process. As a result, the NSX upgrade process on stateless hosts follows this sequence: prior to NSX 6.2.0, there was a single URL on NSX Manager from which VIBs for a certain version of the ESXi host could be found (meaning the administrator only needed to know a single URL, regardless of NSX version). In NSX 6.2.0 and later, the new NSX VIBs are available at different URLs. To find the correct VIBs, you must perform the following steps (a sketch follows this list):
    1. Find the new VIB URL from https://<nsxmanager>/bin/vdn/nwfabric.properties.
    2. Fetch VIBs of required ESX host version from corresponding URL.
    3. Add them to host image profile.
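
As a minimal sketch of those three steps for an interactive check (credentials and the VIB path are placeholders; the exact path comes from the properties file itself):

  # Step 1: read the VIB locations published by NSX Manager
  curl -k -u 'admin:<password>' "https://<nsxmanager>/bin/vdn/nwfabric.properties"
  # Step 2: fetch the VIB bundle for your ESXi version using the path returned above (placeholder path)
  curl -k -o vibs.zip "https://<nsxmanager><VIB_PATH_from_properties_file>"
  # Step 3: add the downloaded VIBs to the host image profile using your normal image-build tooling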

Upgrade Notes for NSX Components

NSX Manager Upgrade

  • Important: If you are upgrading NSX 6.2.0, 6.2.1, or 6.2.2 to NSX 6.3.5 or later, you must complete a workaround before starting the upgrade. See VMware Knowledge Base article 000051624 for details.
  • If you are upgrading from NSX 6.3.3 to NSX 6.3.4 or later you must first follow the workaround instructions in VMware Knowledge Base article 2151719.
  • If you use SFTP for NSX backups, change to hmac-sha2-256 after upgrading to 6.3.0 or later because there is no support for hmac-sha1. See VMware Knowledge Base article 2149282  for a list of supported security algorithms.
  • When you upgrade NSX Manager to NSX 6.4.1, a backup is automatically taken and saved locally as part of the upgrade process. See Upgrade NSX Manager for more information.
  • When you upgrade to NSX 6.4.0, the TLS settings are preserved. If you have only TLS 1.0 enabled, you will be able to view the NSX plug-in in the vSphere Web Client, but NSX Managers are not visible. There is no impact to datapath, but you cannot change any NSX Manager configuration. Log in to the NSX appliance management web UI at https://nsx-mgr-ip/ and enable TLS 1.1 and TLS 1.2. This reboots the NSX Manager appliance.

Controller Upgrade

  • The NSX Controller cluster must contain three controller nodes. If it has fewer than three controllers, you must add controllers before starting the upgrade. See Deploy NSX Controller Cluster for instructions.
  • In NSX 6.3.3, the underlying operating system of the NSX Controller changes. This means that when you upgrade from NSX 6.3.2 or earlier to NSX 6.3.3 or later, instead of an in-place software upgrade, the existing controllers are deleted one at a time, and new Photon OS based controllers are deployed using the same IP addresses.

    When the controllers are deleted, this also deletes any associated DRS anti-affinity rules. You must create new anti-affinity rules in vCenter to prevent the new controller VMs from residing on the same host.

    See Upgrade the NSX Controller Cluster for more information on controller upgrades.

Distributed Logical Router Upgrade

  • Validation is added in NSX 6.4.1 to ensure that in environments where VXLAN is configured and more than one vSphere Distributed Switch is present, distributed logical router interfaces must be connected to the VXLAN-configured vSphere Distributed Switch only. Upgrading a DLR to NSX 6.4.1 or later will fail in those environments if the DLR has interfaces connected to the vSphere Distributed Switch that is not configured for VXLAN. The UI no longer displays the unsupported vSphere Distributed Switch.

Host Cluster Upgrade

  • If you upgrade from NSX 6.3.2 or earlier to NSX 6.3.3 or later, the NSX VIB names change.
    The esx-vxlan and esx-vsip VIBs are replaced with esx-nsxv if you have NSX 6.3.3 or later installed on ESXi 6.0 or later.
  • Rebootless upgrade and uninstall on hosts: On vSphere 6.0 and later, once you have upgraded from NSX 6.2.x to NSX 6.3.x or later, any subsequent NSX VIB changes will not require a reboot. Instead hosts must enter maintenance mode to complete the VIB change. This affects both NSX host cluster upgrade, and ESXi upgrade. See the NSX Upgrade Guide for more information.

NSX Edge Upgrade

  • Host clusters must be prepared for NSX before upgrading NSX Edge appliances: Management-plane communication between NSX Manager and Edge via the VIX channel is no longer supported starting in 6.3.0. Only the message bus channel is supported. When you upgrade from NSX 6.2.x or earlier to NSX 6.3.0 or later, you must verify that host clusters where NSX Edge appliances are deployed are prepared for NSX, and that the messaging infrastructure status is GREEN. If host clusters are not prepared for NSX, upgrade of the NSX Edge appliance will fail. See Upgrade NSX Edge in the NSX Upgrade Guide for details.
  • Upgrading Edge Services Gateway (ESG):
    Starting in NSX 6.2.5, resource reservation is carried out at the time of NSX Edge upgrade. When vSphere HA is enabled on a cluster having insufficient resources, the upgrade operation may fail due to vSphere HA constraints being violated. To avoid such upgrade failures, perform the following steps before you upgrade an ESG:

    The following resource reservations are used by the NSX Manager if you have not explicitly set values at the time of install or upgrade.

    NSX Edge Form Factor    CPU Reservation    Memory Reservation
    COMPACT                 1000 MHz           512 MB
    LARGE                   2000 MHz           1024 MB
    QUADLARGE               4000 MHz           2048 MB
    X-LARGE                 6000 MHz           8192 MB
    1. Always ensure that your installation follows the best practices laid out for vSphere HA. Refer to document Knowledge Base article 1002080 .
    2. Use the NSX tuning configuration API:
      PUT https://<nsxmanager>/api/4.0/edgePublish/tuningConfiguration
      ensuring that the values for edgeVCpuReservationPercentage and edgeMemoryReservationPercentage fit within the available resources for the form factor (see the table above for defaults, and the sketch after this list).
  • Disable vSphere’s Virtual Machine Startup option where vSphere HA is enabled and Edges are deployed. After you upgrade your 6.2.4 or earlier NSX Edges to 6.2.5 or later, you must turn off the vSphere Virtual Machine Startup option for each NSX Edge in a cluster where vSphere HA is enabled and Edges are deployed. To do this, open the vSphere Web Client, find the ESXi host where NSX Edge virtual machine resides, click Manage > Settings, and, under Virtual Machines, select VM Startup/Shutdown, click Edit, and make sure that the virtual machine is in Manual mode (that is, make sure it is not added to the Automatic Startup/Shutdown list).
  • Before upgrading to NSX 6.2.5 or later, make sure all load balancer cipher lists are colon separated. If your cipher list uses another separator such as a comma, make a PUT call to https://nsxmgr_ip/api/4.0/edges/EdgeID/loadbalancer/config/applicationprofiles and replace each  <ciphers> </ciphers> list in <clientssl> </clientssl> and <serverssl> </serverssl> with a colon-separated list. For example, the relevant segment of the request body might look like the following. Repeat this procedure for all application profiles:
    <applicationProfile>
      <name>https-profile</name>
      <insertXForwardedFor>false</insertXForwardedFor>
      <sslPassthrough>false</sslPassthrough>
      <template>HTTPS</template>
      <serverSslEnabled>true</serverSslEnabled>
      <clientSsl>
        <ciphers>AES128-SHA:AES256-SHA:ECDHE-ECDSA-AES256-SHA</ciphers>
        <clientAuth>ignore</clientAuth>
        <serviceCertificate>certificate-4</serviceCertificate>
      </clientSsl>
      <serverSsl>
        <ciphers>AES128-SHA:AES256-SHA:ECDHE-ECDSA-AES256-SHA</ciphers>
        <serviceCertificate>certificate-4</serviceCertificate>
      </serverSsl>
      ...
    </applicationProfile>
  • Set Correct Cipher version for Load Balanced Clients on vROPs versions older than 6.2.0: vROPs pool members on vROPs versions older than 6.2.0 use TLS version 1.0 and therefore you must set a monitor extension value explicitly by setting "ssl-version=10" in the NSX Load Balancer configuration. See Create a Service Monitor in the NSX Administration Guide for instructions.
    {
        "expected" : null,
        "extension" : "ssl-version=10",
        "send" : null,
        "maxRetries" : 2,
        "name" : "sm_vrops",
        "url" : "/suite-api/api/deployment/node/status",
        "timeout" : 5,
        "type" : "https",
        "receive" : null,
        "interval" : 60,
        "method" : "GET"
    }
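
For the tuning configuration API mentioned in the ESG upgrade steps above, a minimal sketch of a read-modify-write flow looks like the following (host name and credentials are placeholders, and the assumption that the endpoint also supports GET should be verified against the NSX API Guide):

  # Read the current tuning configuration (assumed to support GET; verify in the NSX API Guide)
  curl -k -u 'admin:<password>' \
    "https://<nsxmanager>/api/4.0/edgePublish/tuningConfiguration" -o tuningConfiguration.xml
  # Edit edgeVCpuReservationPercentage and edgeMemoryReservationPercentage in tuningConfiguration.xml
  # so they fit the available cluster resources, then publish the change:
  curl -k -u 'admin:<password>' -H "Content-Type: application/xml" \
    -X PUT "https://<nsxmanager>/api/4.0/edgePublish/tuningConfiguration" \
    -d @tuningConfiguration.xml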

Guest Introspection Upgrade

  • Guest Introspection VMs now contain additional host identifying information in an XML file on the machine. When logging in to the Guest Introspection VM, the file “/opt/vmware/etc/vami/ovfEnv.xml” should include host identity information.

Upgrade Notes for FIPS

When you upgrade from a version of NSX earlier than NSX 6.3.0 to NSX 6.3.0 or later, you must not enable FIPS mode before the upgrade is completed. Enabling FIPS mode before the upgrade is complete will interrupt communication between upgraded and not-upgraded components. See Understanding FIPS Mode and NSX Upgrade in the NSX Upgrade Guide for more information.

  • Ciphers supported on OS X Yosemite and OS X El Capitan: If you are using the SSL VPN client on OS X 10.11 (El Capitan), you will be able to connect using the AES128-GCM-SHA256, ECDHE-RSA-AES128-GCM-SHA256, ECDHE-RSA-AES256-GCM-SHA384, AES256-SHA, and AES128-SHA ciphers; those using OS X 10.10 (Yosemite) will be able to connect using the AES256-SHA and AES128-SHA ciphers only.
  • Do not enable FIPS before the upgrade to NSX 6.3.x is complete. See Understand FIPS mode and NSX Upgrade in the NSX Upgrade Guide for more information.
  • Before you enable FIPS, verify any partner solutions are FIPS mode certified. See the VMware Compatibility Guide and the relevant partner documentation.

FIPS Compliance

NSX 6.4 uses FIPS 140-2 validated cryptographic modules for all security-related cryptography when correctly configured.

Note:

  • Controller and Clustering VPN: The NSX Controller uses IPsec VPN to connect Controller clusters. The IPsec VPN uses the VMware Linux Kernel Cryptographic Module (VMware Photon OS 1.0 environment), which is in the process of being CMVP validated.
  • Edge IPsec VPN: The NSX Edge IPsec VPN uses the VMware Linux Kernel Cryptographic Module (VMware NSX OS 4.4 environment), which is in the process of being CMVP validated.

Document Revision History

24 May 2018: First edition.
29 May 2018: Second edition. Known issue 2127813 added.
8 June 2018: Third edition. Information about NSX Data Center licenses added; known issue 2130563 added.

Resolved Issues

The resolved issues are grouped as follows.

General Resolved Issues

  • Fixed Issue 1993691, 1995142: Host is not removed from replication cluster after being removed from VC inventory. If a user adds a host to a replication cluster and then removes the host from VC inventory before removing it from the cluster, the legacy host will remain in the cluster.
  • Fixed Issue 1809387: Support for weak secure transport protocol – TLS v1.0 removed. Starting with NSX-v 6.4.1, TLS v1.0 is no longer supported.
  • Fixed Issue 2002679: In a Cross-vCenter NSX environment with HW VTEP deployed in the Primary site, bridged traffic may experience a network outage when the Secondary NSX Manager restarts.
  • Fixed Issue 2065225: NSX Guest Introspection installation fails in an NSX 6.4.0 environment with the error: “a specified parameter is not correct Property.info.key[9]”. GI deployments show installation status failed and service status as unknown for multiple hosts.
  • Fixed Issue 2094364: USVM process was unable to restart after a process crash because the watchdog process was unable to restart the USVM process. The USVM is put into a warning state after the process terminates.
  • Fixed Issue 2105632: USVMs attempt to sync time with Google (external) NTP servers. The timesync service has been modified to prevent this behavior.
  • Fixed Issue 2031099: NSX Host Preparation fails with an EAM error: “Host is no longer in vCenter inventory”. See VMware Knowledge Base article 52550 for more information.
  • Fixed Issue 2064298: Cannot download tech support logs if the month contains an accented letter. If the NSX Manager uses the French locale, the tech support logs cannot be downloaded during a month that includes an accented letter in the short month name.
  • Fixed Issue 2017141: Global scope (globalroot-0) certificate is not accessible to Edge scope user. The following error message displays when an Edge scope user tries to access Edge Load Balancer functionality:
    “User is not authorized to access object Global and feature truststore.trustentity_management, please check object access scope and feature permissions for the user.”

Logical Networking and NSX Edge Resolved Issues

  • Fixed Issue 1907141: Accept GARP as a valid reply when sending ARP request. Some old devices send GARP as the reply to an ARP request. The fix accepts GARP as a valid ARP response.
  • Fixed Issue 2039443: When a DLR is created without any interfaces, the DLR instance is not created on the host, but the control VM still tries to connect with the host. If a DLR is created with no LIFs, the VDR instance is not created on the host. In such a configuration, the DLR control VM will attempt to establish a VMCI connection, which will fail. This issue has no impact to the data path and can be ignored.
  • Fixed Issue 2070281: Slow memory leak when the DNS feature is enabled and name resolution is failing with network unreachable errors. Edge logs fill with name resolution errors. After some time, system events show that the Edge VM memory usage is high (>90% usage).
  • Fixed Issue 2084281: VPN tunnel does not come up when traffic is initiated from behind the ESG after a VPN idle timeout expires. The VPN tunnel remains down due to faulty logic that was deleting the IPsec SPD entries.
  • Fixed Issue 2092730: NSX Edge stops responding with /var/log partition at 100% disk usage. The NSX Edge gateway /var/log partition fills up on the active Edge.

NSX Manager Resolved Issues

  • Fixed Issue 1984392: Universal objects (Transport Zone, UDLR, ULS, and segments) fail to synchronize with the Secondary NSX Manager. The replicator thread that replicates data to the secondary NSX Manager was stuck and unable to process new requests.
  • Fixed Issue 2064258: NetBIOS name verification failed. In 6.4.0, a new parameter was introduced in the Domain sync functionality: the NetBIOS name, which is verified by the NSX backend. In certain AD structures, such as a special trust configuration among non-root domains known as Shortcut Trust, where the trust between the root domain and non-root domains is Tree Root, the NetBIOS name verification fails.
  • Fixed Issue 1971683: NSX Manager logs false duplicate IP message. Logging has been enhanced to reduce false positives.
  • Fixed Issue 2085654: If there are duplicate dynamic criteria in the same set (specifically with value = null), dynamic criteria upgrade fails. NSX Manager fails to start after the upgrade, and NSX cannot be managed after the upgrade.

Security Services Resolved Issues

  • Fixed Issue 1991702: Error message, “Unable to start data collection on SG with no VMs”, seen under certain conditions. Starting an Endpoint Monitoring session on an identity-based SG that maps to an AD group displays the error: “Unable to start data collection on SG with no VMs”.
  • Fixed Issue 2052634: Translation of nested security groups with exclude members returns incorrect result. A firewall rule incorrectly blocks or allows traffic if security groups with nesting and exclude members are used.
  • Fixed Issue 2089957: VM translation throws a null pointer exception for a security group that references a deleted AD group. If an AD group is deleted and a security group still references the deleted AD group, Security Group -> VM translation will throw a null pointer exception and rule publishing fails.

Installation and Upgrade Resolved Issues

  • Fixed Issue 2035026: Network outage of around 40-50 seconds seen on Edge upgrade. During Edge upgrade, the network outage is now reduced to 10-20 seconds.
  • Fixed Issue 2027916: Upgrade Coordinator may show that controllers that failed to upgrade were successfully upgraded. For a three-node controller cluster, if the third controller failed during upgrade and is removed, the entire controller cluster upgrade might be marked as successful even though the upgrade failed.

Known Issues

The known issues are grouped as follows.

General Known Issues

  • Issue 2130563: Warning message appears when assigning an NSX Data Center license: “The selected license does not support some of the features that are currently available to the licensed assets”. If you have an NSX for vSphere license assigned, and then assign an NSX Data Center license, you see the following warning message: “The selected license does not support some of the features that are currently available to the licensed assets”. This is because the two licenses define the NSX features differently. If you are assigning a license edition that licenses the same features as your current license, it is safe to ignore this message.

    See VMware knowledge base article 2145269 for more information about NSX licenses.

    Workaround: Verify the new license supports the feature you need, and ignore the warning message.

  • Issue 2127813: Cannot choose and assign NSX license key to NSX Manager while using the vSphere Client (HTML5). If you log in to the vSphere Client (HTML5) and add an NSX license key, you cannot assign the key from Licenses > Assets > Solutions. The new license key is not visible.

    Workaround: Use the vSphere Web Client to add and assign licenses.

  • In the vSphere Web Client, when you open a Flex component, such as a menu or dialog, which overlaps an HTML view, the view becomes temporarily hidden.
    (Reference: http://pubs.vmware.com/Release_Notes/en/developer/webclient/60/vwcsdk_600_releasenotes.html#issues)

    Workaround: None.

  • Issue 1874863: Unable to authenticate with changed password after SSL VPN service disable/enable with local authentication server. When the SSL VPN service is disabled and re-enabled, and local authentication is used, users are unable to log in with changed passwords.

    See VMware Knowledge Base Article 2151236 for more information.

  • Issue 1702339: Vulnerability scanners might report a Quagga bgp_dump_routes vulnerability, CVE-2016-4049. Vulnerability scanners might report this vulnerability in NSX for vSphere. NSX for vSphere uses Quagga, but the BGP functionality (including the vulnerability) is not enabled. This vulnerability alert can be safely disregarded.

    Workaround: As the product is not vulnerable, no workaround is required.

  • Issue 1993691: Removing a host without first removing it as a replication node can lead to stale entries in VSM. If a host serves as a replication node for a HW VTEP and it needs to be removed from its parent cluster, ensure first that it is no longer a replication node before removing it from the cluster. If that is not done, in some cases its status as a replication node is maintained in the NSX Manager database, which can cause errors when further manipulating replication nodes.

    Workaround: See Knowledge Base Article 52418 for more information.

Installation and Upgrade Known Issues

Before upgrading, please read the section Upgrade Notes, earlier in this document.

  • Issue 2036024: NSX Manager upgrade stuck at “Verifying uploaded file” due to database disk usage. The upgrade log file vsm-upgrade.log also contains this message: “Database disk usage is at 75%, but it should be less than 70%”. You can view vsm-upgrade.log in the NSX Manager support bundle. Navigate to Networking & Security > Support Bundle, and select to include NSX Manager logs.

    Workaround: Contact VMware customer support.

  • Issue 2006028: Host upgrade may fail if the vCenter Server system is rebooting during the upgrade. If the associated vCenter Server system is rebooted during a host upgrade, the host upgrade might fail and leave the host in maintenance mode. Clicking Resolve does not move the host out of maintenance mode. The cluster status is “Not Ready”.

    Workaround: Exit the host from maintenance mode manually. Click “Not Ready” then “Resolve All” on the cluster.

  • Issue 2001988: During NSX host cluster upgrade, installation status in the Host Preparation tab alternates between “not ready” and “installing” for the entire cluster while each host in the cluster is upgrading. During NSX upgrade, clicking “upgrade available” for an NSX prepared cluster triggers the host upgrade. For clusters configured with DRS FULL AUTOMATIC, the installation status alternates between “installing” and “not ready”, even though the hosts are upgraded in the background without issues.

    Workaround: This is a user interface issue and can be ignored. Wait for the host cluster upgrade to proceed.

  • Issue 1859572: During the uninstall of NSX VIBs version 6.3.x on ESXi hosts that are being managed by vCenter version 6.0.0, the host continues to stay in Maintenance mode
    If you are uninstalling NSX VIBs version 6.3.x on a cluster, the workflow involves putting the hosts into Maintenance mode, uninstalling the VIBs and then removing the hosts from Maintenance mode by the EAM service. However, if such hosts are managed by vCenter server version 6.0.0, then this results in the host being stuck in Maintenance mode post uninstalling the VIBs. The EAM service responsible for uninstalling the VIBs puts the host in Maintenance mode but fails to move the hosts out of Maintenance mode.

    Workaround: Manually move the host out of Maintenance mode. This issue will not be seen if the host is managed by vCenter server version 6.5a and above.
  • Issue 1797929: Message bus channel down after host cluster upgrade
    After a host cluster upgrade, vCenter 6.0 (and earlier) does not generate the event “reconnect”, and as a result, NSX Manager does not set up the messaging infrastructure on the host. This issue has been fixed in vCenter 6.5.

    Workaround: Resync the messaging infrastructure as below:
    POST https://<ip>/api/2.0/nwfabric/configure?action=synchronize

    <nwFabricFeatureConfig>
      <featureId>com.vmware.vshield.vsm.messagingInfra</featureId>
      <resourceConfig>
        <resourceId>host-15</resourceId>
      </resourceConfig>
    </nwFabricFeatureConfig>
  • Issue 1263858: SSL VPN does not send upgrade notification to remote client
    The SSL VPN gateway does not send an upgrade notification to users. The administrator has to manually communicate to remote users that the SSL VPN gateway (server) has been updated and that they must update their clients.

    Workaround: Users need to uninstall the older version of client and install the latest version manually.

  • Issue 1979457: If GI-SVM is deleted or removed during the upgrade process while in backward compatibility mode, then identity firewall through Guest Introspection (GI) will not work unless the GI cluster is upgraded. Identity firewall will not work and no logs related to identity firewall will be seen. Identity firewall protection will be suspended unless the cluster is upgraded.

    Workaround: Upgrade the cluster so that all the hosts are running the newer version of GI-SVM.

    -Or –

    Enable Log scraper for identity firewall to work.

  • Issue 2106417: GI SVM goes into Failed state after NSX Manager is upgraded from 6.3.0 to 6.4.1. If you are using vCenter Server 6.5 U1 and upgrade NSX Manager from 6.3.0 to 6.4.1, the GI SVM upgrade might fail.

    Workaround: Once the issue occurs, delete and then redeploy the GI SVMs.

NSX Manager Known Issues

  • Issue 2088315: NSX Manager backup operation fails. The rabbitmq certificate present in the backup file S01_NSX_00_00_00_Fri23Mar2018 is expired. Use the following steps to check the backup certificate (the passphrase is elided here as a placeholder):

    # Decrypt the backup file
    openssl enc -md sha512 -d -aes-256-cbc -salt -in S01_NSX_00_00_00_Fri23Mar2018 -out backup.tar -pass 'pass:<passphrase>'

    # Extract the archive
    tar -xvf backup.tar

    # Check the expiration date of the rabbitmq certificate
    openssl x509 -enddate -noout -in home/secureall/secureall/.store/.rabbitmq_cert.pem

    The output shows an expired date:
    notAfter=Dec 26 20:20:40 1978 GMT

    Workaround: Contact support. Support should create a new certificate.

Logical Networking and NSX Edge Known Issues

  • Issue 1747978: OSPF adjacencies are deleted with MD5 authentication after NSX Edge HA failover
    In an NSX for vSphere 6.2.4 environment where the NSX Edge is configured for HA with OSPF graceful restart configured and MD5 is used for authentication, OSPF fails to start gracefully. Adjacencies form only after the dead timer expires on the OSPF neighbor nodes.

    Workaround: None
  • Issue 2005973: Routing daemon MSR loses all routing configuration after deleting a few GRE tunnels and then doing a force sync of the Edge node from the Management Plane. This problem can occur on an Edge with BGP sessions over GRE tunnels. When some of the GRE tunnels are deleted and then a force sync of the Edge is done from the MP, the Edge loses all routing configuration.

    Workaround: Reboot edge node.

  • Issue 2015368: Firewall logging may cause out-of-memory issues under certain circumstances. When the Edge firewall is enabled in configurations of high scale, and firewall logging is enabled on some or all rules, it is possible, although uncommon, for the Edge to encounter an Out-Of-Memory (OOM) condition. This is especially true when there is a lot of traffic hitting the logging rules. When an OOM condition occurs, the Edge VM will automatically reboot.

    Workaround: Firewall logging is best used for debugging purposes, and then disabled again for normal use. To avoid this OOM issue, disable all firewall logging.

  • Issue 2005900: Routing daemon MSR on Edge is stuck at 100% CPU when all GRE tunnels are flapped in an 8-way iBGP/multi-hop BGP ECMP scale topology. This problem can occur in a scale topology where iBGP or multi-hop BGP is configured on an ESG with multiple neighbors running over many GRE tunnels. When multiple GRE tunnels flap, MSR may get stuck indefinitely at 100% CPU.

    Workaround: Reboot edge node.

Security Services Known Issues

  • Issue 2017806: Error message is not clear when adding members to security groups used in RDSH firewall sections on security policies. If a security group is used in RDSH firewall sections on security policies, you can only add directory group members to it. If you try to add any member other than a directory group, the following error displays:
    “Security group is being used by service composer, Firewall and cannot be modified”

    The error message does not convey that the security group cannot be modified because the security group is used in RDSH firewall sections on security policies.

    Workaround: None.

  • Issue 1648578: NSX forces the addition of cluster/network/storage when creating a new NetX host-based service instance
    When you create a new service instance from the vSphere Web Client for NetX host-based services such as Firewall, IDS, and IPS, you are forced to add cluster/network/storage even though these are not required.

    Workaround: When creating a new service instance, you may add any information for cluster/network/storage to fill out the fields. This will allow the creation of the service instance and you will be able to proceed as required.
  • Issue 2018077: Firewall publish fails when a rule has a custom L7 ALG service without destination port and protocol. When creating an L7 service by selecting any of the following L7 ALG applications (APP_FTP, APP_TFTP, APP_DCERPC, APP_ORACLE) without providing a destination port and protocol, and then using them in firewall rules, the firewall rule publish fails.

    Workaround: Provide the appropriate destination port and protocol (TCP/UDP) values while creating custom L7 service for the following ALG services:

    • APP_FTP : port 21 protocol: TCP
    • APP_TFTP: port 69 protocol: UDP
    • APP_DCERPC: port 135 protocol: TCP
    • APP_ORACLE: port 1521 protocol: TCP

Monitoring Services Known Issues

  • Issue 1466790: Unable to choose VMs on bridged network using the NSX traceflow tool
    Using the NSX traceflow tool, you cannot select VMs that are not attached to a logical switch. This means that VMs on an L2 bridged network cannot be chosen by VM name as the source or destination address for traceflow inspection.

    Workaround: For VMs attached to L2 bridged networks, use the IP address or MAC address of the interface you wish to specify as destination in a traceflow inspection. You cannot choose VMs attached to L2 bridged networks as source. See the knowledge base article 2129191 for more information.