Per Site Policies for vSAN 6.6 Stretched Clusters

Prior to vSAN 6.6
Up until vSAN 6.6, protection of objects in a Stretched Cluster configuration consisted of one copy of data at each site and a witness component residing on the Witness host.

This configuration provided protection from a single failure in any one of the three sites, because each site is configured as a Fault Domain. Using the existing policies, three Fault Domains allow a maximum of one failure to be tolerated (tolerating n failures requires 2n+1 Fault Domains).

During normal operation, this limitation was not significant. In the event of a device or node failure, however, additional traffic could traverse the inter-site link for operations such as servicing VM reads and writes, as well as repairing the absent or degraded replica.

Stretched Cluster inter-site bandwidth is sized based on the number of writes the cluster requires. Resync operations are accounted for by reserving 25% of the available bandwidth. Reads are normally not factored into sizing, because they are served locally.
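As a rough sketch of that sizing rule, the following Python snippet applies a 1.25 multiplier to the steady-state write rate to reserve the 25% resync headroom. The names are illustrative rather than from any vSAN tool, and reads are deliberately excluded:

```python
RESYNC_MULTIPLIER = 1.25  # reserve 25% headroom for resync traffic

def intersite_bandwidth_mbps(write_bandwidth_mbps: float) -> float:
    """Minimum inter-site link size for a given steady-state write rate."""
    return write_bandwidth_mbps * RESYNC_MULTIPLIER

# Example: a cluster averaging 4,000 Mbps of writes needs a ~5,000 Mbps link.
print(intersite_bandwidth_mbps(4000))  # 5000.0
```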

Impact when an object is absent or degraded
Availability scenarios differ depending on the type of failure or lack of availability.

If a host goes offline, or a capacity device is unmounted, the affected components are not rebuilt until the 60 minute repair delay threshold is reached. This threshold is configurable, but VMware recommends leaving it at the default. While the object is absent, if its surviving copy is on the alternate site from the virtual machine, reads from the object add overhead as they traverse the inter-site link. Resyncs do not begin until after 60 minutes. Once the 60 minute threshold passes, reads and resyncs traverse the inter-site link until the object is rebuilt on the site it is absent from.

When a device fails due to a hardware error, data is immediately resynced to repair the object. As with an absent object after the 60 minute threshold, a degraded event causes immediate reads and resyncs across the inter-site link.

The impact can be insignificant if few items need to be replaced or if the inter-site link is oversized; it can be significant if many items must be replaced or the inter-site link is already fully utilized.

Also consider that any additional failure before the object is repaired will render the object(s) inaccessible, because up until vSAN 6.6, Stretched Clusters protect against only a single failure.

New Policy Rules in vSAN 6.6
In vSAN 6.6, a few rule changes were introduced. Use of these rules provides additional protection or flexibility for Stretched Cluster scenarios.

  • Failures to Tolerate is now called Primary Failures To Tolerate; this is the only rule that received a name change. It behaves the same as before, and in a Stretched Cluster the only possible values are 0 and 1.
  • Failure Tolerance Method has not changed, but when used in conjunction with another rule, it can change object placement behavior.
  • Secondary Failures To Tolerate is a new rule that specifically changes the local protection behavior of objects in each site of a vSAN 6.6 Stretched Cluster.
  • The final new rule is Affinity. This rule is only applicable when Primary Failures To Tolerate is 0, and it gives the administrator the ability to choose which site the vSAN object resides on: either the Preferred or the Secondary Fault Domain.

These new policy rules provide:

  • Local Protection for objects on vSAN 6.6 Stretched Clusters
  • Site Affinity for objects on vSAN 6.6 Stretched Clusters when protection across sites is not desired
| New Rule | Old Rule | Behavior Specific to Stretched Clusters | vSAN 6.6 Requirements | Stretched Cluster Possible Values |
|---|---|---|---|---|
| Primary Failures To Tolerate (PFTT) | Failures to Tolerate (FTT) | Available for traditional vSAN or Stretched Cluster vSAN configurations. In a Stretched Cluster, this rule determines whether an object is protected on each site or only on a single site. | On-Disk Format v5 | 1 – Enables Protection Across Sites; 0 – Protection in a Single Site |
| Secondary Failures To Tolerate (SFTT) | New rule | Only available for vSAN 6.6 Stretched Clusters; defines the number of disk or host failures a storage object can tolerate within a site. | On-Disk Format v5; Stretched Cluster; proper local and remote host count to satisfy the Failure Tolerance Method and number of failures independently | 0, 1, 2, 3 – Local protection FTT |
| Failure Tolerance Method (FTM) | Failure Tolerance Method | Unchanged, but combined with SFTT it determines whether local protection favors performance or capacity. | 2n+1 hosts per site for Mirroring (SFTT 0, 1, 2, 3); 2n+2 hosts per site for Erasure Coding (SFTT 1, 2) | RAID1 (Mirroring) – Performance; RAID5/6 (Erasure Coding) – Capacity* |
| Affinity | New rule | Provides a choice of the Preferred or Secondary Fault Domain for vSAN object placement. | On-Disk Format v5; Stretched Cluster; Primary Failures To Tolerate = 0 | Preferred Fault Domain; Secondary Fault Domain |

*All-Flash only

The only upgrade requirement for vSAN 6.5 customers to use the new rules in vSAN 6.6 is upgrading the On-Disk Format from Version 3 to Version 5. Bandwidth requirements do not change.

Upon upgrade from a vSAN 6.5 Stretched Cluster to a vSAN 6.6 Stretched Cluster, an existing Stretched Cluster policy of FTT=1 with FTM=Mirroring becomes PFTT=1, FTM=Mirroring.

| Secondary Failures To Tolerate | Failure Tolerance Method | Hosts Required Per Site | Hosts Recommended Per Site |
|---|---|---|---|
| 0 | Mirroring (Hybrid or All-Flash architecture) | 1 | 2 |
| 1 | Mirroring (Hybrid or All-Flash architecture) | 3 | 4 |
| 2 | Mirroring (Hybrid or All-Flash architecture) | 5 | 6 |
| 3 | Mirroring (Hybrid or All-Flash architecture) | 7 | 8 |
| 1 | Erasure Coding (requires All-Flash architecture) | 4 | 5 |
| 2 | Erasure Coding (requires All-Flash architecture) | 6 | 7 |
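These counts follow the 2n+1 (Mirroring) and 2n+2 (Erasure Coding) host requirements from the rules table above. A minimal Python sketch of the same logic, with hypothetical function names; the recommended count simply adds one host of rebuild headroom:

```python
def hosts_required_per_site(sftt: int, ftm: str) -> int:
    """Hosts needed per site: 2n+1 for Mirroring, 2n+2 for Erasure Coding."""
    if ftm == "Mirroring":
        if sftt not in (0, 1, 2, 3):
            raise ValueError("Mirroring supports SFTT 0-3")
        return 2 * sftt + 1
    if ftm == "ErasureCoding":
        if sftt not in (1, 2):
            raise ValueError("Erasure Coding supports SFTT 1-2 (All-Flash only)")
        return 2 * sftt + 2
    raise ValueError("Unknown Failure Tolerance Method")

def hosts_recommended_per_site(sftt: int, ftm: str) -> int:
    # One extra host allows components to be rebuilt after a host failure.
    return hosts_required_per_site(sftt, ftm) + 1

print(hosts_required_per_site(1, "Mirroring"))          # 3
print(hosts_recommended_per_site(2, "ErasureCoding"))   # 7
```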

Data access behavior using the new Policy Rules
vSAN Stretched Clusters have traditionally written a copy of data to each site using a Mirroring Failure Tolerance Method. These were full writes to each site, with reads handled locally using the Site Affinity feature. In a vSAN 6.6 Stretched Cluster, write operations depend on the VM Storage Policy rules.

Primary Failures To Tolerate behavior

  • When a Primary Failures to Tolerate rule is equal to 1, writes will continue to be written in a mirrored fashion across sites.
  • When a Primary Failures to Tolerate rule is equal to 0, writes will only occur in the site that is specified in the Affinity rule.
  • Reads continue to occur from the site a VM resides on.

Secondary Failures To Tolerate behavior

  • When a Secondary Failures to Tolerate rule is in place, the behavior within a site adheres to the Failure Tolerance Method rule.
  • As illustrated in the above table, the number of failures to tolerate, combined with the Failure Tolerance Method, determine how many hosts are required per site to satisfy the rule requirements.
  • Writes and reads occur within each site in the same fashion as they would in a traditional vSAN cluster, but per site.
  • Only when data cannot be repaired locally, such as cases where the only present copies of data reside on the alternate site, will data be fetched from the alternate site.

Affinity

  • The Affinity rule is only used to specify which site a vSAN object will reside on: either the Preferred or the Secondary Fault Domain.
  • It is only honored when a Primary Failures To Tolerate rule is set to 0.
  • VMware recommends that virtual machines are run on the same site that their vSAN objects reside on.
    • Because the Affinity rule is a Storage Policy rule, it only pertains to vSAN objects and not virtual machine placement.
    • This is because read and write operations will be required to traverse the inter-site link when the virtual machine and vSAN objects do not reside in the same site.

vSAN Stretched Cluster Capacity Sizing when using Per-Site Policy Rules
Prior to Per-Site Policy Rules, vSAN Stretched Cluster capacity sizing was based primarily on the Mirroring Failure Tolerance Method with an assumed FTT=1, because only a single copy of data resided in each site.

With Per-Site Policy Rules, capacity requirements can change entirely based on Policy Rule requirements.

The following table illustrates some capacity sizing scenarios based on a default site policy for a vmdk requiring 100GB. Single-site scenarios assume the Preferred Site.

| vSAN Version | Protection | FTT/PFTT | FTM | SFTT | Capacity Required in Preferred Site | Capacity Required in Secondary Site | Capacity Requirement |
|---|---|---|---|---|---|---|---|
| Pre-vSAN 6.6 | Across Sites Only | 1 | Mirroring | NA | 100GB | 100GB | 2x |
| vSAN 6.6 | Across Sites Only | 1 | Mirroring | 0 | 100GB | 100GB | 2x |
| vSAN 6.6 | Across Sites with Local Mirroring (RAID1 Single Failure) | 1 | Mirroring | 1 | 200GB | 200GB | 4x |
| vSAN 6.6 | Across Sites with Local Mirroring (RAID1 Double Failure) | 1 | Mirroring | 2 | 300GB | 300GB | 6x |
| vSAN 6.6 | Across Sites with Local Mirroring (RAID1 Triple Failure) | 1 | Mirroring | 3 | 400GB | 400GB | 8x |
| vSAN 6.6 | Across Sites with Local Erasure Coding (RAID5 Single Failure) | 1 | Erasure Coding | 1 | 133GB | 133GB | 2.66x |
| vSAN 6.6 | Across Sites with Local Erasure Coding (RAID6 Double Failure) | 1 | Erasure Coding | 2 | 150GB | 150GB | 3x |
| vSAN 6.6 | Single Site with Mirroring (RAID1 Single Failure) | 0 | Mirroring | 1 | 200GB | 0 | 2x |
| vSAN 6.6 | Single Site with Mirroring (RAID1 Double Failure) | 0 | Mirroring | 2 | 300GB | 0 | 3x |
| vSAN 6.6 | Single Site with Mirroring (RAID1 Triple Failure) | 0 | Mirroring | 3 | 400GB | 0 | 4x |
| vSAN 6.6 | Single Site with Erasure Coding (RAID5 Single Failure) | 0 | Erasure Coding | 1 | 133GB | 0 | 1.33x |
| vSAN 6.6 | Single Site with Erasure Coding (RAID6 Double Failure) | 0 | Erasure Coding | 2 | 150GB | 0 | 1.5x |
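The multipliers in this table follow directly from the policy rules: Mirroring stores SFTT+1 local copies, RAID5 and RAID6 store 4/3 and 6/4 of the data respectively, and PFTT=1 doubles the total because the local layout is mirrored to the second site. A short Python sketch (illustrative names, not a vSAN API) reproduces the math:

```python
def capacity_multiplier(pftt: int, sftt: int, ftm: str) -> float:
    """Raw-capacity multiplier for a given Per-Site policy combination."""
    if ftm == "Mirroring":
        local = sftt + 1                      # SFTT+1 full copies per site
    elif ftm == "ErasureCoding":
        local = {1: 4 / 3, 2: 6 / 4}[sftt]    # RAID5 (SFTT=1), RAID6 (SFTT=2)
    else:
        raise ValueError("Unknown Failure Tolerance Method")
    return (2 if pftt == 1 else 1) * local    # PFTT=1 mirrors across sites

# Example: a 100GB vmdk with PFTT=1, SFTT=1, local Mirroring needs 400GB.
print(100 * capacity_multiplier(1, 1, "Mirroring"))      # 400.0 (4x)
print(100 * capacity_multiplier(0, 2, "ErasureCoding"))  # 150.0 (1.5x)
```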

vSAN Stretched Cluster Witness Bandwidth considerations when using Per-Site Policy Rules
As Per-Site Policy Rules add local protection, objects are distributed into even more components. Because the bandwidth requirement to the Witness Host is based on the number of components, using these policy rules increases the overall component count and, with it, the required witness bandwidth.

The following is an example of the impact of changing a Storage Policy to include local protection in a Stretched Cluster scenario, for a virtual machine with a single vmdk smaller than 255GB:

  • Using Pre-vSAN 6.6 Policy Rules
    • Would consume 9 components
      • 3 Components for the vmdk (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)
        *Up to a vmdk size of 255GB
      • 3 Components for the VM Home space (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)
      • 3 Components for the Virtual Swap file (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)
  • Using vSAN 6.6 Policy Rules with Protection across sites (PFTT=1) and Mirroring (RAID1 Single Failure) within Sites
    • Would consume 17 components
      • 7 Components for the vmdk (3 in the Preferred Site, 3 in the Secondary Site, 1 on the Witness Host)
        *Up to a vmdk size of 255GB
      • 7 Components for the VM Home space (3 in the Preferred Site, 3 in the Secondary Site, 1 on the Witness Host)
      • 3 Components for the Virtual Swap file (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)
  • Using vSAN 6.6 Policy Rules with Protection across sites (PFTT=1) and Erasure Coding (RAID5 Single Failure) within Sites
    • Would consume 21 components
      • 9 Components for the vmdk (4 in the Preferred Site, 4 in the Secondary Site, 1 on the Witness Host)
        *Up to a vmdk size of 255GB
      • 9 Components for the VM Home space (4 in the Preferred Site, 4 in the Secondary Site, 1 on the Witness Host)
      • 3 Components for the Virtual Swap file (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)
  • Using vSAN 6.6 Policy Rules with Protection in a single site (PFTT=0) and Mirroring (RAID1 Single Failure) – Preferred Fault Domain shown below
    • Would consume 9 components
      • 3 Components for the vmdk (3 in the Preferred Site, 0 in the Secondary Site, 0 on the Witness Host)
        *Up to a vmdk size of 255GB
      • 3 Components for the VM Home space (3 in the Preferred Site, 0 in the Secondary Site, 0 on the Witness Host)
      • 3 Components for the Virtual Swap file (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)
  • Using vSAN 6.6 Policy Rules with Protection in a single site (PFTT=0) and Erasure Coding (RAID5 Single Failure) – Preferred Fault Domain shown below
    • Would consume 11 components
      • 4 Components for the vmdk (4 in the Preferred Site, 0 in the Secondary Site, 0 on the Witness Host)
        *Up to a vmdk size of 255GB
      • 4 Components for the VM Home space (4 in the Preferred Site, 0 in the Secondary Site, 0 on the Witness Host)
      • 3 Components for the Virtual Swap file (1 in the Preferred Site, 1 in the Secondary Site, 1 on the Witness Host)

Notice in each case that the VM SWAP object retains the PFTT=1, FTM=Mirroring storage policy. This is because the VM SWAP object has a hardcoded policy.

Stretched Cluster configurations can accommodate a maximum of 45,000 components, so implementing the Local Protection Per-Site Policy Rules can reduce VM density significantly. The example VM configuration above allows up to 5,000 VMs in a Stretched Cluster: 9 components × 5,000 VMs = 45,000 components (the maximum). Adding local Mirroring protection to the same VM configuration (17 components per VM) reduces that to almost 2,600 VMs, and choosing Erasure Coding rather than Mirroring for local protection (21 components per VM) reduces it to about 2,100.
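These counts and limits can be sanity-checked with a short Python sketch. The local component layouts (1 component for SFTT=0, 3 for Mirroring SFTT=1, 4 for RAID5) and the 45,000-component maximum come from the examples above; the function names are hypothetical:

```python
# Local component counts per site, taken from the examples above.
LOCAL_COMPONENTS = {("Mirroring", 0): 1, ("Mirroring", 1): 3, ("ErasureCoding", 1): 4}

def components_per_object(pftt: int, sftt: int, ftm: str) -> int:
    local = LOCAL_COMPONENTS[(ftm, sftt)]
    # PFTT=1: the layout exists in both sites, plus one Witness Host component.
    return 2 * local + 1 if pftt == 1 else local

def components_per_vm(pftt: int, sftt: int, ftm: str) -> int:
    # vmdk (<255GB) and VM home follow the policy; VM SWAP is hardcoded
    # to PFTT=1 / Mirroring, which always costs 3 components.
    return 2 * components_per_object(pftt, sftt, ftm) + 3

MAX_COMPONENTS = 45000
for policy in [(1, 0, "Mirroring"), (1, 1, "Mirroring"), (1, 1, "ErasureCoding")]:
    per_vm = components_per_vm(*policy)
    print(policy, per_vm, MAX_COMPONENTS // per_vm)
# (1, 0, 'Mirroring')      ->  9 components, 5000 VMs
# (1, 1, 'Mirroring')      -> 17 components, 2647 VMs
# (1, 1, 'ErasureCoding')  -> 21 components, 2142 VMs
```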

Not every environment will be as uniform as these calculations suggest. A vmdk larger than 255GB requires at least one component for every 255GB chunk, and a Stripe Width policy rule, or components broken into smaller chunks after a rebalance, will increase the component count as well.

The witness bandwidth requirement is 2Mbps for every 1,000 components. Using this formula, consider some additional examples:

  • 200 virtual machines with 500GB vmdks (12 components each) using Pre-vSAN 6.6 policies would require 4.8Mbps of bandwidth to the Witness host
    • 3 for swap, 3 for VM home space, 6 for vmdks = 12
    • 12 components X 200 VMs = 2,400 components
    • 2Mbps for every 1000 is 2.4 X 2Mbps = 4.8Mbps
  • The same 200 virtual machines with 500GB vmdks using vSAN 6.6 Policy Rules for Cross Site protection with local Mirroring would require
    • 3 for swap, 7 for VM home space, 14 for vmdks = 24
    • 24 components X 200 VMs = 4,800 components
    • 2Mbps for every 1000 is 4.8 X 2Mbps = 9.6Mbps
  • The same 200 virtual machines with 500GB vmdks using vSAN 6.6 Policy Rules for Cross Site protection with local Erasure Coding would require
    • 3 for swap, 9 for VM home space, 18 for vmdks = 30
    • 30 components X 200 VMs = 6,000 components
    • 2Mbps for every 1000 is 6 X 2Mbps = 12Mbps
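Here is a minimal sketch of the 2Mbps-per-1,000-components rule, reproducing the three examples above (names are illustrative):

```python
def witness_bandwidth_mbps(total_components: int) -> float:
    """Witness host bandwidth: 2Mbps for every 1,000 components."""
    return total_components / 1000 * 2

# 200 VMs at 12, 24, and 30 components each (the three policies above).
for components_per_vm in (12, 24, 30):
    print(witness_bandwidth_mbps(200 * components_per_vm))  # 4.8, 9.6, 12.0
```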

These examples show that adding local protection increases component counts and, with them, witness bandwidth requirements.

vSAN 6.6 Per-Site Policy Rules Summary
With the introduction of Per-Site Policy Rules, vSAN 6.6 adds two important capabilities to vSAN Stretched Clusters.

  1. Local Protection
  2. Site Affinity

As these features provide additional protection and data availability, it is important to consider capacity and bandwidth sizing scenarios.
