Friday, April 25, 2014

vMotion network design considerations - Multi-NIC vMotion and Link Aggregation in vSphere 5.5 - Part2


In my previous post, I tried to work out how LACP support in vSphere 5.5 has changed the design considerations for the vMotion network: one has to decide between Multi-NIC vMotion and link aggregation.

After some searching on the Internet and talking to my friends in the virtualization domain, I was referred to an excellent post from Chris Wahl (VCDX #104) on his blog, wahlnetwork.com.
Not only does that post cover the topic in depth, but the comments on it also helped me understand the concept and its pros and cons even better.

Multi-NIC vMotion, when used with Load Based Teaming (LBT), is still a clear winner over link aggregation for vMotion networks.

A few reasons for this are summarized here:

  • A LAG can only distribute traffic across its member links; it keeps sending traffic to a NIC even if that NIC is saturated.
  • LBT, on the other hand, actively examines the traffic on the vSwitch. It is aware of the load on each NIC, which lets it avoid sending traffic to a saturated NIC (a small sketch after this list illustrates the difference).
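Here is a minimal conceptual sketch of that difference, assuming a simple XOR-based hash for the LAG and the roughly 75%-over-30-seconds utilisation threshold LBT uses; the function names and the traffic model are mine, not VMware's code:

# Conceptual contrast between LAG hashing and Load Based Teaming (LBT).
# Function names, the hash formula, and the data structures are illustrative only.

def pick_uplink_lag(src_ip: int, dst_ip: int, uplinks: list) -> int:
    """A LAG picks an uplink by hashing the flow headers.
    The choice never considers how busy that uplink already is."""
    return (src_ip ^ dst_ip) % len(uplinks)

def pick_uplink_lbt(uplink_utilisation: list) -> int:
    """LBT watches uplink utilisation (vSphere moves flows when an uplink stays
    above roughly 75% for 30 seconds) and steers load away from saturated NICs."""
    SATURATION = 0.75
    candidates = [i for i, u in enumerate(uplink_utilisation) if u < SATURATION]
    pool = candidates or range(len(uplink_utilisation))
    return min(pool, key=lambda i: uplink_utilisation[i])

# Two uplinks, uplink 0 already at 95% load:
# the LAG hash still sends this flow to uplink 0, while LBT would use uplink 1.
print(pick_uplink_lag(0x0A000001, 0x0A000003, ["vmnic0", "vmnic1"]))  # -> 0
print(pick_uplink_lbt([0.95, 0.20]))                                   # -> 1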

Thoughts?


So LBT is much better at load-balancing traffic in this case.

That doesn't mean LACP is bad and should not be used in vSphere designs. It simply has different use cases that suit its functionality, for example NFS-based storage.
I will cover the pros and cons of link aggregation, and the use cases where it fits, in a separate post.

Till then, keep networking. Cheers!


vMotion network design considerations - Multi-NIC vMotion and Link Aggregation in vSphere 5.5 - Part1

vSphere 5.5 has now been out for some time. As always with a new release, there were some major enhancements (good for us), especially around networking and the vDS.

I was impressed by Multi-NIC vMotion support when it was introduced in vSphere 5.0. With the advent of 10GbE NICs and the ever-increasing size of the workloads that run in a single VM, it was important to leverage this technology to revamp existing vMotion designs.
And I must say, VMware is always two steps ahead in innovating and improving its product features.

Today I am looking into the design considerations for the vMotion network, weighing Multi-NIC vMotion against the LACP support in the vDS in vSphere 5.5.

LACP support up to vSphere 5.1 was rather limited and exclusively dependent on IP hash load balancing. With vSphere 5.5 a lot has changed in the vDS, especially around LACP and support for all the LACP load-balancing (hashing) types.

LACP now supports dynamic link aggregation and multiple LAGs, and is no longer exclusively dependent on IP hash load balancing.

Up to vSphere 5.1, Multi-NIC vMotion was the right choice compared to LAG-backed NICs running LACP, precisely because LACP support was so limited and tied to IP hash load balancing.

Those of you who know how IP hash load balancing works, and how vMotion selects its preferred NIC, will be able to relate to this observation.
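A toy illustration of the point (not VMware's actual algorithm; the XOR-and-modulo hash and the addresses are assumptions): under IP hash, a single vMotion stream can never use more than one uplink of a LAG, while Multi-NIC vMotion splits one migration across several VMkernel ports and so across several NICs.

# Illustrative only: IP hash pins one source/destination IP pair to one uplink.
# This approximates "route based on IP hash"; it is not VMware's exact code.

def ip_hash(src_ip: int, dst_ip: int, n_uplinks: int) -> int:
    return (src_ip ^ dst_ip) % n_uplinks

# A single vMotion uses one VMkernel IP pair, so every packet of that migration
# hashes to the same LAG member, no matter how many NICs the LAG contains.
print(ip_hash(0x0A0A0A0B, 0x0A0A0A0C, 2))   # 10.10.10.11 -> 10.10.10.12, always uplink 1

# Multi-NIC vMotion instead creates one VMkernel port per NIC, so one migration
# becomes several streams with different IP pairs that can land on different uplinks.
streams = [(0x0A0A0A0B, 0x0A0A0A0C),        # 10.10.10.11 -> 10.10.10.12
           (0x0A0A0A15, 0x0A0A0A17)]        # 10.10.10.21 -> 10.10.10.23
print({ip_hash(s, d, 2) for s, d in streams})   # {0, 1}: both uplinks in use

In a real Multi-NIC vMotion design each VMkernel port is normally pinned to its own active uplink rather than relying on IP hash; the point here is simply that multiple streams give vMotion more than one NIC to fill.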

The rest of you can dig a little and it will become very clear. Please refer to the excellent article by Frank Denneman (here).

I am still contemplating how this changes the LACP vs. Multi-NIC vMotion debate. I am discussing it with my friends at VMware; let's see what their thoughts are.

Keep watching this space to see how this changes with the new vSphere 5.5.

  

Back to VMware basics - vMotion Deepdive

Ever wondered how vMotion works?
I plan to write a detailed post, divided into various sub-posts, to help you understand how the process works and what happens in the background.

Architecture

vSphere 5.5 vMotion transfers the entire execution state of a running virtual machine from the source VMware vSphere ESXi host to the destination ESXi host over a high speed network. The execution state primarily consists of the following components:

  • The virtual machine’s virtual disks
  • The virtual machine’s physical memory
  • The virtual device state, including the state of the CPU, network and disk adapters, SVGA, and so on
  • External network connections
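As a mental model for what has to travel between the hosts, here is a toy structure; the field names are my own and do not reflect VMware's internals:

# Toy model of the execution state vMotion transfers; purely illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VirtualDeviceState:
    cpu_state: bytes = b""
    network_adapter_state: bytes = b""
    disk_adapter_state: bytes = b""
    svga_state: bytes = b""

@dataclass
class VMExecutionState:
    virtual_disks: List[str] = field(default_factory=list)        # paths to the VMDKs
    memory_pages: Dict[int, bytes] = field(default_factory=dict)  # page number -> contents
    device_state: VirtualDeviceState = field(default_factory=VirtualDeviceState)
    external_connections: List[str] = field(default_factory=list) # e.g. MAC / switch-port bindings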

Let's see how vSphere 5.5 vMotion handles the challenges associated with transferring these different states of a virtual machine.

Migration of Virtual Machine’s Storage

vSphere 5.5 vMotion builds on Storage vMotion technology for the transfer of the virtual machine's virtual disks, so we need to briefly understand the Storage vMotion architecture to provide the necessary context.

Storage vMotion uses a synchronous mirroring approach to migrate a virtual disk from one datastore to another datastore on the same physical host. This is implemented by using two concurrent processes. First, a bulk copy (also known as a clone) process proceeds linearly across the virtual disk in a single pass and performs a bulk copy of the disk contents from the source datastore to the destination datastore.

Concurrently, an I/O mirroring process transports any additional changes that occur to the virtual disk, because of the guest’s ongoing modifications. The I/O mirroring process accomplishes that by mirroring the ongoing modifications to the virtual disk on both the source and the destination datastores. Storage vMotion mirrors I/O only to the disk region that has already been copied by the bulk copy process. Guest writes to a disk region that the bulk copy process has not yet copied are not mirrored because changes to this disk region will be copied by the bulk copy process eventually.
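A minimal sketch of that mirroring rule, with made-up block-level structures (this is conceptual, not how the VMkernel mirror driver is actually written):

# Conceptual sketch of Storage vMotion's bulk copy + I/O mirroring, as described above.

class StorageVMotionSketch:
    def __init__(self, total_blocks: int):
        self.copied_up_to = 0                  # bulk copy proceeds linearly in one pass
        self.total_blocks = total_blocks

    def bulk_copy_step(self, src: dict, dst: dict) -> None:
        """Clone the next block from the source datastore to the destination."""
        if self.copied_up_to < self.total_blocks:
            dst[self.copied_up_to] = src.get(self.copied_up_to)
            self.copied_up_to += 1

    def guest_write(self, block: int, data, src: dict, dst: dict) -> None:
        """Handle a guest write issued while the migration is in flight."""
        src[block] = data                      # the write always lands on the source disk
        if block < self.copied_up_to:          # region already cloned: mirror it now;
            dst[block] = data                  # otherwise the bulk copy will pick it up later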

Migration of Virtual Machine’s Memory

vSphere 5.5 vMotion builds on existing vMotion technology for transfer of the virtual machine’s memory. Both vSphere 5.5 vMotion and vMotion use essentially the same pre-copy iterative approach to transfer the memory contents. The approach is as follows:


  • [Phase 1] Guest trace phase: The guest memory is staged for migration during this phase. Traces are placed on the guest memory pages to track any modifications by the guest during the migration.
  • [Phase 2] Pre-copy phase: Because the virtual machine continues to run and actively modify its memory state on the source host during this phase, the memory contents of the virtual machine are copied from the source ESXi host to the destination ESXi host in an iterative process. Each iteration copies only the memory pages that were modified during the previous iteration (a simplified sketch of this loop follows the list).
  • [Phase 3] Switch-over phase: During this final phase, the virtual machine is momentarily quiesced on the source ESXi host, the last set of memory changes are copied to the target ESXi host, and the virtual machine is resumed on the target ESXi host.
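Here is that iterative loop as a simplified sketch, assuming a hypothetical dirty-page tracker and an arbitrary switch-over threshold (real vMotion decides when to switch over from the page dirty rate and available bandwidth, not a fixed page count):

# Hypothetical sketch of the pre-copy phases described above; dirty-page tracking
# is represented by a callable, and the threshold/pass limit are arbitrary.

def precopy_migrate(all_pages, dirtied_since, send,
                    switchover_threshold=64, max_passes=10):
    # Phase 1: traces are installed so dirtied_since() can report modified pages;
    # initially every page still has to be sent.
    to_send = set(all_pages)

    # Phase 2: iterate, each pass resending only the pages the guest dirtied
    # while the previous pass was being copied.
    passes = 0
    while len(to_send) > switchover_threshold and passes < max_passes:
        send(to_send)
        to_send = set(dirtied_since())
        passes += 1

    # Phase 3: momentarily quiesce the VM, ship the last set of changes,
    # then resume the VM on the destination host.
    send(to_send)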


In contrast to vMotion prior to vSphere 5.1, vSphere 5.5 vMotion must also transport any additional changes that occur to the virtual machine's virtual disks due to the guest's ongoing operations during the memory migration. In addition, vSphere 5.5 vMotion must coordinate several copy processes: the bulk copy process, the I/O mirroring process, and the memory copy process.

To allow a virtual machine to continue to run during the entire migration process, and to achieve the desired amount of transparency, vSphere 5.5 vMotion begins the memory copy process only after the bulk copy process completes the copy of the disk contents. The memory copy process runs concurrently with the I/O mirroring process, so modifications to the memory and virtual disks due to the guest's ongoing operations are reflected on the destination host.
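Putting the pieces together, a hypothetical orchestration of that ordering (placeholder function names, and a threading.Event standing in for the VMkernel's mirror state; real vMotion coordinates all of this inside the VMkernel, not in Python):

import threading

def migrate_vm(bulk_copy_disk, precopy_memory, switch_over,
               io_mirroring_active: threading.Event):
    # 1. Bulk copy the virtual disks; guest writes to already-copied regions are
    #    mirrored to the destination while this flag is set.
    io_mirroring_active.set()
    bulk_copy_disk()

    # 2. Only after the disk bulk copy completes does the memory pre-copy start.
    #    I/O mirroring keeps running alongside it, so both disk and memory changes
    #    reach the destination (and the two share the vMotion network bandwidth).
    precopy_memory()

    # 3. Atomic switch-over once memory and disk state are in lock-step. Up to this
    #    point any unexpected failure leaves the VM running from the source host
    #    and the source disks.
    io_mirroring_active.clear()
    switch_over()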

Because both the memory copy process and the I/O mirroring process contend for the same network bandwidth, the memory copy can take slightly longer in vSphere 5.5 vMotion than during a conventional vMotion. Generally this is not an issue, because the memory dirtying rate is typically high compared to the rate at which disk blocks change.
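To make that intuition concrete with purely invented rates (these numbers are illustrative assumptions, not measurements):

# Back-of-the-envelope with made-up numbers, only to show why the contention is small.
memory_dirty_rate = 500   # MB/s of guest memory being dirtied (hypothetical)
disk_dirty_rate   = 30    # MB/s of guest disk blocks being changed (hypothetical)

mirror_share = disk_dirty_rate / (disk_dirty_rate + memory_dirty_rate)
print(f"I/O mirroring is roughly {mirror_share:.0%} of the copy traffic")  # ~6%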

vSphere 5.5 vMotion guarantees an atomic switch-over between the source and destination hosts by ensuring that both the memory and the disk state of the virtual machine are in lock-step before the switch-over, and it fails back to the source host and source disks in the event of any unexpected failure during the disk or memory copy.

Further details on vMotion, Storage vMotion, and some related concepts will follow in my next post.