Wednesday, March 4, 2020

Dell Technologies Cloud - Subscription Model

Dell Technologies on Tuesday rolled out a new subscription-based model for hybrid cloud deployments, available with the Dell EMC VxRail. The new offering bundles the hardware and software with the services needed for relatively quick rollouts, including support, deployment and asset recovery services.
Customers can sign up for a one-year or three-year agreement, priced on a per-node, per-month basis, with subscriptions starting as low as $70 per node per day. Deployments can take as little as two weeks, Dell said.
Dell claims the new offering is the "fastest hybrid cloud deployment" in the industry. 
The new offering expands the Dell Technologies Cloud portfolio and is part of a broader family of consumption-based and as-a-service offerings called Dell Technologies on Demand.
Dell's hybrid cloud strategy aims to knit its data center and hybrid cloud technologies together with public cloud providers. VMware is the linchpin of Dell's cloud effort, providing the software glue for a cloud platform that can span internal and public resources. VxRail enables deep integration across the VMware ecosystem.

Back Up, Restore and Migrate Kubernetes with Velero

Part of the Tanzu cloud native portfolio, VMware’s Velero is an open source project that provides backup, restore and migration capabilities for Kubernetes. Originally developed by Heptio and known as Ark, Velero is supported by Dell’s PowerProtect, allowing PowerProtect users to back up their Kubernetes clusters.

Velero is a command-line tool that backs up clusters and restores them in case of loss, migrates cluster resources to other clusters, and replicates a production cluster to development and testing clusters. It consists of a server that runs in the cluster and a command-line client that runs locally. A recent Kubernauts post about Velero stated it well:
“[Velero] takes snapshots of your cluster’s Persistent Volumes using your cloud provider’s block storage snapshot features, and can then restore your cluster’s objects and Persistent Volumes to a previous state.”
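To make that workflow concrete, here is a minimal sketch that drives the velero command-line client from Python to back up a namespace and restore it into another cluster. It assumes the velero client is installed and already configured for each cluster, that both clusters share the same backup storage location, and the backup name, namespace and kubeconfig paths are invented for illustration.

import subprocess

def velero(*args, kubeconfig=None):
    # Run a velero CLI command against a specific cluster and return its output.
    cmd = ["velero"]
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]
    cmd += list(args)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Back up one namespace from the production cluster.
velero("backup", "create", "web-prod-backup",
       "--include-namespaces", "web-prod",
       kubeconfig="prod.kubeconfig")

# Restore that backup into a test cluster (migration / cluster replication).
velero("restore", "create", "--from-backup", "web-prod-backup",
       kubeconfig="test.kubeconfig")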
Velero is written in Go and offers a Go client SDK that integrates with the Kubernetes API, using plugins to provide capabilities for syncing across multiple cloud environments.
“We have a system of syncing,” said Campos. “Once you create a backup, it is continually syncing. So whenever you want to restore, your backups are up to date.”
The Velero project has about 3,300 stars on GitHub, owing in part to its ability to take immediate backups, which adds a degree of safety, and to restore Kubernetes clusters.

Velero Use Cases

Through its plug-in capability, Velero is a foundational technology for backing up Kubernetes environments, both in the cloud and on-premises. Customers don’t want to be locked in, said Spoonemore. “They use Velero to migrate from one cloud provider to another cloud provider.” Velero’s extensibility allows the PowerProtect platform to use plugins that work with AWS, Google, DigitalOcean and Portworx, to name a few.
Velero is suited for disaster recovery use cases, as well as for snapshotting application state prior to performing system operations on a cluster. Velero backs up not only the pods and the clusters but also the data in the persistent volumes. Backups can be done selectively: a big cluster can be backed up, for example, by namespace and/or label. PowerProtect software uses Velero and puts it into a single stack. VMware’s Tanzu Mission Control is used to manage across a fleet of clusters. It integrates Velero across Tanzu for backup, restoration and migration of Kubernetes clusters, providing data protection by leveraging Velero to back up Kubernetes resources.
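As a rough illustration of that selectivity, the sketch below (again shelling out to the velero CLI; the namespace, label and schedule values are made up) limits a backup to a single namespace and label selector and keeps a rolling schedule for disaster recovery.

import subprocess

def velero(*args):
    subprocess.run(["velero", *args], check=True)

# One-off backup limited to a namespace and a label selector.
velero("backup", "create", "payments-adhoc",
       "--include-namespaces", "payments",
       "--selector", "app=payments-api")

# Rolling disaster-recovery schedule: daily at 02:00, backups kept for 30 days.
velero("schedule", "create", "payments-daily",
       "--schedule", "0 2 * * *",
       "--include-namespaces", "payments",
       "--ttl", "720h")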

VMware Tanzu for K8s Architects

Recently at VMworld 2019 US, VMware announced a tech preview of VMware Tanzu Mission Control, a way to bring consistency and control to all your Kubernetes clusters regardless of where they are running. The Kubernetes Architecture team at VMware focuses on getting customers into production with Kubernetes, and VMware Tanzu Mission Control is a much-needed tool to manage multiple Kubernetes clusters. While Kubernetes itself tends to treat the namespace as a logical boundary between groups of resources, many customers opt instead to treat entire clusters as isolation mechanisms between workloads and have asked VMware for a way to manage the many clusters they deploy.
To take it further, different workloads have different needs, which means large organizations could have a myriad of different infrastructures: public clouds, private clouds, and bare metal. Enterprises could have an even more granular choice within each type of infrastructure: bare metal with SSDs, clouds driven by spot instances, and on and on. This is why we’re so excited about VMware Tanzu Mission Control: It will provide a tool to manage your Kubernetes clusters across this matrix of infrastructure. So, of all the features, which are we most excited about?

Centralized Kubernetes Lifecycle Management

With just a few clicks, VMware Tanzu Mission Control will provide a clean way to manage the lifecycle of cloud-hosted Kubernetes clusters. If enterprises are running any flavor of upstream-conformant Kubernetes on bare metal or VMware vSphere in the data center or have managed Kubernetes services in the public cloud, they will be able to bring those clusters under centralized management with VMware Tanzu Mission Control.

Unified Access Management

Companies working with multiple clusters have to track which teams have access to which clusters. Although Active Directory groups can make access management a bit easier, most companies we’ve worked with that deploy many clusters have to get creative around cluster-access management. With VMware Tanzu Mission Control, platform operations teams will be able to manage access to all their Kubernetes clusters in one place.

Security and Configuration Management

If authentication and authorization are problems at scale, security policies and network policies across multiple clusters are also problems. VMware Tanzu Mission Control will allow platform operators to manage cluster policies at a macro level by grouping many Kubernetes clusters together into workspaces divided by application team, stage of software development, or other ways that make sense to an organization. After the clusters are grouped, you will be able to apply security policies and configuration policies to all those clusters at once.

Backup and Restores with Velero

Velero allows operators to back up important data from Kubernetes clusters, including the underlying volumes. Through VMware Tanzu Mission Control, platform operators will be able to back up multiple important clusters at once as dictated by policy.
With the introduction of VMware Tanzu as a portfolio of products and services, we are transforming the way enterprises build software on Kubernetes. As organizations increase self-service and developer velocity through multiple Kubernetes clusters, VMware Tanzu Mission Control will offer a powerful set of capabilities that will allow platform operators to manage modern API-driven infrastructures.
To find out more about VMware Tanzu Mission Control, check out the web page or watch this video.

Thursday, April 16, 2015

NUMA and VMware


ESXi uses a sophisticated NUMA scheduler to dynamically balance processor load and memory locality.


Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is one of the system’s NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT).

When memory is allocated to a virtual machine, the ESXi host preferentially allocates it from the home node. The virtual CPUs of the virtual machine are constrained to run on the home node to maximize memory locality.

The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes in system load. The scheduler might migrate a virtual machine to a new home node to reduce processor load imbalance. Because this might cause more of its memory to be remote, the scheduler might migrate the virtual machine’s memory dynamically to its new home node to improve memory locality. The NUMA scheduler might also swap virtual machines between nodes when this improves overall memory locality.
Some virtual machines are not managed by the ESXi NUMA scheduler. For example, if you manually set the processor or memory affinity for a virtual machine, the NUMA scheduler might not be able to manage this virtual machine. Virtual machines that are not managed by the NUMA scheduler still run correctly. However, they don't benefit from ESXi NUMA optimizations.
The NUMA scheduling and memory placement policies in ESXi can manage all virtual machines transparently, so that administrators do not need to address the complexity of balancing virtual machines between nodes explicitly.
The optimizations work seamlessly regardless of the type of guest operating system. ESXi provides NUMA support even to virtual machines that do not support NUMA hardware, such as Windows NT 4.0. As a result, you can take advantage of new hardware even with legacy operating systems.
A virtual machine that has more virtual processors than the number of physical processor cores available on a single hardware node can be managed automatically. The NUMA scheduler accommodates such a virtual machine by having it span NUMA nodes. That is, it is split up as multiple NUMA clients, each of which is assigned to a node and then managed by the scheduler as a normal, non-spanning client. This can improve the performance of certain memory-intensive workloads with high locality. For information on configuring the behavior of this feature, see Advanced Virtual Machine Attributes.
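As a back-of-the-envelope sketch (this is only the arithmetic of the split, not ESXi's actual scheduling logic), a wide virtual machine breaks down into NUMA clients roughly like this:

import math

def numa_clients(vcpus: int, cores_per_node: int) -> int:
    # A VM wider than one NUMA node is split into multiple NUMA clients,
    # each small enough to be scheduled on a single node.
    return math.ceil(vcpus / cores_per_node)

# Example: a 16-vCPU VM on a host with 10-core NUMA nodes is managed
# as two NUMA clients, each assigned its own home node.
print(numa_clients(16, 10))  # -> 2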
ESXi 5.0 and later includes support for exposing virtual NUMA topology to guest operating systems. For more information about virtual NUMA control, see Using Virtual NUMA.

More on this in later posts.

Wednesday, April 15, 2015

Calculating Average Guest Latency in VMware

If you are using VMware vSphere, VMware ESXi cannot see application latency because it occurs above the ESXi stack. What ESXi can do is detect three types of latency, which are reported in esxtop and VMware vCenter.
Average guest latency (GAVG) has two major components: average disk latency (DAVG) and average kernel latency (KAVG).
DAVG is the measure of time that I/O commands spend in the device, from the host bus adapter (HBA) driver to the back-end storage array.
KAVG is how much time I/O spends in the ESXi kernel. Time is measured in milliseconds. KAVG is a derived metric, meaning it is not measured directly: to derive KAVG, subtract DAVG from GAVG.
 
In addition, the VMkernel processes I/O very efficiently, so there should be no significant wait in the kernel, or KAVG. In a well-configured, well-running VDI environment, KAVG should be close to zero. If it is not, the I/O might be stuck in a kernel queue inside the VMkernel. When that is the case, time in the kernel queue is measured as QAVG.
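The arithmetic behind the derivation is trivial; the sketch below shows it, along with a rule-of-thumb check (the ~1 ms threshold is an assumption, not an official VMware limit) for spotting kernel queuing.

def kernel_latency(gavg_ms: float, davg_ms: float) -> float:
    # KAVG is derived: total latency seen by the guest minus device latency.
    return gavg_ms - davg_ms

gavg, davg = 12.0, 11.5               # example esxtop values in milliseconds
kavg = kernel_latency(gavg, davg)
print(f"KAVG = {kavg:.1f} ms")

# Rule of thumb (assumption): sustained KAVG well above ~1 ms suggests I/O is
# queuing inside the VMkernel, so check QAVG and the adapter/device queue depths.
if kavg > 1.0:
    print("Investigate kernel queuing: QAVG, queue depths, storage path load")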
 
To get a sense of the latency that the application can see in the guest OS, use a tool such as Perfmon to compare the GAVG and the actual latency the application is seeing. 
 
This comparison reveals how much latency the guest OS is adding to the storage stack. For instance, if ESXi is reporting a GAVG of 10 ms but the application or Perfmon in the guest OS is reporting storage latency of 30 ms, then 20 ms of latency is somehow building up in the guest OS layer, and you should focus your debugging on the guest OS storage configuration.
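Using the numbers from that example, the guest-layer contribution falls out of a simple subtraction (the Perfmon counter named in the comment is just one common way to read in-guest disk latency):

gavg_ms = 10.0            # GAVG reported by esxtop / vCenter
guest_observed_ms = 30.0  # e.g. Perfmon "Avg. Disk sec/Transfer" converted to ms

guest_added_ms = guest_observed_ms - gavg_ms
print(f"Latency added inside the guest OS: {guest_added_ms:.0f} ms")  # -> 20 ms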

Friday, May 9, 2014

Virtualize with Confidence - Use VMware: Storage DRS and Drmdisk

Storage DRS and Drmdisk

Storage DRS leverages a special kind of disk to facilitate more granular control over initial placement and migration recommendations. These deeper details also play a major role in I/O load balancing.

Let us read about it in detail:

DrmDisk


vSphere Storage DRS uses the DrmDisk construct as the smallest entity it can migrate. A DrmDisk represents a consumer of datastore resources. This means that vSphere Storage DRS creates a DrmDisk for each VMDK file belonging to the virtual machine. A soft DrmDisk is created for the working directory containing the configuration files such as the .VMX file and the swap file.

  • A separate DrmDisk for each VMDK file
  • A soft DrmDisk for system files (VMX, swap, logs, and so on)
  • If a snapshot is created, both the VMDK file and the snapshot are contained in a single DrmDisk.
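To picture that granularity, here is a toy model (not VMware code) of how the DrmDisks for a single virtual machine might be enumerated:

from dataclasses import dataclass

@dataclass
class DrmDisk:
    name: str
    soft: bool = False   # soft DrmDisk = working directory (VMX, swap, logs)

def drmdisks_for_vm(vm_name, vmdk_files):
    # Toy model: one DrmDisk per VMDK file, plus a soft DrmDisk for system files.
    disks = [DrmDisk(name=vmdk) for vmdk in vmdk_files]
    disks.append(DrmDisk(name=f"{vm_name}/working-dir", soft=True))
    return disks

# Example: a VM with an OS disk and a data disk yields three DrmDisks.
for disk in drmdisks_for_vm("app01", ["app01.vmdk", "app01_1.vmdk"]):
    print(disk)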



VMDK Anti-affinity Rule

When the datastore cluster or the virtual machine is configured with a VMDK-level anti-affinity rule, vSphere Storage DRS must keep the DrmDisks containing the virtual machine disk files on separate datastores.

Impact of VMDK Anti-Affinity Rule on Initial Placement

Initial placement immensely benefits from this increased granularity. Instead of searching a suitable datastore that can fit the virtual machine as a whole, vSphere Storage DRS can seek appropriate datastores for each DrmDisk file separately. Due to the increased granularity, datastore cluster fragmentation—described in the “Initial Placement” section—is less likely to occur; if prerequisite migrations are required, far fewer are expected.

Impact of VMDK Anti-Affinity Rule on Load Balancing

Similar to initial placement, I/O load balancing also benefits from the deeper level of detail. vSphere Storage DRS can find a better fit for each workload generated by each VMDK file. vSphere Storage DRS analyzes the workload and generates a workload model for each DrmDisk. It then determines in which datastore it must place the DrmDisk to keep the load balanced within the datastore cluster while offering sufficient performance for each DrmDisk. This becomes considerably more difficult when vSphere Storage DRS must keep all the VMDK files together. Usually in that scenario, the datastore chosen is the one that provides the best performance for the most demanding workload and is able to store all the VMDK files and system files.

By enabling vSphere Storage DRS to load-balance on a more granular level, each DrmDisk of a virtual machine is placed to suit its I/O latency needs as well as its space requirements.
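The sketch below is a deliberately simplified, hypothetical stand-in for that idea (it is not the actual vSphere Storage DRS algorithm): each DrmDisk is placed on the least-loaded datastore that still has room for it, instead of hunting for one datastore that can absorb the whole virtual machine.

# Hypothetical, simplified per-DrmDisk placement; all names and numbers are made up.
datastores = {
    "ds-gold":   {"free_gb": 400, "load": 0.7},   # modeled I/O load, arbitrary units
    "ds-silver": {"free_gb": 900, "load": 0.3},
}

drmdisks = [
    {"name": "app01.vmdk",   "size_gb": 300, "load": 0.4},   # busy data disk
    {"name": "app01_1.vmdk", "size_gb": 100, "load": 0.1},   # quiet log disk
]

for disk in drmdisks:
    # Pick the least-loaded datastore that can still fit this one DrmDisk.
    candidates = [name for name, ds in datastores.items() if ds["free_gb"] >= disk["size_gb"]]
    target = min(candidates, key=lambda name: datastores[name]["load"])
    datastores[target]["free_gb"] -= disk["size_gb"]
    datastores[target]["load"] += disk["load"]
    print(f"{disk['name']} -> {target}")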

Virtual Machine–to–Virtual Machine Anti-Affinity Rule

An inter–virtual machine (virtual machine–to–virtual machine) anti-affinity rule forces vSphere Storage DRS to keep the virtual machines on separate datastores. This rule effectively extends availability requirements from hosts to datastores.

For example, in vSphere DRS an anti-affinity rule is created to force two virtual machines, such as Microsoft Active Directory servers, to run on separate hosts; in vSphere Storage DRS, a virtual machine–to–virtual machine anti-affinity rule guarantees that the Active Directory virtual machines are not stored on the same datastore. Note that both virtual machines participating in a virtual machine–to–virtual machine anti-affinity rule must be configured with an intra–virtual machine VMDK affinity rule.
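As a trivial illustration of what the rule guarantees, the check below (VM and datastore names are made up) flags a placement in which two anti-affine Active Directory virtual machines ended up on the same datastore:

# Hypothetical placement map: VM name -> datastore holding its (affine) disks.
placement = {"ad-dc01": "datastore-01", "ad-dc02": "datastore-01"}

anti_affinity_group = ["ad-dc01", "ad-dc02"]   # must reside on different datastores

used = [placement[vm] for vm in anti_affinity_group]
if len(set(used)) < len(anti_affinity_group):
    print("Anti-affinity rule violated: the AD servers share a datastore")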

Thoughts?

vSphere Storage DRS is one cool piece of code and is continuously improving how we use traditional storage systems for the better.

It is always good to dig into and understand VMware vSphere features in detail. Until next time :)