Desktop Virtualization Optimization
Overview
Requirements
This live dashboard requires one or more of our virtualization connectors to be installed.
Problem
Virtual desktops are supported by a complex technology stack with an overwhelming number of unique performance and availability metrics. In addition, the traditional metrics used to monitor physical desktops may need to be interpreted differently in the context of a virtual desktop.
To make matters worse, organizations typically have multiple virtual desktop environments with different technology stacks which are managed by geographically dispersed expert teams.
These circumstances lead to several problems:
Data overload - the sheer number of metrics makes it harder to:
identify those metrics that impact user experience,
interpret well-known metrics in the context of a virtual desktop,
understand where to start any troubleshooting journey, and
proactively prevent performance issues.
Communication - communication about issues, their impact, and trends between all stakeholders is difficult due to the inherent complexity of desktop virtualization.
Localization - having multiple desktop virtualization environments with different technology stacks and users connecting from multiple geographically dispersed locations makes it difficult to localize issues and engage with the right expert team.
Impact assessment - it is difficult to assess the impact of issues without having a high-level overview of all virtual desktops and how they are organized.
Adoption and utilization - virtual desktops are provided as a service from a data center. As with any service, there are periods of high and low demand. This means that performance and availability greatly depend on the behavior of the users. Some of these usage patterns, such as logon storms, are difficult to spot and correlate.
Solution
This live dashboard provides a unified overview of the status, health, and performance of all virtual desktops, regardless of their environment or technology stack. The dashboard is based on key performance indicators and lets users spot trends, localize issues, and assess their impact. Using tooltips, the dashboard helps organizations interpret the performance and availability indicators and suggests the next steps. Lastly, the dashboard aims to streamline communication between the different stakeholders who manage digital experience on virtual desktops.
User experience overview
The performance of a virtual desktop is the most important factor in user experience. Traditional performance counters do not always paint an accurate picture. Because multiple users are sharing the same hardware resources, it is quite common that those performance metrics show an increased load while the users do not notice any slowdown themselves and vice versa.
The User Experience tab of this Library Dashboard focuses on a few metrics that will demonstrate how users perceive the performance of their virtual desktops. This perceived performance is broken down into three separate metrics describing the time it takes to log on, the network latency between the virtual desktop and the user’s local device, and lastly, the responsiveness of the session.
User experience summary
The first section on the User experience tab provides a summary of the total number of users and their experience based on the three metrics mentioned in the introduction of this overview. For each of the metrics, the average and maximum values are shown. The tooltips explain how to interpret these values and what the potential next steps are.
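As a rough illustration of what such a summary computes, the sketch below derives the user count and the average and maximum of the three metrics from a handful of session samples. The field names (`logon_s`, `latency_ms`, `responsiveness_ms`) and the sample values are assumptions made for this example; they are not the dashboard's actual data model.

```python
from statistics import mean

# Hypothetical per-session samples; field names and values are illustrative only.
sessions = [
    {"user": "alice", "logon_s": 20.0, "latency_ms": 30.0, "responsiveness_ms": 50.0},
    {"user": "bob", "logon_s": 30.0, "latency_ms": 50.0, "responsiveness_ms": 100.0},
    {"user": "carol", "logon_s": 40.0, "latency_ms": 100.0, "responsiveness_ms": 150.0},
]

METRICS = ("logon_s", "latency_ms", "responsiveness_ms")


def summarize(sessions):
    """Return the distinct user count plus average and maximum per metric."""
    summary = {"users": len({s["user"] for s in sessions})}
    for metric in METRICS:
        values = [s[metric] for s in sessions]
        # An empty input yields None instead of raising an error.
        summary[metric] = {"avg": mean(values), "max": max(values)} if values else None
    return summary


summary = summarize(sessions)
print(summary)
```

Showing the maximum next to the average matters because a healthy average can hide a small group of users with a very poor experience.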
Logon performance breakdown
The logon process is very resource intensive, and slow logons are one of the first indicators of congestion on the infrastructure. However, configuration changes, for instance to group policies, may also significantly slow down logons.
The dashboard provides insights into where and when slowdowns occurred and how many users were affected.
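A breakdown of this kind can be pictured as grouping slow logons by location and time. The sketch below is a minimal illustration, assuming a hypothetical record layout (user, desktop pool, start time, duration) and an arbitrary 60-second threshold for a "slow" logon; neither comes from the product itself.

```python
from collections import defaultdict
from datetime import datetime

SLOW_LOGON_S = 60.0  # assumed threshold, not a product default

# Hypothetical logon records: (user, desktop pool, start time, duration in seconds).
logons = [
    ("alice", "Pool-A", datetime(2024, 1, 8, 8, 55), 75.0),
    ("bob", "Pool-A", datetime(2024, 1, 8, 8, 10), 90.0),
    ("carol", "Pool-A", datetime(2024, 1, 8, 8, 30), 20.0),
    ("dave", "Pool-B", datetime(2024, 1, 8, 9, 5), 120.0),
    ("alice", "Pool-A", datetime(2024, 1, 8, 8, 58), 80.0),
]


def slow_logon_breakdown(logons, threshold_s=SLOW_LOGON_S):
    """Count distinct users with a slow logon per (pool, hour) bucket."""
    affected = defaultdict(set)
    for user, pool, started, duration_s in logons:
        if duration_s > threshold_s:
            hour = started.replace(minute=0, second=0, microsecond=0)
            affected[(pool, hour)].add(user)
    return {bucket: len(users) for bucket, users in affected.items()}


breakdown = slow_logon_breakdown(logons)
for (pool, hour), count in sorted(breakdown.items()):
    print(pool, hour.isoformat(), count)
```

Counting distinct users rather than logon events keeps one user who retries several times from inflating the number of affected users.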
Session network latency breakdown
Users connect to a virtual desktop using a remoting protocol. This protocol is sensitive to network latency. In high latency situations, users may perceive their virtual desktops as slow, or their applications may appear blurry and distorted. For further troubleshooting, it is essential to determine if these slowdowns are caused by the network or by the virtualization infrastructure itself.
The dashboard provides insights into where and when high network latency occurred and how many users were affected by it.
Session responsiveness breakdown
A virtual desktop starts to ‘feel’ slow to an end user when applications do not respond to user input. This perceived slowness can be caused by network latency or by the virtual machine being too busy to respond to the input.
Session responsiveness always needs to be compared to session network latency. If the session responsiveness value is high (the session is slow to respond) and network latency is low, it is almost certain that the virtualization infrastructure is too busy. If both the responsiveness value and network latency are high, the slowness is most likely caused by the network.
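The comparison described above can be written as a small decision rule. This is only a sketch: the function name and both threshold values are assumptions chosen for illustration, and a high responsiveness value is taken to mean that the session is slow to respond.

```python
def likely_cause(responsiveness_ms, latency_ms,
                 responsiveness_limit_ms=150.0, latency_limit_ms=100.0):
    """Suggest where to look first when a session feels slow.

    Both limits are illustrative assumptions, not product defaults.
    """
    if responsiveness_ms <= responsiveness_limit_ms:
        return "no slowdown"
    if latency_ms > latency_limit_ms:
        # Slow session and slow network: the network is the likely cause.
        return "network"
    # Slow session on a fast network: the infrastructure is likely too busy.
    return "virtualization infrastructure"


print(likely_cause(responsiveness_ms=400.0, latency_ms=20.0))
print(likely_cause(responsiveness_ms=400.0, latency_ms=250.0))
print(likely_cause(responsiveness_ms=60.0, latency_ms=250.0))
```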
System Health
The system health page focuses on the primary health indicators: CPU, memory, and storage. Traditional performance counters do not tell the full story with virtual machines because they share their hardware resources amongst multiple devices and users. For instance, high CPU usage does not automatically translate into poor performance. Given the cost of virtualization infrastructure, it is desirable to maximize resource usage while minimizing the impact on user experience.
Summary
The summary of the system health page shows the number of users and virtual machines. In virtual environments, having too many users or devices is one of the leading causes of bad performance. It is important to correlate the number of users with the performance indicators because sometimes the solution to performance problems is to simply scale.
CPU/GPU usage
The CPU queue length of a device indicates whether the device's processor can process all tasks in a timely manner. Especially in the context of desktop virtualization, where multiple users share the same device and multiple devices share the same physical hardware resources, it is impossible to determine if a device is overloaded by looking at CPU usage alone. In addition, CPU usage is kept high by design to optimize the cost of the virtualization infrastructure.
The balance between cost and performance is best monitored by looking at the CPU queue length. You can detect that the underlying infrastructure starts to struggle when the average CPU queue length of all virtual machines exceeds 2.
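The threshold mentioned above amounts to a simple check on the fleet-wide average. In the sketch below, only the limit of 2 comes from the text; the function name and the sample readings are hypothetical.

```python
from statistics import mean

CPU_QUEUE_LIMIT = 2  # the limit on the average CPU queue length stated above


def infrastructure_struggling(queue_lengths):
    """Return True when the average CPU queue length across VMs exceeds the limit."""
    return mean(queue_lengths) > CPU_QUEUE_LIMIT


# Hypothetical CPU queue length readings, one value per virtual machine.
print(infrastructure_struggling([0.5, 1.0, 1.5]))
print(infrastructure_struggling([1, 2, 6]))
```

Because the check uses an average, a single overloaded virtual machine can be masked by many idle ones, so it is worth inspecting individual machines as well.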
The GPU is becoming more important in any modern desktop as more applications rely on 3D acceleration. Unfortunately, most virtual desktops either do not have a GPU or have an underpowered GPU. In those cases, the CPU will take over that responsibility, which leads to an increased load on the CPU.
Memory usage
Memory affects system performance in a slightly different way on virtual desktop infrastructures. On physical desktops, the rule of thumb is that the more memory you have, the better. Virtual desktops share the memory of their physical host machine. Adding too much memory to virtual machines will drain the host memory too quickly, which leads to lower user density per physical host. Allocating too little memory will lead to paging and excessive load on the storage system.
The amount of memory should be balanced. The average available memory graph and the memory swap rate graph will help find the right balance.
Storage load
The virtual hard drives of the virtual machines are typically stored on a remote storage system. These remote storage systems are often not designed for the characteristics of virtual desktop workloads. This mismatch, together with the fact that all reads and writes need to travel over a storage network connection, results in additional latency that physical desktops do not experience. The impact can be as big as moving from an SSD back to an HDD. And, as with many data center technologies, server-class storage is typically much more expensive than locally installed SSDs.
Due to the complexity of the technology stack, it is hard to predict what impact storage will have on user experience. The disk queue length indicates if virtual machines are experiencing a storage bottleneck. Disk read/write latency can help understand in which direction a bottleneck occurs.
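One way to read these two indicators together is sketched below: the disk queue length signals whether a bottleneck exists, and the read and write latencies hint at its direction. The function name and the queue limit of 2.0 are assumptions made for this example and do not come from the product.

```python
def storage_bottleneck(disk_queue_length, read_latency_ms, write_latency_ms,
                       queue_limit=2.0):
    """Return None when no bottleneck is suspected, else its likely direction.

    queue_limit is an illustrative assumption, not a documented threshold.
    """
    if disk_queue_length <= queue_limit:
        return None
    if read_latency_ms > write_latency_ms:
        return "read"
    if write_latency_ms > read_latency_ms:
        return "write"
    return "read and write"


print(storage_bottleneck(1.0, 5.0, 40.0))
print(storage_bottleneck(6.0, 5.0, 40.0))
print(storage_bottleneck(6.0, 30.0, 8.0))
```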
Current state
The current state provides an overview of the current usage of the virtual desktop environment. It allows the user to quickly verify whether anything is out of the ordinary. For example, are there fewer hosts online than expected, or many more users than expected?
Page filters
When used in conjunction with the KPI widgets, the page filters let users zoom in and quickly identify specific devices or users for further investigation.
Desktop pool name
A desktop pool represents a group of virtual devices with the same characteristics (e.g., technical specifications, installed applications, and the same group of target users). Sometimes, desktop pools even represent an administrative boundary, meaning that a desktop pool has a dedicated group of support personnel, for example when it is deemed mission critical.
Being able to filter by desktop pool allows organizations to quickly identify which desktop pools need attention.
Desktop pool type
We recognize three main desktop pool types: pooled, personal, and shared. Depending on the context, organizations may use different (confusing) terms for them. Some well-known alternative terms for each type are listed below:
Pooled VDI: Non-persistent VDI, Single session VDI
Personal VDI: Persistent VDI, Single session VDI
Shared VDI: Server Based Computing, Session Based Computing, Terminal Server, Multi-session VDI
NOTE: beware that these terms do not always fully align. For example, in some products, it is possible to define non-persistent personal VDI, but this is rarely done in practice.
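To make the overlap between these terms concrete, the sketch below maps each alternative term listed above to the pool type or types it may refer to. The dictionary and function names are hypothetical; only the terms themselves come from the list.

```python
# Alternative terms per pool type, taken from the list above.
ALTERNATIVE_TERMS = {
    "pooled": ["non-persistent vdi", "single session vdi"],
    "personal": ["persistent vdi", "single session vdi"],
    "shared": [
        "server based computing",
        "session based computing",
        "terminal server",
        "multi-session vdi",
    ],
}


def pool_types_for(term):
    """Return every pool type a term may refer to (case-insensitive, exact match)."""
    key = term.strip().lower()
    return [
        pool_type
        for pool_type, alternatives in ALTERNATIVE_TERMS.items()
        if key == f"{pool_type} vdi" or key in alternatives
    ]


print(pool_types_for("Terminal Server"))
print(pool_types_for("Single session VDI"))
```

"Single session VDI" maps to both pooled and personal, which is one reason a term on its own is not always enough to identify the pool type.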
Hypervisor name
The hypervisor property contains the name of the hardware virtualization product used to support the virtual desktops. Desktop virtualization and hardware virtualization are often confused, but it is essential to understand the difference. In most organizations, the desktop and hardware virtualization vendors are different. An organization might, for instance, run its Citrix Virtual Apps and Desktops product on top of a VMware vCenter infrastructure. In that case, the hypervisor name property would be “vCenter”.
Using this field, organizations with multiple hardware virtualization infrastructures can compare user experience across them (e.g., on-premises VMware vCenter vs. cloud-based Azure Hyper-V).
Remoting protocol
Users connect to virtual desktops using a so-called remoting protocol. Each vendor provides one or more of these protocols.