Getting started with Alerts

Alerts are critical enablers in the proactive journey of IT support teams. They allow teams to detect issues and help them prioritize their efforts to improve the digital employee experience (DEX).

How can alerts help you detect and diagnose issues?

Nexthink Alerts notifies you about issues that require swift action. It filters out the noise so you can identify situations that actually require user intervention.

Use alerts to identify situations where something has unexpectedly changed or occurred.

Detecting global issues impacting your environment

Use Alerts cloud insights to detect issues that impact your environment, which Nexthink identifies based on cross-organization statistics:

  • Learn about binary reliability and performance, and detect anomalies such as abnormal CPU usage.

  • Quickly identify impacted binary versions and find the recommended version.

Refer to the Alerts overview to learn how to monitor and use alerts for diagnostic purposes.

Detecting issues impacting single or multiple devices/users

Proactively monitor issues according to your needs, whether you focus on a single user or device or address widespread incidents, such as sudden performance degradation, that affect multiple devices.

Detect the number of devices or users with issues

Detect whether a certain number of devices or users experienced an issue.

For example, the system triggers an alert when more than 20 devices had a boot time of over 60 seconds within the past 24 hours.
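The boot-time example can be sketched in Python as follows. This is only an illustration of the counting logic, assuming boot durations for the past 24 hours have already been collected per device; the function name, parameter names, and input shape are hypothetical and are not part of Nexthink or NQL.

```python
def should_trigger_count_alert(boot_seconds_by_device, max_boot_seconds=60, device_limit=20):
    """Return True when more than `device_limit` devices had a boot over the limit.

    `boot_seconds_by_device` maps a device name to the boot durations (seconds)
    observed for that device in the evaluation window (hypothetical input shape).
    """
    slow_devices = [
        device
        for device, durations in boot_seconds_by_device.items()
        if any(duration > max_boot_seconds for duration in durations)
    ]
    return len(slow_devices) > device_limit


# 21 devices with one slow boot each, plus one healthy device.
sample = {f"device-{i}": [75.0] for i in range(21)}
sample["device-ok"] = [30.0]
print(should_trigger_count_alert(sample))  # prints True
```

With exactly 20 slow devices the function returns False, because the example condition is "more than 20".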

Detect frequent issues across devices

Monitor the values of any metric across multiple devices to detect whether an aggregated metric value has breached the defined threshold, or shifts by a specific percentage.

For example, the system triggers an alert when the number of crashes of any binary increases by 100% in relation to a predefined norm, like the average of a metric value over the last 7 days.
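As a rough sketch of the percentage comparison in this example, the Python snippet below compares a current value with the average of the last 7 daily values. The function names, the inclusive comparison, and the handling of a zero baseline are assumptions made for illustration, not documented Nexthink behavior.

```python
from statistics import mean


def percent_change(current, history):
    """Percentage change of `current` relative to the mean of `history`."""
    baseline = mean(history)
    if baseline == 0:
        return None  # assumption: no percentage can be computed against a zero baseline
    return (current - baseline) / baseline * 100


def should_trigger_change_alert(current, history, threshold_percent=100):
    """True when the value increased by at least `threshold_percent` over the baseline."""
    change = percent_change(current, history)
    return change is not None and change >= threshold_percent


crashes_last_7_days = [10, 12, 8, 10, 9, 11, 10]  # the mean is 10
print(should_trigger_change_alert(20, crashes_last_7_days))  # prints True
print(should_trigger_change_alert(19, crashes_last_7_days))  # prints False
```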

Refer to the Detecting issues impacting multiple devices documentation for more information.

Detect a specific device or user with issues

Monitor issues on a single device or for a specific user, and trigger an alert whenever that device or user meets the conditions. Send separate notifications for each device or user.

For example, the system triggers an alert for each device that had at least 2 system crashes within the last 24 hours and creates a ticket in the ITSM software on behalf of the user.

Refer to the Detecting issues impacting a single device or user documentation for more information.

Alerts also support detecting issues impacting virtual desktop infrastructures (VDI).

Explore the Alerts FAQ to learn how to investigate devices associated with an existing alert, using NQL queries.


What is the difference between an alert and a monitor?

Refer to the Managing Alerts documentation to learn more about monitors, monitor types and monitor creation.

An alert is a special type of event triggered when specific conditions are met for the performance metrics of different features, such as system crashes, load times, or failed connections.

  • Triggered alerts are visualized in the timeline on the Alerts overview page.

  • Triggered alerts—if configured—activate emails or webhook notifications to communicate issues within your organization.

A monitor is a component of the Alerts and Diagnostics module that you configure to evaluate metrics against defined conditions and trigger alerts to identify specific issues.

  • Monitors can be custom-created or built-in (system monitors or installed from Nexthink Library).

  • Monitors enable anomaly detection capabilities for IT environments and allow you to notify users.


When to use alerts and when to use data exporters?

Use alerts to detect issues requiring immediate assistance or action. For other reports or events that do not need swift action, such as "Report all devices with low disk space", use a data exporter.

Also, use data exporters to report on a large number of objects that meet specific condition criteria expressed with an NQL query, or if you expect a single monitor to trigger more than 500 alerts at the same time.

Additionally, use the data export scheduling option to export data regularly.


What types of alert detection modes are available?

Nexthink alerts detect critical issues based on the following detection modes:

  • Metric threshold triggers an alert when the value of one or more metrics reaches a user-defined threshold.

  • Metric change triggers an alert when the current metric value differs from the rolling 7-day global average beyond the configured threshold.

  • Metric seasonal change triggers an alert when the current metric value falls outside its time-of-day baseline from the last seven days, based on the configured standard deviation band.

  • Global detection—only available for built-in monitors—triggers an alert when a specified number of devices use a particular binary version or binary configuration that performs worse than other versions or configurations across organizations.

You can configure up to 5 custom monitors in total using either Metric change or Metric seasonal change.

You can create up to 50 monitors.

Baseline computation depending on the alert detection mode

Baseline computation depends on the alert detection mode:

  • For Metric change: The baseline is the rolling 7-day global average (mean) of all data points returned by the monitor’s query. The current value is evaluated against this average using your configured threshold.

  • For Metric seasonal change: The baseline is time-of-day aware, meaning that for each time slot (for example, Monday 10:00–11:00) Nexthink computes the mean from the same slot across the last 7 days and evaluates divergences using standard-deviation bands:

    • slightly is one standard deviation (±1σ)—68% of values are within ±1σ.

    • moderately is two times the standard deviation (±2σ)—95% of values are within ±2σ.

    • highly is three times the standard deviation (±3σ)—99.7% of values are within ±3σ.
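The seasonal band check described above can be sketched in Python. This is an illustration under stated assumptions: the function name and band mapping are hypothetical, the history holds one value per day for the same time slot, and the population standard deviation is used because the source does not say which estimator Nexthink applies.

```python
from statistics import mean, pstdev

# Band widths in standard deviations, as listed above.
BANDS = {"slightly": 1, "moderately": 2, "highly": 3}


def outside_seasonal_band(current, same_slot_history, sensitivity="moderately"):
    """True when `current` lies outside baseline ± k·σ for the chosen band."""
    baseline = mean(same_slot_history)   # mean of the same slot over the last 7 days
    sigma = pstdev(same_slot_history)    # assumption: population standard deviation
    return abs(current - baseline) > BANDS[sensitivity] * sigma


# Seven values for one slot (for example, Monday 10:00-11:00); the mean is 100.
history = [100, 104, 96, 100, 102, 98, 100]
print(outside_seasonal_band(106, history, "moderately"))  # prints True
print(outside_seasonal_band(106, history, "highly"))      # prints False
```

The coverage percentages listed above (68%, 95%, 99.7%) hold for normally distributed values, so a wider band makes the check less sensitive.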

Visualize the computed baseline in the Diagnostics dashboard timeline.

Scheduling frequency for baseline data points

The monitor Scheduling frequency determines how many data points feed the baseline, depending on the detection type:

  • For Metric change, the system uses a few data points (about 20 points) to compute the baseline, meaning the monitor could trigger an alert within 7 days after activation.

  • For Metric seasonal change, the system waits 7 days before it can trigger an alert. This ensures that at least 7 data points—one for each day, for the same timeslot—are available for standard deviation estimations.

Consequently, the available scheduling frequencies depend on the detection type:

  • For Metric change, the scheduling frequency ranges from 15 minutes to several days.

  • For Metric seasonal change, the maximum scheduling frequency is 24 hours, because a frequency of more than one day does not contribute to computing the mean for the same slot across the last 7 days.

    • A 12-hour schedule yields two daily slots; each slot still uses 7 historical points (one per day) for its baseline.

For sub-hour scheduling frequencies, all data points within the hour are included. For example, a 15-minute schedule collects 4 points per hour, which is 28 points per slot over 7 days.
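The arithmetic for Metric seasonal change can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that slots are one hour wide for sub-hour schedules, which is how the statement that all data points within the hour are included is read here.

```python
def baseline_points_per_slot(schedule_minutes, history_days=7):
    """Data points feeding one slot's baseline over `history_days` days."""
    if not 0 < schedule_minutes <= 24 * 60:
        raise ValueError("schedule must be between 1 minute and 24 hours")
    # Sub-hour schedules contribute several evaluations per hourly slot;
    # hourly or longer schedules contribute one evaluation per slot per day.
    evaluations_per_slot = max(1, 60 // schedule_minutes)
    return evaluations_per_slot * history_days


def slots_per_day(schedule_minutes):
    """Distinct daily slots (assumption: hourly slots for sub-hour schedules)."""
    return (24 * 60) // max(schedule_minutes, 60)


print(baseline_points_per_slot(15))                            # prints 28
print(slots_per_day(12 * 60), baseline_points_per_slot(12 * 60))  # prints 2 7
```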


How does the system trigger and close an alert?

Nexthink monitors trigger alerts using one of the following methods:

  • The Schedule trigger method, available for custom and built-in monitors, is used for periodic checks. The monitor evaluates the metric(s) at regular intervals defined by the configured schedule frequency.

  • The Events trigger method, restricted to built-in monitors, is used for real-time monitoring and instant issue detection. Depending on the configured monitor Query and conditions, the monitor evaluates how long a threshold must be breached to trigger an alert.

Regardless of the trigger method, the monitor determines whether to open a new alert, keep the current alert Open, or close it.

An alert stays open until metric values stabilize and a subsequent evaluation closes it.

Handling excessively triggered alerts

Nexthink limits the total number of objects that can trigger the same alert to 500, keeping individual alerts relevant.

When a single monitor triggers 500 or more alerts concurrently, Nexthink activates an alert-grouping mechanism to prevent system flooding. The monitor stops generating additional alerts until the situation is resolved, and the grouped alerts are handled as one.

This alert grouping behavior is captured in the NQL data model through the is_grouped field.
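One way to picture the cap is the Python sketch below. It is an interpretation rather than Nexthink's implementation: the function name and the dictionary shape are hypothetical, and only the 500 limit and the is_grouped field name come from the text above.

```python
ALERT_LIMIT = 500  # limit stated in this documentation


def plan_alerts(triggering_objects, limit=ALERT_LIMIT):
    """Return the alerts one evaluation would open under the cap (illustrative only)."""
    # Assumption: grouping starts once `limit` or more objects breach at the same time.
    is_grouped = len(triggering_objects) >= limit
    return [
        {"object": obj, "is_grouped": is_grouped}
        for obj in triggering_objects[:limit]  # no alerts beyond the cap
    ]


alerts = plan_alerts([f"device-{i}" for i in range(750)])
print(len(alerts), alerts[0]["is_grouped"])  # prints 500 True
```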

Closing alerts

The system closes the alert when a monitored metric no longer breaches defined conditions.

  • If the monitor tracks metric threshold, the system closes the alert when a monitored metric no longer breaches the threshold.

  • If the monitor tracks metric change, the system closes the alert when that metric value returns to the baseline.

If the monitor query does not return any data during evaluation, the alert automatically closes according to the following rules:

  • For alerts that track aggregated metrics across multiple devices, the alert closes if there are three consecutive days of no data returned.

  • For alerts triggered for a single device or user, the alert closes if the monitor query continuously returns no data during the period specified in the during past parameter of the query.
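The two no-data rules above can be sketched in Python as follows. The function name, the scope labels, and the use of timedelta values are assumptions made for illustration; in the product, the period for single-device alerts comes from the during past parameter of the monitor query.

```python
from datetime import timedelta

MULTI_DEVICE_NO_DATA_LIMIT = timedelta(days=3)  # three consecutive days without data


def closes_on_no_data(scope, no_data_for, during_past=None):
    """Decide whether an open alert closes because the query returned no data.

    scope: "multiple" for aggregated metrics across devices, "single" for one
    device or user (hypothetical labels). `no_data_for` is how long the query
    has continuously returned nothing.
    """
    if scope == "multiple":
        return no_data_for >= MULTI_DEVICE_NO_DATA_LIMIT
    if scope == "single":
        if during_past is None:
            raise ValueError("single-device alerts need the query's during past period")
        return no_data_for >= during_past
    raise ValueError(f"unknown scope: {scope!r}")


print(closes_on_no_data("multiple", timedelta(days=2)))                 # prints False
print(closes_on_no_data("single", timedelta(hours=25), timedelta(hours=24)))  # prints True
```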

The system sends a notification—if configured—when specific alerts are triggered or closed.

Sending alert notifications

If the alert was triggered in a previous evaluation and already has an Open status, the system does not send a notification if the metric still meets the detection criteria in the current evaluation.
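The open, keep open, and close decisions, together with the notification rule above, can be summarized as a small state transition in Python. The function name and the returned labels are hypothetical, and the sketch assumes that notifications are configured for both triggering and closing.

```python
def evaluate(alert_is_open, condition_breached):
    """Return (alert_is_open_after, action, send_notification) for one evaluation."""
    if condition_breached and not alert_is_open:
        return True, "trigger", True      # new alert: notify
    if condition_breached and alert_is_open:
        return True, "keep_open", False   # already open: no repeated notification
    if alert_is_open:
        return False, "close", True       # condition cleared: close and notify
    return False, "none", False           # nothing to do


is_open = False
for breached in [False, True, True, False]:
    is_open, action, notify = evaluate(is_open, breached)
    print(action, notify)
```

For the four evaluations in the loop, the actions are none, trigger, keep_open, and close, and only the trigger and close steps send a notification.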

Refer to the Responding to Alerts documentation to learn how to react and respond to alert notifications.


What built-in monitors are available in Nexthink?

Instead of creating a monitor from scratch, Nexthink recommends customizing built-in monitors, as they offer a wider range of features compared to custom monitors.

Built-in monitors can be system monitors or monitors installed from Nexthink Library.

Library monitors

Nexthink offers a set of library monitors that you can manually install from the Nexthink Library. These preconfigured monitors track issues for specific use cases and solutions.

Go to the Nexthink Library module within your Nexthink instance to see all available library monitors. Nexthink Library offers built-in monitors for virtual desktop infrastructure—VDI.

Library monitors for virtual desktop infrastructure—VDI

Nexthink Library offers built-in VDI-specific monitors to track real-time performance metrics and user experience in virtual desktop infrastructures.

Find below some of the use cases that built-in library VDI monitors allow you to address:

  • Detect network congestion causing latency spikes in specific office locations.

  • Identify CPU bottlenecks affecting virtual desktops.

  • Prevent overloading of a desktop pool to maintain optimal user experience.

  • Identify network instability affecting session continuity.

System monitors

System monitors are not included when calculating the maximum number of monitors permitted by your license.

Explore system monitors directly within your Nexthink Library module.

System monitors continuously track your IT environment for the most common issues, like performance degradations of web applications, binaries, desktop applications, and devices.

You can use system monitors with the default settings or customize them as needed—with limitations.

The following system monitors are installed and activated by default:

Binary performance
  • Binary connection establishment time increase: Keeps track of the average connection establishment time per binary in the last hour, for each binary present in the environment. Only available for Nexthink instances fully transitioned to Infinity.

  • Binary crashes - High percentage of devices impacted: Keeps track of the percentage of devices with execution crashes per binary in the last 24 hours, for each binary present in the environment.

  • Binary crashes increase: Keeps track of the number of crashes per binary in the last 24 hours, for each binary present in the environment.

  • Binary failed connection ratio increase: Keeps track of the percentage of failed connections per binary in the last hour, for each binary present in the environment.

  • Binary freezes - High percentage of devices with freezes: Keeps track of the percentage of devices with freezes per binary in the last hour, for each binary present in the environment.

  • Binary memory - Average memory usage increase: Keeps track of the average memory usage per binary in the last six hours, for each binary present in the environment.

Binary - global anomalies
  • Binary CPU usage - global anomaly: Detects anomalies in CPU usage across versions or configurations of binaries, based on anonymized data from all companies using Nexthink.

  • Binary crashes - global anomaly: Detects anomalies in crash reliability across versions or configurations of binaries, based on anonymized data from all companies using Nexthink.

  • Binary freezes - global anomaly: Detects anomalies in freeze frequency across versions or configurations of binaries, based on anonymized data from all companies using Nexthink.

  • Binary memory usage - global anomaly: Detects anomalies in memory usage across versions or configurations of binaries, based on anonymized data from all companies using Nexthink.

Binary - lagging performance
  • Binary CPU usage - lagging performance: Detects when the CPU usage of a specific binary in your organization is higher than that of other companies using the same binary. Nexthink benchmarks CPU usage with anonymized data from all companies using Nexthink.

  • Binary crashes - lagging performance: Detects when the crash frequency of a specific binary in your organization is higher than that of other companies using the same binary. Nexthink benchmarks binary crashes with anonymized data from all companies using Nexthink.

  • Binary freezes - lagging performance: Detects when the freeze frequency of a specific binary in your organization is higher than that of other companies using the same binary. Nexthink benchmarks binary freezes with anonymized data from all companies using Nexthink.

  • Binary memory usage - lagging performance: Detects when the memory usage of a specific binary in your organization is higher than that of other companies using the same binary. Nexthink benchmarks memory usage with anonymized data from all companies using Nexthink.

Device performance decline
  • Boot duration increase: Keeps track of the average device boot duration.

  • Logon duration increase: Keeps track of the average device logon duration.

  • System crashes increase: Keeps track of the number of devices with system crashes per OS platform in the last day, for each OS platform present in the environment.

Device performance and connectivity poor ratings

Detect issues based on the poor thresholds that Nexthink Administrators can configure for endpoint-related performance metrics of the Digital Employee Experience score. Refer to Ratings management for more information.

  • Boot speed: Detects devices with poor Boot duration ratings.

  • Logon speed: Detects devices with poor Login time ratings.

  • CPU usage: Detects devices with frequent (30% of the time) poor CPU usage ratings.

  • CPU interrupt usage: Detects devices with frequent (30% of the time) poor CPU interrupt ratings.

  • Disk queue length: Detects devices with frequent (30% of the time) poor Disk queue length ratings.

  • System free space: Detects devices with poor System drive free space rating.

  • WiFi strength: Detects devices with frequent (30% of the time) poor WiFi signal strength ratings.

  • GPU 1 / GPU 2 usage: Detects devices with frequent (30% of the time) poor GPU1 or GPU2 usage rating.

  • Virtual session lag: Detects devices with poor average session network latency.

  • WiFi upload speed: Detects devices with frequent (30% of the time) poor WiFi transmission rate ratings.

  • WiFi download speed: Detects devices with frequent (30% of the time) poor WiFi receive rate ratings.

Web applications
  • Web applications errors increase: Keeps track of the increase of the number of pages with errors per web application in the last hour, for each web application defined in the Applications module.

  • Web applications slow page loads increase: Keeps track of the average page load time per web application in the last hour, for each web application defined in the Applications module.

  • Web applications slow transactions increase: Keeps track of the average transaction duration per web application in the last hour, for each web application defined in the Applications module.

  • Web applications - resource errors increase: Keeps track of the number of resource errors during the past 12 hours, for each web application configured in your environment.

  • Web applications - percentage of incomplete transactions increase: Keeps track of the percentage of incomplete transactions during the past 12 hours, for each transaction configured as part of the web application configuration in your environment.

  • Web applications - percentage of frustrating transactions per transaction increase: Keeps track of the percentage of frustrating transactions during the past 12 hours, for each transaction configured as part of the web application configuration in your environment.

  • Web applications - percentage of frustrating page loads per key page increase: Keeps track of the percentage of frustrating page loads during the past 12 hours, for each key page and web application configured in your environment.


Granting permissions for Alerts

Refer to the Roles documentation for a detailed description of Permissions, View domain options and Data privacy granularity settings.

To enable proper permissions for Alerts as an administrator:

  1. Select Administration > Roles from the main navigation panel.

  2. Create a New Role or edit an existing role by hovering over it.

  3. In the Permissions section, scroll down to the Alerts section to enable appropriate permissions for the role.

View domain impact on Alerts permissions

The table below shows what users with full and limited view domain access can do, assuming the necessary permissions are enabled.

Permission
Full access
Limited access

Manage all alerts

View all alert dashboards

