Synthetic Transaction Monitoring

Background

Wikipedia defines synthetic monitoring as:

(…) a monitoring technique that is done by using a simulation or scripted recordings of transactions. Behavioral scripts (or paths) are created to simulate an action or path that a customer or end-user would take on a site, application, or other software (or even hardware). Those paths are then continuously monitored at specified intervals for performance, such as functionality, availability, and response time measures.

Synthetic transaction monitoring (STM) continuously monitors applications using scripts or so-called “robots.”

STM is useful when you need to:

  • Monitor the availability of an application and be alerted to an issue or outage, even when no one is using the application.

  • Define performance baselines.

  • Monitor the whole application (main domain), specific pages, or transactions.

Overview

Problem

Business applications and their underpinning technology stack can break due to application, network, or infrastructure changes – with the resulting outage being noticed by employees only when they try to use them. This can rapidly lead to an increase in incidents and employee productivity loss.

Solution

Periodically run scripts that emulate what real employees would do, so that the entire technology and application stack is tested before employees use it.

Thanks to dedicated alerts, you can stay proactive by identifying and remediating critical issues before a ticket is raised or employees are impacted.

Key Features:

  • Proactive monitoring and alerting on the availability and performance of an application, as well as any/all elements of the entire application domain and its underlying technology stack

  • Dashboards that can compare actual/real user experience with synthetics to visually analyze and report behavior over time

Configuration

The endpoints

Select one or several endpoints on which you want to run the synthetic scripts.

Dedicated endpoints

They should be dedicated to running only these synthetic scripts, to avoid impacting other system users. This also helps you avoid false positives, such as a spike of slowness caused by other processes running on a shared endpoint.

Representative

They should be selected according to what you want to measure. It is generally best to use devices representative of what your employees would use, for example, in terms of OS, CPU, or memory.

Geography

Their location also depends on what you want to measure. For instance, if you want to focus on users in a specific location because they often complain about slowness, perhaps because the data center or gateway they connect to is far away, select a device in that location.

The endpoint must have the Nexthink Collector and the Nexthink browser extension installed to collect the data.

The scripts

The scripts must be written and customized according to the use case.

Caution: This Remote Action opens the URLs set in the input parameter in Microsoft Edge and closes Microsoft Edge after 30 seconds. Carefully consider which devices you deploy this remote action to, so that you avoid disrupting employees.

Considerations:

  • If you're familiar with these tools, you can use Selenium and Chromium for better control and automation capabilities. In the case of Chromium, you need to load and enable the Nexthink browser extension each time the browser starts.

  • You can select the frequency of execution based on your use cases.

  • If you need to log in to an application, ensure that two-factor authentication is disabled for the specific account you use for synthetic monitoring. Alternatively, authenticate to the application once and verify that it does not prompt for additional authentication afterwards.
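
The frequency consideration above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of any Nexthink tooling: `run_cycle`, `run_periodically`, and the `open_url` hook are hypothetical names, and `open_url` stands in for whatever drives the browser (for example, a Selenium driver's `get` method). The script only drives navigation; measurement is still done by the Collector and browser extension on the endpoint.

```python
import time
from typing import Callable, List, Sequence


def run_cycle(
    urls: Sequence[str],
    open_url: Callable[[str], None],
    dwell_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Open each URL once and keep it open for dwell_seconds."""
    visited = []
    for url in urls:
        open_url(url)         # hypothetical hook, e.g. a Selenium driver's get()
        sleep(dwell_seconds)  # give the page time to finish loading
        visited.append(url)
    return visited


def run_periodically(
    urls: Sequence[str],
    open_url: Callable[[str], None],
    interval_seconds: float,
    cycles: int,
    dwell_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run a fixed number of cycles, pausing interval_seconds between them."""
    total_visits = 0
    for cycle in range(cycles):
        total_visits += len(run_cycle(urls, open_url, dwell_seconds, sleep))
        if cycle < cycles - 1:
            sleep(interval_seconds)  # the frequency depends on your use case
    return total_visits
```

Passing `sleep` as a parameter makes a dry run possible without waiting: supply a no-op function for `sleep` and a list's `append` method for `open_url` to check the visit order before pointing the script at a real browser.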

Collecting the data

  1. Install the Nexthink Collector and extension.

  2. Distinguish page loads from transactions.

Note: If your script runs actions that do not trigger a page load, ensure that you configure those as transactions in Application Experience. We recommend configuring key pages to analyze the results better.

Visualizing the data

We recommend building a Live Dashboard according to the use case you want to solve, such as real-time alerting or comparing results with real user data.

Use the same NQL queries on the left and right side columns. For synthetic scripts, filter on endpoints where synthetic tests are run; for real user data, simply exclude them.
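
One way to keep both columns in sync is to generate each pair of queries from a single template, so that only the device filter differs. The following Python sketch assumes the query shape used in the examples on this page; `paired_queries` is a hypothetical helper, and the device name "synthetic" is the placeholder used in those examples.

```python
from typing import Tuple


def paired_queries(
    table: str,
    application: str,
    summarize: str,
    synthetic_device: str = "synthetic",
) -> Tuple[str, str]:
    """Return (synthetic_query, real_user_query) differing only in the device filter."""

    def build(operator: str) -> str:
        return "\n".join(
            [
                table,
                f'| where application.name == "{application}"',
                f'| where device.name {operator} "{synthetic_device}"',
                f"| summarize {summarize}",
            ]
        )

    # "==" keeps only the synthetic endpoint; "!=" excludes it for real user data.
    return build("=="), build("!=")


synthetic_query, real_user_query = paired_queries(
    "web.page_views",
    "ABC",
    "pageLoadTime = page_load_time.overall.avg()",
)
print(synthetic_query)
print(real_user_query)
```

If several endpoints run synthetic tests, the single device-name filter would need to be replaced by whatever condition identifies that group of endpoints.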

Example 1

Metric: Page load time

NQL query (synthetic):

Code
web.page_views
| where application.name == "ABC"
| where device.name == "synthetic"
| summarize pageLoadTime = page_load_time.overall.avg()

NQL query (real user data):

Code
web.page_views
| where application.name == "ABC"
| where device.name != "synthetic"
| summarize pageLoadTime = page_load_time.overall.avg()

Example 2

Metric: The number of page loads

NQL query (synthetic):

Code
web.page_views
| where application.name == "ABC"
| where device.name == "synthetic"
| summarize numberOfPageLoads = number_of_page_views.sum()

NQL query (real user data):

Code
web.page_views
| where application.name == "ABC"
| where device.name != "synthetic"
| summarize numberOfPageLoads = number_of_page_views.sum()

Example 3

Metric: Bar chart of transaction durations by transaction

NQL query (synthetic):

Code
web.transactions
| where application.name == "ABC"
| where device.name == "synthetic"
| where status == COMPLETED
| summarize transaction_time = duration.avg() by transaction.name
| sort transaction_time desc

NQL query (real user data):

Code
web.transactions
| where application.name == "ABC"
| where device.name != "synthetic"
| where status == COMPLETED
| summarize transaction_time = duration.avg() by transaction.name
| sort transaction_time desc

Alerting

Alerting is typically the most important use case for synthetic monitoring.

Example 1

Purpose: Alert on high error ratio

Frequency: 15 min

NQL query:

Code
application.applications
 | with web.page_views during past 1h
 | where application.name == "ABC"
 | where device.name == "synthetic"
 | where is_soft_navigation == false
 | compute total_number_of_page_views = number_of_page_views.sum()

 | with web.errors during past 1h
 | compute error_n = error.number_of_errors.sum()
 | where total_number_of_page_views > 10
 | where error_n > 10
 | summarize total_errors = error_n.sum(), total_navigations = total_number_of_page_views.sum(),
   percentage_error_ratio = error_n.sum() * 100 / total_number_of_page_views.sum() by
   application.name

Example 2

Purpose: Alert on slow page load time

Frequency: 15 min

NQL query:

Code
application.applications
| with web.page_views during past 1h
| where application.name == "ABC"
| where device.name == "synthetic"
| where experience_level == frustrating
| compute number_of_slow_pages_count = count()
| with web.page_views during past 1h
| compute number_of_pages_count = count()
| where number_of_pages_count > 10
| summarize number_of_slow_pages = number_of_slow_pages_count.sum(), number_of_pages = number_of_pages_count.sum(), ratio_of_slow_pages = number_of_slow_pages_count.sum() * 100 / number_of_pages_count.sum() by application.name
| where ratio_of_slow_pages > 1
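
The thresholds in this query can be checked by hand on already-aggregated counts. The Python sketch below mirrors the two filters (more than 10 page views, and a slow-page ratio above 1 percent); `slow_page_alert` is a hypothetical helper, and it assumes plain floating-point division, which may not match how NQL rounds the ratio.

```python
from typing import Optional


def slow_page_alert(
    slow_pages: int,
    total_pages: int,
    min_pages: int = 10,
    ratio_threshold: float = 1.0,
) -> Optional[float]:
    """Return the slow-page percentage if the alert would fire, else None."""
    # Mirrors: where number_of_pages_count > 10
    if total_pages <= min_pages:
        return None
    # Mirrors: number_of_slow_pages_count.sum() * 100 / number_of_pages_count.sum()
    ratio = slow_pages * 100 / total_pages
    # Mirrors: where ratio_of_slow_pages > 1
    return ratio if ratio > ratio_threshold else None


print(slow_page_alert(3, 60))  # 3 slow pages out of 60 page views
print(slow_page_alert(0, 60))  # no slow pages
print(slow_page_alert(5, 10))  # too few page views to evaluate
```

With a 15-minute frequency and a one-hour window, the first call represents a window in which the alert condition is met, while the other two return None because the ratio or the volume filter is not satisfied.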
