Application Connectivity troubleshooting
Problem
Reporting and acting on network-related issues requires accurate data collection, visualization and interpretation. Without reliable network performance indicators, fixing connection issues becomes a guessing game, which ultimately leads to poor employee experience and wasted resources.
Solution
Application Connectivity troubleshooting follows a set of investigation principles. Application Connectivity helps you to:
Identify the root causes behind network-related issues or exclude possible root causes.
Effectively troubleshoot network-related issues with targeted solutions.
Stop the “blame game” by enabling a fact-based discussion between involved teams.
Control device data privacy and ensure compliance within your organization.
To achieve this, the Application Connectivity framework relies on connections data that combines connection metrics, destination information, and data privacy compliance at an application-device level.
The NQL queries on this page are examples of how to use connections data to investigate network-related issues. Similar queries are supported by different query-based features available in the Nexthink web interface.
You can also use Network view for connections data visualization, filtering and drill-downs (transport protocols, devices, binaries, destinations, etc.).
Prerequisites
Connection events are only available for devices with Collectors that report 'Infinity only'.
Minimum Collector version of 2023.10.
Connections data
The connections data used by Network view and the NQL queries on this page include:
Connection events
Connection metrics
Destination decorations
Connection event aggregations
Jump to the Network troubleshooting with Connections Data section to learn about the use of Application Connectivity queries.
Connection events
A connection event represents an outgoing TCP connection (established by a device) or outgoing UDP packets. Each connection event provides the following information:
Start time, end time, and duration of the event’s bucket
The source of the connection event
The destination of the event
The transport protocol and IP version
Metrics about the connection
Connection events are sampled events, meaning Nexthink reports connection events in buckets of 15 minutes and 1 day.
The connection namespace of the NQL data model contains one main table:
The connection.events table contains events for outgoing TCP connections and UDP packets.
The following two tables in the connection namespace are deprecated and will be removed in the future:
The connection.tcp_events table contains events for outgoing TCP connections.
The connection.udp_events table contains events for outgoing UDP packets.
Some metrics, like the number of failed connections or the connection establishment time, are only available for TCP connection events.
Refer to the NQL data model documentation for more information.
Connection events association
Connection events are linked to the following objects:
The device that establishes the connection.
The binary that uses the connection.
The user of the process that runs the binary.
Optionally, the desktop application configured for this binary.
Optionally, the network application matching the configured destinations.
Connection destination decoration
Nexthink decorates connection events data with additional destination information.
Connection event metrics
Connection events provide the following metrics:
The connection round-trip time (RTT): The average round-trip time for all established connections. The round-trip time is measured between sending the SYN message and receiving the SYN-ACK message from the remote party during the TCP three-way handshake that establishes the connection. This metric is only available for TCP connection events with at least one established connection.
Incoming and outgoing traffic in bytes. Data received (TCP only) and sent (TCP and UDP) by the application during the event.
The ratio of all failed TCP connections over all attempted TCP connections i.e., all established and failed TCP connections.
Number of connections per status in the event.
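As an offline illustration of the RTT metric described above, the following Python sketch averages the SYN to SYN-ACK delay over established connections only. The input format (pairs of hypothetical millisecond timestamps, with None for a handshake that never received a SYN-ACK) is an assumption made for this example and is not how Nexthink exposes the data.

```python
from statistics import mean

def average_rtt_ms(handshakes):
    """Average SYN -> SYN-ACK delay in milliseconds.

    handshakes: list of (syn_sent_ms, syn_ack_received_ms) pairs.
    A pair with syn_ack_received_ms set to None stands for a failed
    attempt and is skipped, mirroring the rule that the RTT only
    covers established connections.
    Returns None when no connection was established.
    """
    rtts = [ack - syn for syn, ack in handshakes if ack is not None]
    return mean(rtts) if rtts else None

# Hypothetical sample: two established connections and one failed attempt.
sample = [(0, 20), (100, 140), (200, None)]
avg = average_rtt_ms(sample)
```

With this sample, only the 20 ms and 40 ms handshakes contribute to the average; the failed attempt affects the failed connections ratio instead.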
Connection events aggregation
Nexthink aggregates connections into buckets of 15 minutes and 1 day.
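To make the idea of 15-minute buckets concrete, the sketch below floors hypothetical sample timestamps to the start of their 15-minute bucket and sums the connection counts per bucket. It is only an illustration with made-up data; the function names and the sample format are assumptions, and the actual aggregation is performed by Nexthink, not by user code.

```python
from collections import defaultdict
from datetime import datetime

def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

def aggregate(samples):
    """Sum (established, failed) counts per 15-minute bucket.

    samples: iterable of (timestamp, established_count, failed_count).
    Returns a dict mapping bucket start time -> (established, failed).
    """
    buckets = defaultdict(lambda: [0, 0])
    for ts, established, failed in samples:
        key = bucket_start(ts)
        buckets[key][0] += established
        buckets[key][1] += failed
    return {key: tuple(counts) for key, counts in sorted(buckets.items())}

# Hypothetical samples: the first two fall into the 09:00 bucket,
# the third opens the 09:15 bucket.
samples = [
    (datetime(2024, 1, 1, 9, 2), 1, 0),
    (datetime(2024, 1, 1, 9, 14, 59), 2, 1),
    (datetime(2024, 1, 1, 9, 15), 1, 0),
]
result = aggregate(samples)
```

A daily bucket would work the same way, flooring to midnight instead of to the quarter hour.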
Network troubleshooting with Connections data
Use connections data to troubleshoot network-related issues. To find the root cause of a network issue or exclude possible root causes, you must identify the relevant population (devices, apps, destinations) affected by network issues and when the impacted connection metric (failed connection ratio, establishment time, traffic) changed.
You can apply the same troubleshooting principle with Network view. Network view lets you visually identify the relevant population by selecting a connection metric and filtering along the device, application, and destination dimensions.
Identifying the relevant population
Focus on the three dimensions of the connections data:
Device Dimension: Which devices are impacted, and how many?
Determine whether the impacted devices share the same characteristics and location.
Application Dimension: Which desktop applications are impacted, and how many?
Destination Dimension: Which destinations are impacted? Where are they located?
Device Dimension
Connection events are linked to the device object that created the network connection. This allows you to investigate the connections data of a single device (by devices.name) or a group of devices, for example by devices.entity, GeoIP-based location or other custom organizational unit classifications.
Refer to the Product configuration documentation for more information.
To group devices by GeoIP-based location, use the location context of the connection event, for example:
connection.events during past 7d
| where transport_protocol == TCP
| where binary.name == "*outlook*"
| summarize Avg_RTT = establishment_time.avg() by context.location.country
The location context is where the device was at the time of the event. It requires the geolocation feature to be activated and works best when the Collector traffic is routed to the Internet directly and not through a VPN.
Alternatively, you can use the organizational context, for example:
devices during past 7d
| with connection.events during past 7d
| where transport_protocol == tcp
| where context.organization.entity == "XYZ"
| compute Failed_Connections = number_of_failed_connections.sum()
Application Dimension
Connection events are linked to the binary object that initiated the connection.
connection.events during past 7d
| where transport_protocol == TCP
| where binary.name == "ABC"
| summarize Avg_RTT = establishment_time.avg() by context.location.country
Additionally, the connection event is linked to a desktop application if the binary is part of the application definition.
connection.events during past 7d
| where transport_protocol == TCP
| where application.name == "XYZ"
| summarize Avg_RTT = establishment_time.avg() by context.location.country
Destination Dimension
The destination is a structured field of the connection event. For example:
connection.events during past 7d
| where transport_protocol == TCP
| where destination.port == 135
| where number_of_failed_connections > 0
Note that you cannot summarize by IP address because the cardinality of IP addresses is too high. Instead, you can configure a Network Application based on IP address, IP subnet, network port, or domain name. Afterwards, you can filter connection events using the Network Application name, for example:
connection.events during past 7d
| where transport_protocol == TCP
| where network_application.name == "XYZ"
| where number_of_failed_connections > 0
Refer to the Destination documentation for more information.
Investigating TCP Connections
The two main metrics to gain visibility on the quality of TCP connections are:
The connections round-trip time (RTT): The connections RTT is available for all TCP events with established connections and can be accessed through tcp_events.establishment_time.avg. Connections RTT is a good indicator for slow connections.
The failed connections ratio: The number of failed connections over the number of new connections (established and failed):
Failed_Connections_Ratio = number_of_failed_connections.sum() / (number_of_established_connections.sum() + number_of_failed_connections.sum())
To better understand how the failed connections ratio is computed, look at the following image that shows connections on a timeline. Each line represents a specific connection and illustrates its duration. For example:
C1 starts before the selected timeframe and ends within it.
C2 starts before the selected timeframe and ends after it.
C5 attempts to connect in the selected timeframe, but fails.
The table below describes which connections are taken into consideration to calculate the number of connections per status within the specified timeframe.
Metric | Metric value | Connections included |
---|---|---|
Total number of connections | 5 | C1, C2, C3, C4, C5 |
Failed connections | 1 | C5 |
Established connections | 2 | C3, C4 |
Alive connections | 2 | C1, C2 |
Successful connections | 4 | C1, C2, C3, C4 |
Attempted connections | 3 | C3, C4, C5 |
Failed connections ratio | 33% | C5 / [C3, C4, C5] |
Always evaluate the failed connections ratio and its fluctuations together with the number of failed connections or the number of attempted connections. For example, if the ratio of failed connections is 100% but only one connection was attempted, the result is not worth investigating further.
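The arithmetic behind the table above can be checked with a short Python sketch. It recomputes the ratio from the established and failed counts, then applies a volume check before flagging the result. The thresholds min_attempts and min_ratio are hypothetical values chosen for illustration and are not Nexthink defaults.

```python
def failed_connections_ratio(established: int, failed: int):
    """Failed connections over attempted connections.

    Attempted connections are the new connections in the timeframe,
    that is, established plus failed. Connections that were already
    alive before the timeframe are not counted.
    Returns None when nothing was attempted.
    """
    attempted = established + failed
    if attempted == 0:
        return None
    return failed / attempted

def worth_investigating(established: int, failed: int,
                        min_attempts: int = 20, min_ratio: float = 0.10) -> bool:
    """Flag a ratio only when enough connections were attempted.

    min_attempts and min_ratio are illustrative thresholds (assumptions).
    """
    ratio = failed_connections_ratio(established, failed)
    if ratio is None:
        return False
    return (established + failed) >= min_attempts and ratio >= min_ratio

# Values from the table: C3 and C4 established, C5 failed.
table_ratio = failed_connections_ratio(established=2, failed=1)
table_ratio_percent = round(table_ratio * 100)
```

With the table values, the ratio is 1 / 3, which rounds to 33%. A single failed attempt out of one yields a 100% ratio but is not flagged, because the volume is below the illustrative threshold.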
Example: Investigation of VPN connectivity issues
Find an example below of a live dashboard to investigate VPN connectivity issues. Notice that the application dimension is fixed to the VPN binaries.
Find below the NQL queries from the example of investigating VPN connectivity issues:
connection.tcp_events during past 24h
| where binary.name in ["VPN_binary_Windows", "VPN_binary_macOS"]
| where destination.domain == "VPN_edge_domain_name"
| where number_of_established_connections > 0
| summarize Devices__ = device.name.count(), Avg_RTT = establishment_time.avg(), Failed_Connections_Ratio_in_percent = (number_of_failed_connections.sum()) / ((number_of_established_connections.sum()) + (number_of_failed_connections.sum())) * 100, Failed_Connections = number_of_failed_connections.sum() by context.location.country, destination.country, destination.datacenter_region
| sort Failed_Connections desc
connection.tcp_events during past 720min
| where binary.name in ["VPN_binary_Windows", "VPN_binary_macOS"]
| where destination.domain == "VPN_edge_domain_name"
| where number_of_established_connections > 0
| summarize average_RTT = establishment_time.avg() by 15min
Investigating UDP Traffic
Because UDP is connectionless, investigating UDP network traffic is more limited than investigating TCP network traffic. The main approach is to look for changes and differences in the amount of outgoing UDP traffic, for example, by comparing the average traffic per device for one application.
connection.events during past 7d
| where transport_protocol == UDP
| where number_of_successful_connections > 0
| where outgoing_traffic == 0
| summarize idle_connections = number_of_successful_connections.sum() by binary.name
| sort idle_connections desc
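The comparison of average outgoing traffic per device mentioned above can also be sketched offline. The Python example below assumes you have exported rows of (device name, outgoing bytes) for one application over two periods; the row format, device names, and byte counts are all made up for illustration.

```python
from collections import defaultdict

def avg_outgoing_per_device(rows):
    """Average total outgoing bytes per device.

    rows: iterable of (device_name, outgoing_bytes), one row per event.
    Returns 0.0 when there are no rows.
    """
    totals = defaultdict(int)
    for device, outgoing in rows:
        totals[device] += outgoing
    return sum(totals.values()) / len(totals) if totals else 0.0

# Hypothetical exports for the same application in two periods.
baseline = [("dev-a", 1000), ("dev-b", 3000), ("dev-a", 1000)]
current = [("dev-a", 500), ("dev-b", 500)]

base_avg = avg_outgoing_per_device(baseline)
curr_avg = avg_outgoing_per_device(current)
# Relative change of the current period against the baseline, in percent.
change_pct = (curr_avg - base_avg) / base_avg * 100
```

A large drop or spike in the per-device average between periods is a cue to narrow the investigation by device, application, or destination.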
You can apply the same troubleshooting approach to Network view for connections data visualization, filtering and drill-downs (including transport protocols).
Application Connectivity in Nexthink Infinity
Find below the related documentation of some of the Nexthink features compatible with Application Connectivity’s connections data and queries:
Investigations using the NQL editor
Connections Timeline in the Device View
System Monitors for Alerts:
Binary connection establishment time increase
Binary failed connection ratio increase
Network view enabled for Network and Desktop Applications, Investigations, and Device view.
You can use the Application Connectivity queries included on this page for all NQL-based features in the Nexthink web interface.
Overseeing data privacy
Refer to the Configuring Collector level anonymization documentation to anonymize, filter and control connections data privacy using NQL commands and sample queries.
Implementation Aspects
The following implementation aspects impact connection events: