Introduction

This document provides comprehensive information on the installation and configuration of the Nexthink Event Connector, as well as basic maintenance guidelines, including a detailed description of the processes needed for a successful installation.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us via the Nexthink support portal.

This document is intended for readers with a detailed understanding of Nexthink technology and of the Splunk, ServiceNow or Azure technologies, as well as some understanding of concepts such as REST messages, the Linux command line and basic security terms.

These configuration instructions should be executed by a Splunk, ServiceNow or Azure certified professional.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

Version: 1.4.0

Last Revision: 27/10/2021

Overview

Nexthink Event Connector makes it possible for Nexthink customers to integrate end-user IT data (failed connections, system crashes, etc.) into the Splunk, ServiceNow or Azure Data Lake Storage Gen2 platforms.

You configure the Nexthink Event Connector service by filling in its configuration files with the details of the Nexthink Engines that provide the data, the target endpoint that receives it, and the data to be sent. Some minimal configuration may also be required on the Splunk, ServiceNow or Azure Data Lake Storage Gen2 instance into which the data will be injected.

Main Components

The Nexthink Event Connector requires several components to be fully operative:

  • A running implementation of the Nexthink product, with at least one instance of Nexthink Engine V6.7 or later, from which the connector retrieves the data to import into the target endpoint via the Nexthink Web API V2.0.

  • A machine with CentOS 7 or later on which to install the connector service. This machine must be able to reach both the target endpoint and the Engine(s) from which the data is retrieved. It should not be a machine already in use as a Nexthink Portal or Engine. It must have access to remote or local CentOS standard repositories, in case any dependencies need to be resolved when installing the Nexthink Event Connector service. A proxy with either None or Basic authentication is supported; for its usage, refer to the Installation and Configuration sections.

  • One of the following target endpoints:

    • Splunk instance 6.5 or later, properly configured to receive the Nexthink data sent by the connector. In particular, the following items must be present:

      • HTTP Event Collector (HEC) standard Splunk application enabled and a previously generated token to communicate with it.

      • Nexthink Add-on for Splunk. Not strictly necessary, but highly recommended: it contains a set of data models to map the data sent by the connector service, making it possible to manage the gathered information with powerful Splunk built-in techniques (Pivot, etc.).

    • ServiceNow instance London or later, properly configured to receive the Nexthink data sent by the connector. In particular, the following items must be present:

      • Event Management plugin installed and activated.

      • User with the role Event Management Integrator [evt_mgmt_integration].

    • Azure Data Lake Storage Gen2 instance configured and ready to receive data.

Target Endpoint configuration

Splunk Set-up

Enabling the Splunk HEC (HTTP Event Collector)

The following instructions are adapted from the official Splunk documentation at http://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector:

HTTP Event Collector (HEC) must be enabled before it can receive events through HTTP. For Splunk Enterprise, enable HEC through the Edit Global Settings dialog box, as shown in the illustration below.

  • Click Settings > Data Inputs.

  • Click HTTP Event Collector.

  • Click Global Settings.

Editing global settings
  • In the All Tokens toggle button, select Enabled.

  • Select Default Source Type.

  • (Optional) To have HEC listen and communicate over HTTPS rather than HTTP, click the Enable SSL checkbox.

  • (Optional) Enter a number in the HTTP Port Number field for HEC to listen on. Confirm that no firewall blocks the port number specified in the HTTP Port Number field, either on the client side or on the Splunk instance that hosts HEC.

  • Click Save.

HEC Token

To use HEC, it is necessary to configure at least one token.

  • Click Settings > Data Inputs.

  • Click HTTP Event Collector.

  • Click New Token. 

Configuring a new token
  • In the Name field, enter a name for the token.

  • If it is necessary to enable indexer acknowledgment for this token, click the Enable indexer acknowledgment checkbox.

  • Click Next.

  • Confirm the source type and the index for HEC events.

  • Click Review.

  • Confirm that all settings for the endpoint are ok.

  • If all settings are ok, click Submit. Otherwise, click < to make changes.

  • Copy the token value that Splunk Web displays and paste it into another document for reference later.
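Before entering the token in the connector configuration, it can be useful to check it by sending a test event to HEC yourself. The following is a minimal standard-library sketch, assuming Python 3 is available; it only builds the request (HEC expects the token in an Authorization header with the Splunk scheme, and a channel GUID when indexer acknowledgment is enabled). The host name, port 8088 and placeholder token are assumptions to replace with your own values.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_hec_request(hec_uri: str, token: str, event: dict,
                      index: str = "main",
                      ack_channel: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one event to Splunk HEC."""
    payload = {"event": event, "index": index}
    headers = {
        # HEC reads the token from the Authorization header, using the "Splunk" scheme.
        "Authorization": f"Splunk {token}",
        "Content-Type": "application/json",
    }
    if ack_channel is not None:
        # With indexer acknowledgment enabled, HEC requires a channel identifier (a GUID).
        headers["X-Splunk-Request-Channel"] = ack_channel
    return urllib.request.Request(
        hec_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        "https://splunk.example.com:8088/services/collector",  # placeholder URI
        "00000000-0000-0000-0000-000000000000",                # placeholder token
        {"message": "HEC connectivity test"},
        ack_channel=str(uuid.uuid4()),
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

If the request is accepted, the test event should appear in the index selected for the token, which confirms the values to use for token, index and ack_indexer.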

Nexthink Add-on for Splunk

  • Click App: Search & Reporting > Find More Apps.

Add-on for Splunk
  • Type Nexthink Add-on for Splunk in the top-left search box.

  • Click Install to install the add-on.

ServiceNow

Event Management plugin

The following instructions are adapted from the official ServiceNow documentation.

Procedure:

  1. In the HI Service Portal, click Service Requests > Activate Plugin.

  2. Fill out the form.

      • Target instance: Instance on which to activate the plugin.

      • Plugin name: Name of the plugin to activate.

      • Specify the date and time you would like this plugin to be enabled: Date and time must be at least 2 business days from the current time.
        Note: Plugins are activated in two batches each business day in the Pacific time zone, once in the morning and once in the evening. If the plugin must be activated at a specific time, enter the request in the Reason/Comments field.

      • Reason/Comments: Provide any information that would be helpful for the ServiceNow personnel activating the plugin, for example, if you need the plugin activated at a specific time instead of during one of the default activation windows.

  3. Click Submit.

To activate the Event Management plugin on a developer instance, go to the ServiceNow developer portal, select Manage Instance > Action > Activate plugin, and search for Event Management.

Azure Data Lake Storage Gen2

Download and install Azure Storage Explorer

  1. As a prerequisite, download and install the desktop version of Azure Storage Explorer from https://azure.microsoft.com/en-us/features/storage-explorer/

  2. Once it is installed, no further action is needed with it until the container permissions are configured.

Creation of an Azure App

Register server-side web app

The following instructions are adapted from the official Azure Active Directory documentation at https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app

  • Sign in to the Azure portal with your account.

  • Navigate to Azure Active Directory > App registrations > New registration.

Azure app registration
  • Add a name and select the Web type.

  • Click Register.

Register an application

Enable Service Principal for the app

The Overview tab of the newly created application shows details such as the Service Principal, which can be found under the Managed application in local directory item.

Make sure the Service Principal is enabled. Depending on how the app was created, you may need to create a Service Principal for it; to do so, follow the official Microsoft documentation.

Enabling service principal

Create a client secret for the app.

Navigate to the Certificates & secrets tab and create a New client secret. Make sure to note it down as you will need it later.

creating new client secret
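To illustrate what the tenant ID, client ID and client secret are used for, the following sketch builds a standard OAuth 2.0 client-credentials token request to the Microsoft identity platform, scoped to Azure Storage. It assumes Python 3 and only the standard library; whether the connector uses exactly this endpoint internally is an assumption, and the identifiers in the test are placeholders.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for Azure Storage."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for a token whose audience is Azure Storage.
        "scope": "https://storage.azure.com/.default",
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending this request with urllib.request.urlopen returns a JSON document whose access_token field can then be used against the Data Lake endpoint. An error at this stage usually points to a wrong tenant ID, client ID or secret rather than to storage permissions.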

Creation of an Azure Storage Account

The following instructions are adapted from the official Microsoft Azure Data Lake Storage Gen2 documentation.

  • Log in to the Azure Portal.

  • Search for Storage Accounts.

  • Click the Add button.

  • Choose an existing Resource group or create a new one.

  • In the Storage account name field, enter a name for your storage account (it must be globally unique).

  • Make sure the Account kind is StorageV2.

  • Click the Advanced tab.

  • Enable Hierarchical namespace for Data Lake Storage Gen2.

Enabling hierarchical namespace
  • Click the Review + create button.

Creation of an Azure Container

The following instructions are adapted from the official Microsoft Azure Data Lake Storage Gen2 documentation.

  • Access your new Storage Account under the Storage Accounts section.

  • Click Containers in the left menu.

  • Click the Container button.

  • In the Name field, enter the name of the container that you want to create.

  • Click the Create button.

Configure permissions for container:

  • Open Azure Storage Explorer (desktop version).

  • Look for your Storage Account and expand it.

  • Expand Blob containers.

  • Right-click your recently created container.

  • Click Manage Access in the context menu.

  • Click Add.

  • Type the user that you want to add and click Search.

  • Select that user and click Add.

  • Set the required permissions.

  • Click on OK.

Setting permissions

Installing the Connector Service

Once a machine with CentOS 7 or later is set up to host the connector service, a user with administrative privileges can install the Event Connector RPM package from a terminal session by running the following command:

$ yum install nxeventconnector-x.x.x-x.el7.noarch.rpm

Please note that x.x.x-x specifies the package version for the connector service rpm to be installed.

During a new installation, you are prompted to choose the back-end tool that the Event Connector will target, so that the appropriate configuration files are copied to the target directory.

Terminal session

Only one type of connector (Splunk, ServiceNow or Azure Data Lake Storage Gen2) can be installed on the same appliance.

Do not use CTRL+C to abort the installation process as it might leave the installation in an unstable state.

If the installation has been aborted, run the following command to remove the package so that the Event Connector can be installed again:

$ rpm -e nxeventconnector

Installation with proxy

To install through a proxy, pip (the Python package manager) must be configured to use that proxy. To do so, create or update the file /etc/pip.conf with the following content:

[global]

proxy = <schema>://<user>:<pass>@<proxy_ip_or_hostname>:<port>
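If the proxy user name or password contains characters that URL syntax reserves (such as @, : or /), they are usually percent-encoded so that the proxy URL above still parses correctly. The following standard-library sketch builds such a URL; the host, port and credentials in the test are placeholders.

```python
from urllib.parse import quote


def build_proxy_url(schema: str, host: str, port: int,
                    user: str = "", password: str = "") -> str:
    """Compose <schema>://<user>:<pass>@<host>:<port>, percent-encoding the credentials."""
    if user:
        # safe="" also encodes "/" so that no credential character is left reserved.
        credentials = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    else:
        credentials = ""  # proxy without authentication
    return f"{schema}://{credentials}{host}:{port}"


if __name__ == "__main__":
    print(build_proxy_url("http", "proxy.example.com", 3128, "alice", "p@ss:word"))
```

The printed value can be pasted as the proxy entry under [global] in /etc/pip.conf.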

Configuring the Connector Service

General Configuration

The file with the general configuration for the connector service is located at /etc/nxeventconnector/config.conf by default. Both the root and nxeventconnector system users can edit it to adapt the configuration. The file content may look similar to the following:

[GENERAL]
log_conf_file = /etc/nxeventconnector/logging.conf
log_file = /var/log/nxeventconnector/nxeventconnector.log
log_level = INFO
log_format = %(asctime)s %(thread)s %(levelname)s %(module)s [-] %(message)s
verify_cert_engine = false
verify_cert_endpoint = false
proxy_enabled = False
proxy_server = <schema>://<host>:<port>
proxy_auth_type = None
proxy_user =
proxy_password =

[NEXTHINK]
uri = https://<portal_ip_address_or_hostname>
username = <portal_username>
password = <portal_password>

#[NEXTHINK_OAUTH]
#oauth_provider_uri = https://agora.<region>.nexthink.cloud
#oauth_client_id = <oauth_client_id>
#oauth_client_secret = <oauth_client_secret>


[ENGINES]
<engine_name> = https://<engine1_ip_address_or_dns>:<port>/2/query, <WEB API 2.0 user>, <WEB API 2.0 password>, <timezone>
...
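To double-check the [ENGINES] entries before starting the service, the file can be read with Python's standard configparser module. This is a minimal sketch, assuming the entry layout shown above; the Engine name, URL, user, password and timezone in the test are placeholders. RawConfigParser is used because the default ConfigParser treats % as an interpolation marker and would fail when reading log_format.

```python
import configparser
from typing import Dict, List


def load_engines(config_text: str) -> Dict[str, List[str]]:
    """Parse the [ENGINES] section of a config.conf-style text into comma-separated fields."""
    # RawConfigParser performs no '%' interpolation, so log_format stays readable.
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep Engine names exactly as written (no lowercasing)
    parser.read_string(config_text)
    engines = {}
    for name, value in parser.items("ENGINES"):
        # Indented continuation lines are joined with "\n"; treat them as spaces.
        fields = [field.strip() for field in value.replace("\n", " ").split(",")]
        engines[name] = fields
    return engines


if __name__ == "__main__":
    with open("/etc/nxeventconnector/config.conf", encoding="utf-8") as handle:
        for engine, fields in load_engines(handle.read()).items():
            print(engine, fields[0])  # print each Engine name and its Web API URL
```

Splitting on commas assumes that no field (in particular the password) contains a comma.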

The file then continues with one of the following sections, depending on the back end:

For Splunk:

[SPLUNK_HEC]
URI = <protocol>://<your_splunk_instance>:<port>/services/collector 
token = <HEC_token>
ack_indexer = false
index = main
max_records_single_push = 10000

For ServiceNow:

[SERVICENOW]
URI = https://<instance>.service-now.com/api/global/em/jsonv2 
login = <user>
password = <pass>
max_records_single_push = 10000

For Azure Data Lake Storage Gen2:

[AZURE_DATALAKE_STORAGE_GEN2]
URI = https://<account_name>.dfs.core.windows.net/ 
tenant_id = <tenant_id>
client_id = <client_id> 
client_secret = <client_secret> 
filesystem = <filesystem>
max_records_single_push = 10000

Below are more detailed descriptions and possible values for each attribute: 

  • log_conf_file: Path where the service logging configuration file is stored. Values: string with a file path.

  • log_file: Path where the service log file itself is stored. Values: string with a file path.

  • log_level: Log level configured for the service logger. Values: CRITICAL, ERROR, WARNING, INFO, DEBUG or NOTSET.

  • log_format: Format of the log messages. Values: string with a valid format for the Python logging module.

  • verify_cert_engine: Whether to check the certificate of the Nexthink Engine (for example, to detect a self-signed certificate). Values: true/false.

  • verify_cert_target: Whether to check the certificate of the back-end instance (for example, to detect a self-signed certificate). Values: true/false.

  • proxy_enabled: Whether a proxy is used. Values: true/false.

  • proxy_server: URI of the proxy, in the format <schema>://<proxy_ip_or_hostname>:<port>. Values: string.

  • proxy_auth_type: Proxy authentication type. Only None and Basic authentication are supported. Values: None/Basic.

  • proxy_user: User for the proxy. Only used with Basic authentication. Values: string.

  • proxy_password: Password of the proxy user. Only used with Basic authentication. Values: string.

  • [ENGINES] section: One entry per Engine registered on the service, containing the Engine name, its Web API 2.0 endpoint and credentials, and its standard timezone, in the following format:

    <engine_name> = https://<engine1_ip_address_or_dns>:<port>/2/query, <WEB API 2.0 user>, <WEB API 2.0 password>, <timezone>

    Note: The default port is 1671 for on-premises (V6.x) and 443 for Cloud. If either the [NEXTHINK] or the [NEXTHINK_OAUTH] section is defined, the credentials should not be included.

Authentication against the Nexthink APIs is configured in either the [NEXTHINK] section (Basic authentication) or the [NEXTHINK_OAUTH] section (OAuth).

[NEXTHINK] section:

  • uri: URI of the Nexthink Portal, in the format https://<portal_ip_address_or_hostname>. Values: string.

  • username: Username of a Nexthink account with permissions to use the Web API (NXQL) and the List Engines API. Values: string.

  • password: Password of the Nexthink account above. Values: string.

[NEXTHINK_OAUTH] section (OAuth is only supported starting from V6.30.8/2021.9):

  • oauth_provider_uri: Nexthink Cloud endpoint, in the format https://agora.<region>.nexthink.cloud. The region can be requested through Support. Values: string.

  • oauth_client_id: Client ID with scopes to use the Web API (NXQL) and the List Engines API. It should be requested through Support. Values: string.

  • oauth_client_secret: Secret of the client ID above. It should be requested through Support. Values: string.

Specific configuration for the Splunk integration ([SPLUNK_HEC] section):

  • URI: URI of the Splunk HEC. Values: string.

  • token: Splunk HEC token. It must match the token generated in HEC Token. Values: string with the GUID of the HEC token.

  • ack_indexer: Whether Splunk indexer acknowledgment is enabled. It must match the selection made in HEC Token. Values: true/false.

  • index: Splunk index where the events will be stored. It must match the index selected in HEC Token. This can be any index name that the Splunk administrator has created and enabled in the system (e.g. main, nexthink, my_personal_index). Values: string with the index name.

  • max_records_single_push: Maximum number of records streamed to Splunk in a single push. Values: integer.

Specific configuration for the ServiceNow integration ([SERVICENOW] section):

  • URI: URI of the ServiceNow endpoint. Values: string.

  • login: Username of a user with the role Event Management Integrator [evt_mgmt_integration]. Values: string.

  • password: Password of that user. Values: string.

  • max_records_single_push: Maximum number of records streamed to ServiceNow in a single push. Values: integer.

Specific configuration for the Azure Data Lake Storage Gen2 integration ([AZURE_DATALAKE_STORAGE_GEN2] section):

  • URI: URI of the Data Lake instance. Values: string.

  • tenant_id: ID of the application's Azure Active Directory tenant. Values: string.

  • client_id: Client ID of the Azure app created in 3.3.2 (Creation of an Azure App). Values: string.

  • client_secret: Client secret of the Azure app created in 3.3.2 (Creation of an Azure App). Values: string.

  • filesystem: Container where the events will be stored. Values: string.

  • max_records_single_push: Maximum number of records streamed to Azure Data Lake Storage Gen2 in a single push. Values: integer > 0.
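The effect of max_records_single_push can be pictured as splitting each query result into consecutive pushes of at most that many records; with the sample value of 10000, a result of 25,000 records would go out as pushes of 10,000, 10,000 and 5,000. The following is an illustrative sketch only; how the connector batches internally is an assumption.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], max_records_single_push: int) -> Iterator[List[T]]:
    """Yield consecutive slices holding at most max_records_single_push records each."""
    if max_records_single_push <= 0:
        raise ValueError("max_records_single_push must be an integer > 0")
    for start in range(0, len(records), max_records_single_push):
        yield list(records[start:start + max_records_single_push])


if __name__ == "__main__":
    print([len(chunk) for chunk in batches(list(range(25000)), 10000)])
```

Smaller values mean more, lighter requests to the back end; larger values mean fewer but heavier ones.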

Event Configuration

The file with the event configuration for the connector service is located at /etc/nxeventconnector/events.conf by default. Both the root and nxeventconnector system users can edit it to adapt the configuration. The file content may look similar to the following:

#------------------ LONG LASTING EVENTS (ONLY FOR SPLUNK) ---------------

[HIGH_CPU_USAGE]
mode = long_lasting
query = (select ((device (<device_fields>))
                 (device_warning (<device_warning_fields>))) 
            (from (device device_warning)
                (where device_warning
                    (eq type (enum "high overall cpu usage"))
                    (ge start_time (datetime <from>))
                    (lt start_time (datetime <to>))
                )
                (where device_warning
                    (eq type (enum "high overall cpu usage"))
                    (ge end_time (datetime <from>))
                    (lt end_time (datetime <to>))
                )
                (between <from> <to>)
            )
            (limit 4294967295)
        )
mapping = {"device_fields": {"name": "src"},
           "device_warning_fields": {"duration": "duration", "id": "id", "info": "info", "type": "type", 
                                     "value": "value",
                                    "warning_duration": "warning_duration"}}
frequency = 1
delay = 1
platforms = windows 

# . . .

#------- PUNCTUAL EVENTS-------------        

[DEVICE_BOOT]
mode = punctual
query = (select ((device (<device_fields>))
                 (device_activity (<device_activity_fields>))) 
            (from (device device_activity)
                (where device_activity 
                    (eq type (enum boot))
                    (ge time (datetime <from>))
                    (lt time (datetime <to>))
                )
                (between <from> <to>)
            )
            (limit 4294967295)
        )
mapping = {"device_fields": {"name": "src"}, 
           "device_activity_fields":
                      {"duration": "duration", "id": "id", "type": "type"}}
frequency = 5
delay = 5
severity = 2
description = Device boot 
platforms = windows, mac_os

# . . .

#------LISTING EVENTS (SAMPLES)--------

[POWER_SERVERS]
mode = listing
query = (select (<device_fields>)
            (from device
                (where device
                    (eq device_type (enum server))
                    (eq device_model (pattern "Power*"))
                )
            )
        )
mapping = {"device_fields": {"name": "name", "id": "id",
                            "device_type": "dev_type", "device_model":
                            "dev_model"}}
frequency = 1440
delay = 5
severity = 2
description = Power servers
platforms = windows

# . . .
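The field tags in the queries above (such as <device_fields>) correspond to the keys of the mapping dictionary: the inner keys are Nexthink fields and the inner values are the back-end field names. The following sketch shows one way such a template could be expanded, assuming the mapping is JSON and that a tag is replaced by its Nexthink field names separated by spaces; the connector's actual expansion logic is not documented here. Date placeholders such as <from> and <to> are left untouched.

```python
import json
import re


def render_fields(query: str, mapping: dict) -> str:
    """Replace each <tag> found in the mapping with its Nexthink field names."""
    def substitute(match: "re.Match") -> str:
        tag = match.group(1)
        if tag not in mapping:
            return match.group(0)  # keep <from>, <to> and unknown tags as they are
        # Inner keys are the Nexthink fields; inner values are the back-end names.
        return " ".join(mapping[tag].keys())

    return re.sub(r"<(\w+)>", substitute, query)


if __name__ == "__main__":
    boot_mapping = json.loads(
        '{"device_fields": {"name": "src"}, '
        '"device_activity_fields": {"duration": "duration", "id": "id", "type": "type"}}'
    )
    template = ("(select ((device (<device_fields>)) "
                "(device_activity (<device_activity_fields>))) "
                "(from (device device_activity) (between <from> <to>)))")
    print(render_fields(template, boot_mapping))
```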

For clarity, different types of events are usually listed separately in the file. Punctual events are those whose lifetime only spans the instant when they occur, while long-lasting events can report several updates in addition to the instant they were created. Listing events are intended for reporting purposes and can query any type of information from the Nexthink database (objects, events, objects with event decoration, etc.).

Here is a more detailed description and possible values for each attribute:

  • Section names (e.g. [DEVICE_BOOT]): Event name. Values: string; each section is named after its event.

  • mode: Event mode. Values: long_lasting, punctual, listing or listing_advanced.

  • timestamp: Event time field used as the timestamp of the event. This is only allowed in listing_advanced mode with event-related queries. Values: time, start_time or end_time, depending on the event table being queried.

  • query: NXQL query performed to retrieve the data. Mapping tags and date placeholders are enclosed in <>. Values: string with a valid NXQL query to retrieve the Nexthink data.

  • mapping: Dictionary with the mapping associated with each field tag specified in the query. For each field tag, it specifies the Nexthink fields and the corresponding back-end data model fields (see the note on dynamic fields below). Values: string with a valid mapping.

  • frequency: Minutes between consecutive event data retrievals. It must be greater than or equal to delay. Values: integer > 0.

  • severity: ServiceNow only. Severity of the event. The options are typically interpreted as follows:

      • Critical (5): Immediate action is required. The resource is either not functional or critical problems are imminent.

      • Major (4): Major functionality is severely impaired or performance has significantly degraded.

      • Minor (3): Partial, non-critical loss of functionality or performance degradation occurred.

      • Warning (2): Attention is required, even though the resource is still functional.

      • Info (1): An alert is created. The resource is still functional.

      • Clear (0): No action is required. An alert is not created from this event. Existing alerts are closed.

    Values: integer from 0 to 5.

  • description: ServiceNow only. Reason for the event generation, showing additional details about an issue, for example a server stack trace or details from a monitoring tool. Maximum length: 4000 characters. Values: string.

  • directory: Azure Data Lake Storage Gen2 only. Path of the base directory that will be created to contain the CSV files. A sub-directory named after the event is automatically added under the base directory: <directory>/<event_name>/<file>. Nested paths are allowed, in the format <directory1>/<directory2>/.../<event_name>/<file>. Values: string with the directory path.

  • date_folders: Azure Data Lake Storage Gen2 only. Whether a date-based folder hierarchy must be created: <directory>/<event_name>/<yyyy>/<mm>/<dd>/<file>. Values: true/false.

  • delay: Minutes of delay to be considered when retrieving event data. Values: integer >= 0.

  • platforms: Operating systems to which the queried devices must belong. Values: comma-separated list of windows, mac_os or mobile.
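The directory and date_folders attributes together determine where each CSV file lands inside the container. The following sketch reproduces the layouts described above; the file name and the example date are hypothetical, since the naming of the CSV files themselves is not specified here.

```python
from datetime import datetime, timezone


def adls_path(directory: str, event_name: str, file_name: str,
              date_folders: bool, when: datetime) -> str:
    """Build <directory>/<event_name>[/<yyyy>/<mm>/<dd>]/<file> inside the container."""
    # Accept nested directories and ignore leading/trailing or doubled slashes.
    parts = [p for p in directory.strip("/").split("/") if p]
    parts.append(event_name)  # sub-directory named after the event (section name)
    if date_folders:
        parts += [f"{when:%Y}", f"{when:%m}", f"{when:%d}"]
    parts.append(file_name)
    return "/".join(parts)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(adls_path("nexthink/events", "DEVICE_BOOT", "example.csv", True, now))
```

With the URI from the [AZURE_DATALAKE_STORAGE_GEN2] section, such a file would be addressed as <URI><filesystem>/<path> on the dfs.core.windows.net endpoint.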

If a dynamic field (such as scores, categories or outputs of Remote Actions) is used in the mapping, its double quotes must be escaped with a backslash (\), as in the example below:

mapping = {"device_fields": {"name": "src",
                            "#\"action:Get MS Office information/OSTOverallSize\"": "OSTOverallSize"}}
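The escaping is needed because the mapping value is written in JSON syntax: without the backslashes, the inner quotes would end the key string early and the value would no longer parse. Assuming the connector parses the mapping as JSON, the following sketch shows that the escaped quotes become literal quotes in the resulting field name. The dynamic field must also stay on a single line, since a line break inside a JSON string is invalid.

```python
import json

# Mapping value as written in events.conf (the raw string keeps the backslashes).
MAPPING = (r'{"device_fields": {"name": "src", '
           r'"#\"action:Get MS Office information/OSTOverallSize\"": "OSTOverallSize"}}')


def parse_mapping(text: str) -> dict:
    """Parse a mapping value, assuming JSON syntax (escaped quotes become literal quotes)."""
    return json.loads(text)


if __name__ == "__main__":
    for nexthink_field, target_field in parse_mapping(MAPPING)["device_fields"].items():
        print(nexthink_field, "->", target_field)
```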

Also, because some columns of the ServiceNow events table have a special metadata meaning, a servicenow_ prefix can be prepended to the column name in the mapping parameter (in the format servicenow_<servicenow_column>) to state that the Nexthink field is mapped to one of those special ServiceNow columns. The columns that can be used with the prefix are:

  • node: device identifier (name, FQDN, IP or MAC address). If it is not used, the tag will be empty.

  • resource: application, disk, device, etc. If it is not used, the tag will be empty.

  • metric_name: if not used, the tag will be populated with the name of the triggering event (i.e., the event name stated as the section header in the events.conf file).

  • type: if not used, the tag will be populated with the list of tables targeted by the underlying NXQL query.
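The defaults above can be illustrated by turning one mapped record into a ServiceNow event. The following sketch is hypothetical: the helper names, the space-separated type value and the use of additional_info for the remaining fields are assumptions, as is the {"records": [...]} body shape assumed for the Event Management jsonv2 web service. The actual payload sent by the connector may differ.

```python
import json
from typing import Dict, List

PREFIX = "servicenow_"
# Special columns that may be targeted with the prefix (from the list above).
SPECIAL_COLUMNS = {"node", "resource", "metric_name", "type"}


def to_servicenow_event(record: Dict[str, object], event_name: str, tables: List[str],
                        severity: int, description: str) -> Dict[str, str]:
    """Turn one mapped record into an event dict (hypothetical helper)."""
    event = {
        "node": "",                     # empty unless servicenow_node is mapped
        "resource": "",                 # empty unless servicenow_resource is mapped
        "metric_name": event_name,      # default: section name in events.conf
        "type": " ".join(tables),       # default: queried tables (format assumed)
        "severity": str(severity),
        "description": description[:4000],  # field limited to 4000 characters
    }
    extra = {}
    for field, value in record.items():
        column = field[len(PREFIX):] if field.startswith(PREFIX) else None
        if column in SPECIAL_COLUMNS:
            event[column] = str(value)  # prefixed field overrides the default
        else:
            extra[field] = value
    # Remaining fields travel as a JSON string (assumption about the payload layout).
    event["additional_info"] = json.dumps(extra)
    return event


def build_payload(events: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Wrap events in a {"records": [...]} body (assumed jsonv2 format)."""
    return {"records": events}
```

For example, mapping a device name to servicenow_node fills the node column, while an unprefixed duration field ends up in additional_info.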

Logrotate

The Nexthink Event Connector service is installed with a default configuration for the logrotate Linux service, which is used if logrotate is installed. Logrotate prevents a log file from growing indefinitely by periodically rotating its content into archive files, of which only a limited number is kept. The configuration can be found at /etc/logrotate.d/nxeventconnector:

/var/log/nxeventconnector/nxeventconnector.log { 
        missingok
        copytruncate 
        compress 
        daily 
        size=2M 
        rotate 5
        create 0600 nxeventconnector nxeventconnector
}

This configuration is executed daily and rotates the log file when it exceeds 2 MB. Rotated log files are compressed, and at most 5 of them are kept. If any modification is needed, check the man page by typing the following in a Linux terminal:

man logrotate

Please note that if the logrotate service is not installed along with the Nexthink Event Connector service, the log file must be maintained manually on a periodic basis so that it does not grow excessively large.

Connector Modes – Selecting the appropriate one

Information Units

In the NXQL Data Model of Nexthink two different types of information units exist: events and objects. An event is an occurrence that happens at a defined moment in time, thus having a timestamp. There are several types of events that are at the core of Nexthink technology. These events are the basic unit of information. Each type of event is linked to a well-defined set of objects. Objects, for their part, represent items recognized by Nexthink.

It is important not to confuse Nexthink events with Splunk or ServiceNow events. Depending on the type, one Nexthink event can be reported several times (updated) to Splunk or ServiceNow, thus creating several Splunk or ServiceNow events for a single Nexthink event.

Connector Modes

The Nexthink Event Connector recognizes the difference between the information units and provides four different modes for reporting this data. As discussed in the High level overview guide, these modes are long-lasting, punctual, listing and listing_advanced with each intended for a specific purpose. Therefore, choosing the mode which best suits your needs is a key aspect of configuring the connector. The following provides more detailed explanations of the differences between the modes:

  • Long-lasting: Available only for Splunk. The main goal of this mode is to keep track of all the information related to a given durable event. This mode only takes into consideration events starting or ending during the time window of interest. As the lifetime of a long-lasting event is a period rather than an instant, this mode reports several updates for the event, in addition to reporting data at the instant when the event is created.
    More precisely, this mode sends a long_lasting_started event to Splunk for each record retrieved by the NXQL query if the event was initiated during the time window of interest. The timestamp of this event corresponds to the initial Nexthink timestamp of the event (start_time field).
    In addition to this initial event, one long_lasting_updated event is sent to Splunk in each subsequent time window, as long as the Nexthink event is still alive during that window. Its timestamp is set to the current Nexthink timestamp of the event (end_time field). Be aware that the Nexthink end_time field is modified after each Collector update to the Engine.

  • Punctual: This mode is intended to report one-time events, i.e., those events whose life span is just the instant when they occur. As is the case with the previous mode, it will only take into consideration those events occurring during the time window of interest.
    Therefore, a single punctual event is sent to Splunk/ServiceNow/Azure Data Lake for each record retrieved by the NXQL query. The timestamp of this event corresponds to the exact Nexthink timestamp when the event occurred (start_time or time fields).

  • Listing: This mode is mainly dedicated to reporting or inventory purposes and can query any type of information from the Nexthink database (objects, events, objects with event decoration, aggregates, etc.). This mode does not perform any time-related checks, so any desired date filtering must be explicitly stated in the NXQL query.
    As inventory is the main goal, all Nexthink records retrieved by the NXQL query will be sent to Splunk/Azure Data Lake with the same timestamp, i.e., the moment when the query was launched.

  • Listing advanced: This mode is very similar to the previous one, as it does not perform any time-related checks, forcing any desired date filtering to be explicitly stated in the NXQL query.
    However, the main goal of this mode is to provide event-related reports/listings. Therefore, the Nexthink event date field (time, start_time or end_time) can be selected to be used as the Splunk/ServiceNow/Azure Data Lake timestamp.
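The long-lasting rules described above can be summarized per record and per time window. The following sketch encodes them, assuming a window that includes <from> and excludes <to> (matching the ge/lt conditions in the sample queries); the Window class and function name are hypothetical and not part of the connector.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class Window:
    start: datetime  # <from>, inclusive
    end: datetime    # <to>, exclusive


def long_lasting_update(start_time: datetime, end_time: datetime,
                        window: Window) -> Optional[Tuple[str, datetime]]:
    """Return (Splunk event type, timestamp) for one Nexthink record, or None."""
    if window.start <= start_time < window.end:
        # The event began in this window: report it with its initial timestamp.
        return ("long_lasting_started", start_time)
    if start_time < window.start and window.start <= end_time < window.end:
        # The event began earlier and is still alive: end_time moved into this window.
        return ("long_lasting_updated", end_time)
    return None  # not started or updated during this window
```

An event that lasts several windows thus yields one long_lasting_started event followed by one long_lasting_updated event for each later window in which its end_time advances.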

Selecting the Appropriate Mode

Now that we understand the difference between modes, it is time to properly configure the service. Below are some useful tips to help in choosing the best mode for your needs.

When a query must be based on aggregates, the appropriate mode is either listing or listing_advanced. If this is not the case, you may consider choosing punctual for ServiceNow, or either punctual or long_lasting for Splunk. To make the correct choice, the following questions should be asked:
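The rule of thumb above can be written as a small shortlist function. This is an illustrative sketch only: the function name and target labels are hypothetical, and treating Azure Data Lake like ServiceNow for non-aggregate queries is an assumption based on long_lasting being available only for Splunk.

```python
from typing import List


def candidate_modes(target: str, uses_aggregates: bool) -> List[str]:
    """Shortlist connector modes for a target back end (sketch of the rule of thumb)."""
    target = target.lower()
    if target not in {"splunk", "servicenow", "azure"}:
        raise ValueError(f"unknown target: {target}")
    if uses_aggregates:
        # Aggregate-based queries call for the listing modes.
        return ["listing", "listing_advanced"]
    # long_lasting is only available when the target is Splunk.
    return ["punctual", "long_lasting"] if target == "splunk" else ["punctual"]
```

The questions that follow then help decide between the modes in the shortlist.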

Based on needed information

Do you know the precise question to be answered by Nexthink before choosing the mode?

If the answer is yes, listing mode is likely enough and you will be able to configure queries appropriately. If the answer is no, the best approach is likely to rely on event modes (long_lasting used only for Splunk, and punctual), which can provide much more information.

Given the fact that not all modes are available for ServiceNow, it is important to delve into this question by looking at Splunk and ServiceNow cases separately.

Splunk

When doing a capacity study of your network, you might ask: how much daily traffic belongs to Dropbox? This is quite a precise question, so you could simply define an NXQL query with aggregates to retrieve that information on a daily basis, as shown in the code snippet below.

Note that using the <from> and <to> tags in the between clause retrieves the information between the moment when the query is launched and 24 hours before. Frequency has been set to 1440 minutes, which is equivalent to 24 hours. If you need the information for the entire previous day, you could simply set the between clause to midnight-1d and midnight.
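The relationship between frequency, delay and the <from>/<to> placeholders can be sketched as follows. This assumes that <to> is the launch time shifted back by delay and that <from> lies frequency minutes earlier, so that consecutive runs cover contiguous windows; it also assumes the NXQL datetime literal format YYYY-MM-DD@HH:MM:SS. Both are assumptions rather than documented connector behavior.

```python
from datetime import datetime, timedelta
from typing import Tuple

NXQL_DATETIME = "%Y-%m-%d@%H:%M:%S"  # assumed NXQL datetime literal format


def query_window(launch: datetime, frequency: int, delay: int) -> Tuple[str, str]:
    """Return (<from>, <to>) for a run launched at `launch`; frequency/delay in minutes."""
    if frequency <= 0 or delay < 0 or frequency < delay:
        raise ValueError("expected frequency > 0, delay >= 0 and frequency >= delay")
    to = launch - timedelta(minutes=delay)   # leave time for late Collector updates
    frm = to - timedelta(minutes=frequency)  # one full period before <to>
    return frm.strftime(NXQL_DATETIME), to.strftime(NXQL_DATETIME)


if __name__ == "__main__":
    print(query_window(datetime(2021, 10, 27, 12, 0), frequency=1440, delay=0))
```

Under these assumptions, frequency = 1440 and delay = 0 give exactly the "last 24 hours" window described above, while a non-zero delay shifts the whole window back by that many minutes.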

[DAILY_TRAFFIC_OF_DROPBOX]
mode = listing
query = (select (<application_fields>)
            (from application 
                (with connection
                    (where connection
                        (ne status (enum "no host")) 
                        (ne status (enum "no service")) 
                        (ne status (enum "rejected"))
                    )
                    (where application
                        (eq name (string "Dropbox"))
                    )
                    (compute <aggregate_fields>) 
                    (between <from> <to>)
                )
            )
        )
mapping = {"application_fields": {"name": "app_name", "platform": "platform"},
            "aggregate_fields": {"incoming_traffic": "incoming_traffic",
                                  "outgoing_traffic": "outgoing_traffic"}}
frequency = 1440
delay = 5
platforms = windows, mac_os 
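The difference between the two windows can be made concrete with a short Python sketch. It assumes that, for a listing event, <to> is the launch time and <from> is one frequency period earlier, as the note above describes. The delay parameter is ignored, and both function names are illustrative only.

```python
from datetime import datetime, timedelta


def rolling_window(launch: datetime, frequency_minutes: int) -> tuple[datetime, datetime]:
    """Assumed window for (between <from> <to>): one frequency period ending at launch."""
    return launch - timedelta(minutes=frequency_minutes), launch


def previous_day_window(launch: datetime) -> tuple[datetime, datetime]:
    """Window for (between midnight-1d midnight): the whole previous calendar day."""
    midnight = launch.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=1), midnight


launch = datetime(2021, 10, 27, 9, 30)
print(rolling_window(launch, 1440))   # 2021-10-26 09:30 to 2021-10-27 09:30
print(previous_day_window(launch))    # 2021-10-26 00:00 to 2021-10-27 00:00
```

With a query launched at 09:30, the rolling window covers parts of two calendar days. The midnight-based window always covers exactly one.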

However, what if you want to know how much daily incoming traffic belongs to application X on day Y during hours Z? In this case, you could send all Nexthink connection events to Splunk at a given frequency, for example every 5 minutes, together with the application and the device related to each connection. This way, all the information needed to answer the question above, as well as many other possible questions, will already be present in Splunk. The code snippet below shows such a configuration.

[DETAILED_CONNECTIONS]
mode = long_lasting
query = (select ((application (<application_fields>))
                 (device (<device_fields>)) 
                 (connection (<connection_fields>)))
            (from (application device connection) 
                (where connection
                    (ne status (enum "no host")) 
                    (ne status (enum "no service")) 
                    (ne status (enum "rejected"))
                    (ge start_time (datetime <from>)) 
                    (lt start_time (datetime <to>))
                )
                (where connection
                    (ne status (enum "no host")) 
                    (ne status (enum "no service")) 
                    (ne status (enum "rejected"))
                    (ge end_time (datetime <from>)) 
                    (lt end_time (datetime <to>))
                )
                (between <from> <to>)
            )
            (limit 4294967295)
        )
mapping = {"application_fields": {"name": "app_name"}, 
           "device_fields": {"name": "src"},
           "connection_fields": {"id": "id",
                                  "incoming_traffic": "bytes_in", 
                                  "outgoing_traffic": "bytes_out"}}
frequency = 5
delay = 5
platforms = windows, mac_os
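The two where clauses on connection in the query above select connections that either start or end inside the current window. The Python sketch below restates that filter. It assumes that NXQL combines separate where clauses on the same object with a logical OR, and combines the conditions inside one clause with a logical AND. It illustrates the query logic only and is not code from the connector.

```python
from datetime import datetime

# Statuses excluded by both where clauses in the query above.
EXCLUDED_STATUSES = {"no host", "no service", "rejected"}


def in_window(ts: datetime, frm: datetime, to: datetime) -> bool:
    """Mirror of (ge ... <from>) (lt ... <to>): the half-open interval [from, to)."""
    return frm <= ts < to


def is_selected(status: str, start: datetime, end: datetime,
                frm: datetime, to: datetime) -> bool:
    """True if the first clause (starts in window) OR the second (ends in window) matches."""
    if status in EXCLUDED_STATUSES:
        return False
    return in_window(start, frm, to) or in_window(end, frm, to)
```

Under this reading, a connection that starts before <from> and ends after <to> matches neither clause in that cycle.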

ServiceNow

As with Splunk, if you know exactly what you want to ask Nexthink, the listing or listing_advanced modes are likely the best option. For example, if you know that certain devices are suffering from Outlook performance issues, you can compose an aggregate query similar to the one in the code snippet below:

[OUTLOOK_PERFORMANCE_ISSUES]
mode = listing
query = (select (<device_fields>)
            (from device
                (with execution_error             
                    (where executable
                        (eq name (pattern "outlook.exe"))
                    )
                    (compute <aggregate_fields>)
                    (between <from> <to>)
                )
                (having 
                    (ge number_of_application_crashes (integer 8))
                )
            )
            (limit 4294967295)
        )
mapping = {"device_fields": {"name": "servicenow_node"},
           "aggregate_fields": {"number_of_application_crashes": 
                              "number_of_application_crashes"}}
frequency = 60
delay = 5
severity = 2
description = A ticket must be created due to an Outlook performance issue (crash) on the device of the user
platforms = windows
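The compute and having clauses above count crashes per device and keep only the devices that reach the threshold. The Python sketch below illustrates that aggregation on hypothetical records, where each record is a (device_name, executable_name, error_type) tuple. It makes three assumptions: number_of_application_crashes counts execution errors of type "crash", pattern matching is case-insensitive, and fnmatch is an acceptable stand-in for NXQL patterns.

```python
from collections import Counter
from fnmatch import fnmatch


def devices_over_threshold(records, pattern="outlook.exe", threshold=8):
    """Count crashes per device for matching executables; keep counts >= threshold."""
    crashes = Counter(
        device
        for device, executable, error_type in records
        if error_type == "crash" and fnmatch(executable.lower(), pattern)
    )
    # Equivalent of (having (ge number_of_application_crashes (integer 8))).
    return {device: count for device, count in crashes.items() if count >= threshold}
```

In this sketch, a device with seven Outlook crashes and several freezes would not be reported. A device with eight crashes would be.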


If, on the other hand, you are not sure which application is suffering from performance issues, you can configure a more generic query in punctual mode, as shown in the code snippet below:

[PERFORMANCE_ISSUES]
mode = punctual
query = (select ((device (<device_fields>))
                 (execution_error (<execution_error_fields>))
                 (binary (<binary_fields>)))
            (from (device execution_error binary)
                (where execution_error
                  (eq type (enum "crash"))
                )
                (between <from> <to>)
            )
            (limit 4294967295)
        )
mapping = {"device_fields": {"name": "src"},
           "execution_error_fields": {"id": "id", "info": "info", "type": "type"},
           "binary_fields": {"executable_name": "executable_name"}}
frequency = 5
delay = 5
severity = 2
description = Generic performance issues
platforms = windows
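The placeholders in these queries match the keys of mapping. In the generated query that the Query Validator prints later in this document, (device (name)) appears where a placeholder stood. This suggests that each placeholder is replaced by the Nexthink field names of its mapping entry. The Python sketch below reproduces that expansion under this assumption. Placeholders that are not mapping keys, such as <from> and <to>, are left untouched. The function name is hypothetical.

```python
import json
import re


def expand_placeholders(query: str, mapping: dict) -> str:
    """Replace each <key> that appears in mapping with its Nexthink field names."""
    def field_list(match: re.Match) -> str:
        key = match.group(1)
        # The inner dictionary keys are the Nexthink field names, in order.
        return " ".join(mapping[key]) if key in mapping else match.group(0)
    return re.sub(r"<(\w+)>", field_list, query)


mapping = json.loads('{"device_fields": {"name": "src"},'
                     ' "execution_error_fields": {"id": "id", "info": "info", "type": "type"},'
                     ' "binary_fields": {"executable_name": "executable_name"}}')
print(expand_placeholders("(select ((device (<device_fields>)) "
                          "(execution_error (<execution_error_fields>))))", mapping))
```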

Based on Budget

How much can you spend on sending data?

Splunk

Because Splunk pricing is based on the daily amount of data indexed by the instance, the amount of data sent to Splunk can influence what you decide to report from Nexthink. If you have a limited license budget, listing mode is the best choice. Customers with a larger license capacity can afford the higher data volumes of the long_lasting and punctual event modes in exchange for the much more detailed information they provide.
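To compare modes against a license budget, a rough daily volume per event can be estimated from the number of records per cycle, the frequency and an average record size. The Python sketch below does this arithmetic. The record counts and the 300-byte record size are hypothetical placeholders that you would measure from your own data.

```python
def daily_volume_gb(records_per_cycle: int, frequency_minutes: int,
                    avg_record_bytes: int) -> float:
    """Approximate data volume per day, in decimal gigabytes, for one event."""
    cycles_per_day = 24 * 60 / frequency_minutes
    return records_per_cycle * cycles_per_day * avg_record_bytes / 1e9


# Hypothetical figures: a daily listing event returning a few aggregated rows,
# versus a long_lasting event sending 10,000 connections every 5 minutes.
print(daily_volume_gb(10, 1440, 300))    # 3e-06
print(daily_volume_gb(10_000, 5, 300))   # 0.864
```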

ServiceNow

For ServiceNow, it is best to minimize the amount of data sent, since ServiceNow is not a big-data service. The guidelines discussed above for Splunk also apply here.

Azure Data Lake Storage Gen2

Azure Data Lake is designed to work with massive amounts of data. Its costs depend on both the number of requests made and the amount of data stored, so it is best to balance the two.

Azure allows you to define certain storage lifecycle policies to remove old data and only keep necessary data.
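As an illustration, a lifecycle management rule that deletes old blobs can be expressed in the JSON format that Azure Storage uses for lifecycle policies. The Python sketch below builds such a policy. The container/prefix "nexthink/events" and the 90-day retention are hypothetical values. The resulting JSON would then be applied to the storage account through the Azure portal or the Azure CLI.

```python
import json


def delete_after_days_policy(prefix: str, days: int) -> dict:
    """Lifecycle rule deleting block blobs under `prefix` not modified for `days` days."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "delete-old-nexthink-events",
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            "delete": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                },
            }
        ]
    }


print(json.dumps(delete_after_days_policy("nexthink/events", 90), indent=2))
```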

You can find more information in the following links:

Tools Supporting Configuration

The Python package that contains the Nexthink Event Connector service includes a set of tools to support configuration.

Query Validator

This tool checks the queries specification for syntax errors. To run it, type the following command in a terminal on the Linux system where the service is installed:

nxeventconnector-check-query [-c <config_file>] [-e <events_file>]

By default, the tool uses the default general and events configuration files. The optional -c and -e parameters let you check other files instead. If there are no errors, the terminal prints something similar to the following:
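nxeventconnector-check-query validates the NXQL queries. Some mistakes in the events file can be caught locally before that step, such as a malformed mapping or an unindented continuation line. The Python sketch below is a hypothetical pre-check, not part of the connector. It assumes that the events file follows standard INI syntax readable by Python's configparser and that mapping values are JSON. Its list of valid modes includes only those mentioned in this section, so extend it if your version supports others.

```python
import configparser
import json

# Modes mentioned in this document; an assumption, not an exhaustive list.
VALID_MODES = {"listing", "listing_advanced", "long_lasting", "punctual"}


def precheck_events(text: str) -> list[str]:
    """Return problems found in events-file text (raises ParsingError on bad INI)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    for name in parser.sections():
        section = parser[name]
        if section.get("mode") not in VALID_MODES:
            problems.append(f"{name}: unknown mode {section.get('mode')!r}")
        try:
            json.loads(section.get("mapping", ""))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: mapping is not valid JSON ({exc.msg})")
        if not section.get("frequency", "").strip().isdigit():
            problems.append(f"{name}: frequency must be a whole number of minutes")
    return problems
```

A stray character inside mapping, like the one a careless edit could leave between two JSON entries, is reported as invalid JSON. An unindented second line of a value makes configparser raise ParsingError.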

---------------
Query Validator
---------------

-> Event failed_connections OK
-> Event device_boot OK
-> Event system_crash OK
-> Event smart_disk OK
-> Event hard_reset OK
-> Event high_cpu_usage OK
-> Event high_memory_usage OK
-> Event high_io_usage OK
-> Event high_number_of_page_faults OK
-> Event execution_crash OK
-> Event execution_freeze OK
-> Event high_application_cpu OK
-> Event high_application_memory OK
-> Event print OK
-> Event user_logon OK

All queries syntax is correct! Please note that this does not guarantee the data availability

If any of the generated queries contains syntax errors, the output looks similar to this:

---------------
Query Validator
---------------

-> Event failed_connections OK
-> Event device_boot OK

ERROR system_crash - NxqlWrongResponse: HTTP Status 400 - 'seect' is not a 
valid keyword in context. Options are: select, update - 
https://172.19.7.8:1671/2/query?platform=windows&format=json&query=
(seect%20((device%20(name))%20(device_error%20(error_label%20start_time%20id)))
%20(from%20(device%20device_error)%20(where%20device_error%20
(eq%20type%20(enum%20%22system_crash%22))%20(ge%20start_time%20
(datetime%202017-06-16%4011%3A14%3A13))%20(lt%20start_time%20
(datetime%202017-06-16%4011%3A15%3A13))%20)%20
(between%202017-06-16%4011%3A14%3A13%202017-06-16%4011%3A15%3A13)%20)%20
(limit%201)%20)

The error message shows that a typo in the select keyword ('seect') caused the Engine to reject the query.
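The query embedded in the error URL is percent-encoded, which makes typos harder to spot. Python's standard urllib.parse.unquote decodes it back to readable NXQL. The string below is an excerpt of the query from the error output above, and this is only a convenience for reading errors.

```python
from urllib.parse import unquote

# Excerpt of the encoded query copied from the error message above.
encoded = ("(seect%20((device%20(name))%20(device_error%20(error_label%20start_time%20id)))"
           "%20(limit%201)%20)")
print(unquote(encoded))
# (seect ((device (name)) (device_error (error_label start_time id))) (limit 1) )

print(unquote("2017-06-16%4011%3A14%3A13"))  # 2017-06-16@11:14:13
```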

Engine Availability

This tool checks the availability of the configured Engines. To run it, type the following command in a terminal on the Linux system running the service:

nxeventconnector-check-engine

If there is no error, something similar to the following should print in the terminal:

-------------------
Engine Availability
-------------------

-> Engine livedemoengine OK 
Engines successfully tested 1/1

In this case, only one Engine was configured, and it was reachable.

If the Engine had not been reachable, the command output would have looked as follows:

-------------------
Engine Availability
-------------------

ERROR No response from engine 'livedemoengine' - NxqlResponseNotReceived: 
HTTPSConnectionPool(host='17.19.7.8', port=1671): Max retries exceeded with url:
/2/query?platform=windows&format=json&query=(select%20(name)%20(from%20device)%20
(limit%201)) (Caused by NewConnectionError
('<requests.packages.urllib3.connection.VerifiedHTTPSConnection 
object at 0x7efcc560ba50>: Failed to establish a new connection: [Errno 111] 
Connection refused',)) for https://17.19.7.8:1671/2/query?platform=windows&format=
json&query=(select%20(name)%20(from%20device)%20(limit%201)) 

Engines successfully tested 0/1

The error message reveals a connectivity problem with the Engine: the connection was refused, and the maximum number of connection attempts was exceeded.
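"Connection refused" usually means that nothing accepted the TCP connection on the Engine's port (1671 in the URLs above). To separate network problems from query problems, you can test whether that port accepts connections at all. The Python sketch below uses only the standard socket module. It is not part of the connector, and the function name is hypothetical.

```python
import socket


def engine_port_open(host: str, port: int = 1671, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For the Engine in the example above, engine_port_open("17.19.7.8") would return False while the port is refusing connections.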

Check Timezone

This tool checks whether a timezone name is a valid standard timezone. To run it, type the following command in a terminal on the Linux system running the service:

nxeventconnector-check-tz [timezone]

If no timezone parameter is passed, the command prints a list of all the timezones available on the system:

nxeventconnector-check-tz

Timezone List

Africa/Abidjan 
Africa/Accra 
Africa/Addis_Ababa
Africa/Algiers
...

If a valid timezone is passed as a parameter, the output confirms that the timezone is correct:

nxeventconnector-check-tz Australia/Sydney

Timezone 'Australia/Sydney' is correct

Here is a second example with an incorrect timezone:

nxeventconnector-check-tz Universe/Mars

Timezone 'Universe/Mars' is not correct
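A comparable check can be scripted with Python's standard zoneinfo module (Python 3.9 or later), for example to validate timezone names before writing them into the configuration. This is an independent sketch. It is not necessarily how nxeventconnector-check-tz is implemented, and its results depend on the tz database installed on the system.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError, available_timezones


def is_valid_timezone(name: str) -> bool:
    """True if `name` is a timezone key found in the system's tz database."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError):
        # ValueError covers malformed keys such as absolute paths.
        return False


print(is_valid_timezone("Universe/Mars"))   # False
print(sorted(available_timezones())[:4])    # first names of the system list
```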

Update Engines

This tool helps you configure the list of Engines. It connects to a configured Nexthink Portal and retrieves the list of Engines from it. If this list differs from the Engines configured in config.conf, you are asked to confirm whether the existing Engines configuration in the file should be replaced with the one retrieved from the Portal. For this tool to work, the configuration file must have either the NEXTHINK or the NEXTHINK_OAUTH section configured with the appropriate credentials.

To run the tool, type the following command in a terminal on the Linux system running the service. Stop the Event Connector service before running the tool.

nxeventconnector-update-engines-list [-c <config_file_path>]

If no config_file_path parameter is passed, the tool works on the default config.conf file.

nxeventconnector-update-engines-list

Current Engines configured:
('labintegrations2', 'https://10.10.10.1:1671/2/query Europe/Madrid')
Retrieved Engines:
('itm-engine-01', 'https://10.10.10.1:1671/2/query Europe/Madrid')
('itm-engine-02', 'https://10.10.10.2:1671/2/query Europe/Madrid')

Do you want to replace the Engines? [y/N]y 
Engines list successfully updated
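The comparison that decides whether the tool asks for confirmation can be illustrated with the Engines from the example above. The Python sketch below is hypothetical: it keys Engines by name and compares their "URL timezone" values. The tool's actual comparison logic is not documented here.

```python
def diff_engines(current: dict, retrieved: dict) -> dict:
    """Compare Engine name -> 'URL timezone' maps and report what would change."""
    return {
        "added": sorted(retrieved.keys() - current.keys()),
        "removed": sorted(current.keys() - retrieved.keys()),
        "changed": sorted(name for name in current.keys() & retrieved.keys()
                          if current[name] != retrieved[name]),
    }


current = {"labintegrations2": "https://10.10.10.1:1671/2/query Europe/Madrid"}
retrieved = {
    "itm-engine-01": "https://10.10.10.1:1671/2/query Europe/Madrid",
    "itm-engine-02": "https://10.10.10.2:1671/2/query Europe/Madrid",
}
changes = diff_engines(current, retrieved)
print(changes)
# {'added': ['itm-engine-01', 'itm-engine-02'], 'removed': ['labintegrations2'], 'changed': []}
print("ask for confirmation" if any(changes.values()) else "nothing to do")
```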

Running the Service

To run the connector service, open a terminal session on the machine where it is installed and run the following command as a user with administrative privileges:

systemctl start nxeventconnector

While the service is running, you can check its status with the following command:

systemctl status nxeventconnector

To stop the service, run the following command as a user with administrative privileges:

systemctl stop nxeventconnector

Upgrade Considerations

To upgrade the application on a machine where it is already installed, use the following command:

yum update nxeventconnector-x.x.x-x.el7.noarch.rpm

When the application is upgraded, the existing connector service configuration files are kept:

  • /etc/nxeventconnector/config.conf

  • /etc/nxeventconnector/events.conf

Service Dimensioning and Performance

Provisioning

The capacity of a service instance installed on a machine may vary depending on several variables, such as:

  • The number of Engines to retrieve data from.

  • The number of events configured to retrieve data from all the Engines.

  • The load on the Engines when processing the NXQL queries associated with the configured events.

  • The capabilities of the machine where the connector is set up.

  • The network bandwidth.

Of these, the most relevant are the number of events and the number of Engines. How many of each a service instance can handle also depends on the minimum frequency time that has been set. The predictions below assume the following minimum system requirements:

  • CPU: 2 CPUs x 2.3 GHz

  • Memory: 4 GB RAM

The table below shows the predicted maximum number of Engines for Splunk, ServiceNow and Azure Data Lake Storage Gen2 (ADLS2), based on a 5-minute minimum frequency time.

 

 

Minimum frequency time: 5 minutes (applies to all rows)

Load     Max event records   Max Engines   Max Engines      Max Engines   Recommended
         per cycle           (Splunk)      (ServiceNow)     (ADLS2)       bandwidth
-------  ------------------  ------------  ---------------  ------------  -----------
Low      5,000               250           230              137           15 Mbps
Low      10,000              230           200              132           15 Mbps
Low      20,000              200           170              130           15 Mbps
Medium   50,000              150           110              118           65 Mbps
Medium   75,000              120           90               112           65 Mbps
Medium   100,000             95            65               106           65 Mbps
High     200,000             55            35               82            150 Mbps
High     300,000             50            25               67            150 Mbps
High     400,000             45            20               57            150 Mbps

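In the test case described below (about 50,000 event records per Engine), processing 10 Engines takes around 16 seconds for Splunk and 21 seconds for ServiceNow. With 4 minutes (240 seconds) of processing time available in each 5-minute cycle, the limit is reached at approximately 150 and 110 Engines respectively.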

If a minimum frequency time of less than 5 minutes is needed, the upper bound on the number of Engines is tighter. Conversely, if the required minimum frequency time is longer, the upper bound is higher.

For the Azure Data Lake Storage Gen2 integration, bandwidth usage varies heavily with the configured events: the more fields are configured, the more data is sent over the wire.
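When the expected load falls between two rows of the table, a conservative reading is to use the next larger row. The Python sketch below performs that lookup with the values copied from the table, which are valid for a 5-minute minimum frequency time. Rounding up to the next row is a suggested convention, not a rule derived from the test results.

```python
from typing import Optional

# (max event records per cycle, Splunk, ServiceNow, ADLS2), from the table above.
CAPACITY = [
    (5_000, 250, 230, 137),
    (10_000, 230, 200, 132),
    (20_000, 200, 170, 130),
    (50_000, 150, 110, 118),
    (75_000, 120, 90, 112),
    (100_000, 95, 65, 106),
    (200_000, 55, 35, 82),
    (300_000, 50, 25, 67),
    (400_000, 45, 20, 57),
]
COLUMN = {"splunk": 1, "servicenow": 2, "adls2": 3}


def max_engines(records_per_cycle: int, platform: str) -> Optional[int]:
    """Use the first row whose record count covers the expected load."""
    for row in CAPACITY:
        if records_per_cycle <= row[0]:
            return row[COLUMN[platform]]
    return None  # beyond the tested range
```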

Procedure

One approach to estimating the number of service instances you need is the following:

  • Set up a machine containing one service instance.

  • Configure the events whose data must be sent to the remote platform.

  • Configure a single Engine with an average number of devices compared with the rest of the Engines in the Nexthink environment.

  • Start the service at a moment of average load.

  • Observe how long the first iteration of data retrieval takes.

  • Estimate the total number of Engines the service instance can handle.

  • Provision additional machines to host more service instances, if necessary.
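Once one iteration has been measured, the estimation step reduces to simple arithmetic. The Python sketch below assumes that processing time scales linearly with the number of Engines, which the test case below supports. It keeps the 1 minute of headroom per cycle recommended later in this section. The function name is hypothetical.

```python
import math


def max_engines_for_budget(seconds_measured: float, engines_measured: int,
                           min_frequency_minutes: int,
                           headroom_minutes: int = 1) -> int:
    """Estimate how many Engines one service instance can process per cycle.

    Assumes linear scaling with the number of Engines and keeps
    `headroom_minutes` of every cycle free for peak times.
    """
    budget_seconds = (min_frequency_minutes - headroom_minutes) * 60
    # Scale the measured Engines by how many times the measured time fits in the budget.
    return math.floor(budget_seconds * engines_measured / seconds_measured)


print(max_engines_for_budget(16, 10, 5))  # 150 (Splunk measurement)
print(max_engines_for_budget(21, 10, 5))  # 114 (ServiceNow, quoted as about 110)
```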

Test Case

As a reference, below is the test case used to achieve the results presented above. This will help you to calculate your own requirements based on your actual needs.

The procedure was applied to an average Nexthink system with the following characteristics:

  • There were 20 Engines containing 11,000 devices each on average.

  • There were 23 listing type events configured.

  • There were approximately 50,000 event records received per Engine on average in an iteration, counting all the different events involved.

  • One service instance was installed on a virtual machine with 8 GB RAM and 2 CPUs x 2.3 GHz, as described in the test environment section.

  • The maximum observed bandwidth was 142 Mbps.

After running 3 test iterations for each number of Engines, the total time taken by the service to process all the events from the Engines was as follows:

Time to send all retrieved events

The results show that, with an average number of events retrieved from every Engine, the time needed to send them all to the remote platform scales linearly with the number of Engines.

The next step is to estimate the maximum number of Engines the service instance can handle. Since the minimum frequency configured for all the events is 5 minutes, it is recommended not to exceed 4 minutes of total processing time, leaving 1 minute free for peak times.

If processing 10 Engines takes around 16 seconds for Splunk and approximately 21 seconds for ServiceNow and ADLS2, the limit (240 seconds in this case) will be reached at approximately 150 Engines for Splunk and 110 Engines for ServiceNow.

A run with 20 Engines took approximately 31 seconds (Splunk) and 42 seconds (ServiceNow) of total processing time. This confirms the linear scalability observed in the tests with 1 to 10 Engines.

According to the results, in this Nexthink environment a single service instance installed on a machine with the described capabilities would be able to handle data retrieval from about 150 Engines for Splunk, 110 Engines for ServiceNow and 118 Engines for ADLS2.

When the Nexthink Event Connector is installed in a Nexthink environment (on its own server), apply a similar procedure to estimate the capacity of the service. Take into account the machine where it is installed and the number of Engines it must handle.

Refer to the table at the beginning of this section for a summary of the test results.

Also note that this provisioning study considers the specified configurations and the default event set. A highly customized configuration may require further checks and tests. This study is intended as a basic guide for the majority of Nexthink environments.

Performance

The test environment used consists of:

  • A Nexthink Appliance with the Nexthink Event Connector installed on it, with 8 GB RAM and 2 cores x 2.3 GHz.

  • Nexthink Engines 6.21.1.2 to query data using the Web API 2.0 based on NXQL.

  • A simple fake server was used for the stress test to eliminate the noise created by network latency and bandwidth for the Splunk and ServiceNow integrations.

  • A proper ADLS2 setup for the Azure Data Lake Storage Gen2 integration.

Stress test

The stress test, which measures how the service behaves under different load conditions, uses the test environment described above with a variable number of events, each with the same amount of data to process. Specifically, the service was run while varying the following parameters:

  • Number of records retrieved per event (5,000-400,000).

  • Number of configured events with the data load described above (1-23).

The remote services (Splunk and ServiceNow) were replaced by a simple fake server in order to measure only the Nexthink Event Connector performance.

Stress test for Splunk integration
Stress test for ServiceNow integration
Stress test for ADLS2 integration

Each series in the line plots corresponds to a run against a different number of Engines. The x-axis represents the total number of records retrieved, and the y-axis the total time (in seconds) that the service instance took to process all the data and send it to the fake server (or, for the ADLS2 integration, to a real backend).

From the results, the following important observations can be derived:

  • The more events there are to process, the greater the encoding and sending overhead, and the steeper the slope of the line.

This experiment is a good starting point for estimating service capacity under extreme load. It gives all events a similar load, whereas usually only 2-3 events have a large amount of data to process and the rest have a medium or low data load. As stated earlier, the goal of this experiment was to push the service to its limits.

When performing these types of tests, always keep in mind that the acceptable time limit for production purposes is usually the minimum event frequency minus 1 minute, which is left free for peak times.

Potential Overload

Will there be an overload when the connector works for ServiceNow, Splunk and ADLS2? No. Due to design constraints, it is not possible to install more than one connector, or to have a single connector work for ServiceNow, Splunk and ADLS2 at the same time.

Support

Nexthink provides support for the application in accordance with the terms and conditions of the Support and Maintenance Agreement applicable between the customer and Nexthink. If you have any questions, please contact us via the Nexthink support portal.