Remediating disk issue with a chatbot

To help you get started, review the following NQL examples for typical chatbot use cases.

To provide employees with basic troubleshooting related to their device issues, leverage Nexthink data about employee devices to retrieve basic device information and launch a remediation to fix a disk issue.

Prerequisites

Before you start integrating your chatbot with Nexthink, ensure that you have the following in place:

Nexthink license
Administrator permissions

Designing User-Chatbot Interaction with API Technologies

Design standardized steps for the employees-chatbot interaction and define the API technologies to use in each step. The following table illustrates the interaction steps for this example.

| Interaction stage | API technology |
| --- | --- |
| Step 1: Authenticating communication | Configure API credentials and collect a token via the API. |
| Step 2: Identifying the device | Use the NQL API to allow the chatbot to identify the devices based on the username. Alternatively, use the Data Exporter. |
| Step 3: Diagnosing the device | Use the NQL API (or Data Exporter) to retrieve device performance data and outputs of the Get Startup Impact and Get Battery Status remote actions. |
| Step 4: Remediating the issue | Use the Remote Action API to trigger the Disk Cleanup remote action on a user's device. |
| Step 5: Following up on the fix | Use the NQL API to get the remediation status and details and inform the user about the remediation results. |

The choice of technology depends on how you currently use Nexthink features and on the limits included in your license. Refer to the Nexthink Infinity thresholds and limits overview documentation page for more information, including API limits.

In Nexthink

Configure API features in the Nexthink web interface. Nexthink recommends following the sequence in which the features are listed, as some are interdependent. You can still configure them in a different order if you prefer.

Configuring data collection remote actions

Install the following remote actions from Nexthink Library:

Get Startup Impact
Get Battery Status

Schedule the executions to occur daily. Refer to the Managing remote actions documentation for more information.

When you complete this step, save the NQL ID of both remote actions to use in the next steps.

NQL IDs:

get_startup_impact_windows
get_battery_status

Configuring remediation remote actions

Install the Disk Cleanup remote action from Nexthink Library. If it is already installed, copy it and configure the copy as follows:

Select API for the remote action trigger.
Set default input parameter values that are in line with your chatbot needs. See the Input parameters for Disk Cleanup remote action table below.

Input parameters for Disk Cleanup remote action

| Input | Recommended value |
| --- | --- |
| DiskCleanupCampaignId | To display a campaign, use the library campaign disk_cleanup_invoke. To suppress the campaign confirmation pop-up, enter the value 00000000-0000-0000-0000-000000000000. |
| CleanupCompletedCampaignId | To display a campaign, use the library campaign disk_cleanup_completed. To suppress the campaign notification pop-up, enter the value 00000000-0000-0000-0000-000000000000. |
| RemoveFilesNotModifiedInDays | Default value from Library (7) |
| MaximumDelayInSeconds | Default value from Library (30) |
| CleanupLevel | Choose the cleanup level, Light or Deep. This value applies only if you do not use a campaign to let the employee choose the cleanup level. Otherwise, the employee's choice takes precedence. |

When you complete this step, save the remote action's NQL ID for use in the next steps.

NQL ID: disk_cleanup

Creating NQL API queries

According to the designed scenario, you need to create three NQL API queries:

Get user devices based on the username of the user (Step 2: Identifying the device).
Get device data to perform diagnostics for the current topic (Step 3: Diagnosing the device).
Get the remote action status and results (Step 5: Following up on the fix).

The details of each NQL API query follow.

Get user devices based on the username of the user

Query ID: #get_device_basic_infos

NQL query:

devices during past 7d
| with session.events past 7d
| where user.name == $username
| list collector.uid, device.name, operating_system.platform, operating_system.name,
       hardware.type, hardware.manufacturer, last_seen
| sort last_seen desc

The collector.uid field is the key that the system uses in subsequent interactions to trigger a remote action.
Matching is based on the username. Alternative approaches are also available; see the Pre-built content section.

Get device data to perform diagnostics for the current topic

Query ID: #diagnose_device_bad_health

NQL query:

devices
| where device.name == $device_name
| include device_performance.events during past 24h
| compute free_space_GB = system_drive_free_space.avg() / 1000000000
| list collector.uid, device.name, free_space_GB,
       remote_action.get_startup_impact_windows.execution.outputs.HighImpactApplications,
       remote_action.get_startup_impact_windows.execution.outputs.HighImpactCount,
       remote_action.get_battery_status.execution.outputs.BatteryHealth

The query includes the $device_name parameter, which the chatbot retrieves in an earlier stage of the interaction.

In this example, you collect three data points:

Free disk space is derived from the out-of-the-box system_drive_free_space metric, converted from bytes to gigabytes.
Checking applications with high startup impact (HighImpactCount) requires a remote action called Get Startup Impact from Nexthink Library.
Checking battery health (BatteryHealth) requires a remote action called Get Battery Status from Nexthink Library.

Refer to the Configuring data collection remote actions section on this page.

If the remote actions are not scheduled or have not run yet on the device, then the corresponding columns appear empty.

Get the remote action status and results

Query ID: #get_remote_action_result

NQL query:

remote_action.executions past 24h
| where request_id == $request_id
| list request_id, device.name, remote_action.name, status, status_details, outputs

The query includes the $request_id parameter, which the chatbot retrieves in an earlier stage of the interaction from the Remote Action API call.

You can use the same generic query for any configured remote action.

When you complete this step, save the query IDs of all NQL API queries to use them in the next steps.

Query IDs:

#get_device_basic_infos
#diagnose_device_bad_health
#get_remote_action_result

Creating API credentials

Create API credentials in the Nexthink web interface to establish secure communication between Nexthink and the chatbot. Select Remote Actions API and NQL API in the Permissions section. Refer to API credentials for more information.

When you complete this step, save the Client ID and Client Secret obtained during the credential creation in the Nexthink web interface.

In the chatbot's service layer

Once you have configured all necessary API features within Nexthink, you can move on to implementing the API calls within the chatbot's service layer. The following steps reflect the user-chatbot interaction design.

Step 1: Authenticating communication

Before executing the following API calls, you need to retrieve a valid authentication token. Refer to the Nexthink Developer documentation on how to obtain a valid OAuth token using your generated API credentials.
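
As a rough illustration of this step, the sketch below builds a generic OAuth 2.0 client-credentials token request from the Client ID and Client Secret saved earlier. It is not taken from the Nexthink documentation: the token URL is a placeholder, and the use of an HTTP Basic header with grant_type=client_credentials is an assumption to verify against the Nexthink Developer documentation before use.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a generic OAuth 2.0 client-credentials request.

    Assumptions: token_url is supplied by you from the Nexthink Developer
    documentation, and the credentials travel in an HTTP Basic header.
    """
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and reading the token from the JSON response is left out on purpose, because the response shape is not described on this page.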

Step 2: Identifying the device

Employee (user) “I have an issue with my device”

To identify the device, use the #get_device_basic_infos NQL API query that you created previously (see Get user devices based on the username of the user).

API Request

POST /api/v1/nql/execute

{
  "queryId": "#get_device_basic_infos",
  "parameters": {
    "username": "[username identified by the chatbot]"
  }
}
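
The request above could be assembled in the service layer along these lines. This is a minimal sketch: base_url is a placeholder for your Nexthink API host, and passing the token as a Bearer Authorization header is an assumption based on common OAuth practice rather than something stated on this page.

```python
import json
import urllib.request


def build_nql_request(base_url: str, token: str, query_id: str, parameters: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/nql/execute.

    Assumptions: base_url is your Nexthink API host, and the OAuth token is
    sent as a Bearer token.
    """
    payload = {"queryId": query_id, "parameters": parameters}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/nql/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The same helper works for the other NQL API queries in this walkthrough, because only the query ID and the parameters change.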

Example API Response

Status 200

{
  "queryId": "#get_device_basic_infos",
  "executedQuery": "...",
  "rows": 2,
  "executionDateTime": { ... },
  "headers": [
    "device.collector.uid",
    "device.name",
    "device.operating_system.platform",
    "device.operating_system.name",
    "device.hardware.type",
    "device.hardware.manufacturer",
    "device.last_seen"
  ],
  "data": [
    [
      "e0aa796d-e3af-47b5-88d8-228cf5551fb6",
      "XN1231242-2142",
      "Windows",
      "Windows 11 Pro 22H2 (64 bits)",
      "laptop",
      "Lenovo",
      "2023-07-17 17:53:49"
    ],
    [
      "9866a43b-caab-4948-86d2-5567b3ac1d24",
      "XCX124231-1231",
      "macOS",
      "macOS Ventura 13.4.1 (ARM 64 bits)",
      "laptop",
      "Apple",
      "2023-07-17 17:53:03"
    ]
  ]
}

Notice:

If the status code is anything other than 200, then the request failed or the rate limit was hit. Refer to the Nexthink Developer documentation for details.
If the returned list is empty, then the user has not been active on a device in the specified timeframe, which is past 7d in this example. Note that there is also a small delay between the time the employee starts using a device and the time the data becomes available in the Nexthink data platform.
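
Because the response separates column names ("headers") from row values ("data"), the chatbot has to pair them up before it can use a field by name. The sketch below shows one way to do that and to build the device choice shown to the employee. The function names and the wording of the empty-list message are illustrative, not part of the Nexthink API.

```python
def rows_as_dicts(response: dict) -> list:
    """Pair each data row with the response headers."""
    headers = response.get("headers", [])
    return [dict(zip(headers, row)) for row in response.get("data", [])]


def device_menu(response: dict) -> str:
    """Build the device selection message for the employee."""
    devices = rows_as_dicts(response)
    if not devices:
        # Empty list: no activity in the query timeframe (past 7d here).
        return "I could not find a device that you used in the past 7 days."
    lines = ["I have found the following devices for you, which one are you having issues with?"]
    for number, device in enumerate(devices, start=1):
        lines.append(
            f"({number}) {device['device.hardware.type'].capitalize()} "
            f"{device['device.name']} ({device['device.hardware.manufacturer']})"
        )
    return "\n".join(lines)
```

Keep the parsed rows in the conversation state: the device.collector.uid of the selected device is needed later to trigger the remote action.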

Step 3: Diagnosing the device

(Chatbot) “I have found the following devices for you, which one are you having issues with?”
(1) Laptop XN1231242-2142 (Lenovo)
(2) Laptop XCX124231-1231 (Apple)

Employee (user) “1”

To get device information, use the #diagnose_device_bad_health NQL API query that you created previously (See: Get device data to perform diagnostics for the current topic).

Input from the previous step

You retrieved the device name in Step 2: Identifying the device.

API Request

POST /api/v1/nql/execute

{
  "queryId": "#diagnose_device_bad_health",
  "parameters": {
    "device_name": "[device name identified by the chatbot]"
  }
}

Example API Response

Status 200

{
  "queryId": "#diagnose_device_bad_health",
  "executedQuery": "...",
  "rows": 1,
  "executionDateTime": { ... },
  "headers": [
    "device.collector.uid",
    "device.name",
    "free_space_GB",
    "remote_action.get_startup_impact_windows.execution.outputs.HighImpactApplications",
    "remote_action.get_startup_impact_windows.execution.outputs.HighImpactCount",
    "remote_action.get_battery_status.execution.outputs.BatteryHealth"
  ],
  "data": [
    [
      "e0aa796d-e3af-47b5-88d8-228cf5551fb6",
      "XN1231242-2142",
      2.2316807136971,
      "",
      null,
      0.9
    ]
  ]
}

Note that the collector.uid field is the key that the system uses in subsequent interactions to trigger a remote action.
The chatbot uses three columns as decision branches in a conversation:
- IF free_space_GB <= 6 THEN trigger library remote action for remediation Disk Cleanup.
- IF remote_action.get_startup_impact_windows.execution.outputs.HighImpactCount > 0 THEN trigger library remote action for remediation Disable Application from Startup menu using the value of remote_action.get_startup_impact_windows.execution.outputs.HighImpactApplications to disable high-impact applications.
- IF remote_action.get_battery_status.execution.outputs.BatteryHealth <= 0.85 THEN a replacement battery would be required: create an ITSM ticket from the chatbot to follow up.

Notice:

If the status code is anything other than 200, then the request failed or the rate limit was hit. Refer to the Nexthink Developer documentation for more information.
If the list returned by the system is empty, then the device was not found.
If any of the diagnostic fields are null or empty, then no information is available. Possible reasons for empty values are:
- For data platform fields, the field is not supported by the platform. Please refer to the NQL data model documentation for more information.
- For remote action fields, the system has not yet executed the remote action successfully. Check the remote action schedule and targeting NQL query, and the remote action dashboard for possible execution errors.
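
The decision branches and the null handling described above can be sketched as a single function. The thresholds come from this page. The returned labels other than disk_cleanup are hypothetical names chosen for this example, and treating null or empty fields as "no decision" is one reasonable reading of the notice above.

```python
from typing import Optional

STARTUP = "remote_action.get_startup_impact_windows.execution.outputs."
BATTERY = "remote_action.get_battery_status.execution.outputs."


def as_number(value) -> Optional[float]:
    """Return a float, or None when the field is null or empty."""
    if value is None or value == "":
        return None
    return float(value)


def choose_actions(row: dict) -> list:
    """Map one diagnostic row (header -> value) to follow-up actions."""
    actions = []
    free_gb = as_number(row.get("free_space_GB"))
    high_impact = as_number(row.get(STARTUP + "HighImpactCount"))
    battery = as_number(row.get(BATTERY + "BatteryHealth"))

    if free_gb is not None and free_gb <= 6:
        actions.append("disk_cleanup")
    if high_impact is not None and high_impact > 0:
        actions.append("disable_application_from_startup")  # hypothetical label
    if battery is not None and battery <= 0.85:
        actions.append("create_itsm_ticket")  # hypothetical label
    return actions
```

With the example response above (about 2.2 GB free, no startup data, battery health 0.9), only the disk cleanup branch applies.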

Step 4: Remediating the issue

(Chatbot) “Thanks, I see that your device has only about 2 GB of free disk space left.”
“I can clean up unneeded files to prevent slowdowns.”
“Do you want me to proceed?”

Employee (user) “Yes”

To remedy the problem, use the Remote Action API to execute the Disk Cleanup remote action that you configured previously (see Configuring remediation remote actions).

Input from the previous step

You retrieved the collector.uid in Step 3: Diagnosing the device via the #diagnose_device_bad_health NQL API query (or a data export).

API Request

POST /api/v1/act/execute

{
 "remoteActionId": "disk_cleanup",
 "devices": ["e0aa796d-e3af-47b5-88d8-228cf5551fb6"]
}

Example API Response

Status 200

{
 "requestId":"f27efd0c-8cb2-4d00-aae0-261aa06729c7",
 "expiresInMinutes":10080
}

A successful API call indicates that the remote action is scheduled for execution as soon as the endpoint is ready.
The returned request ID allows you to follow up on the execution status.
The call is asynchronous, which means that a successful API call indicates neither that the remote action has started executing nor that it has completed.

Notice:

If the status code is anything other than 200, then the request failed or the rate limit was reached. Refer to the Nexthink Developer documentation for more information.
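
A minimal sketch of how the service layer might build this call and keep the returned request ID follows. As before, base_url is a placeholder and the Bearer header is an assumption. The payload and response field names are the ones shown in the example above.

```python
import json
import urllib.request


def build_remote_action_request(base_url: str, token: str, remote_action_id: str, collector_uids: list) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/act/execute."""
    payload = {"remoteActionId": remote_action_id, "devices": list(collector_uids)}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/act/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumption: Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_request_id(response_body: str) -> str:
    """Return the requestId used to follow up on the execution."""
    return json.loads(response_body)["requestId"]
```

Store the request ID together with the conversation, since the call is asynchronous and the result has to be fetched later.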

Step 5: Following up on the fix

(Chatbot) “I have launched the fix. It should only take a short while.”

Use the #get_remote_action_result NQL API query that you created previously to get the remote action status and results.

(See: Get the remote action status and results).

Nexthink recommends making a call to get the remote action results no earlier than 1 minute after triggering the remote action with the API.

Input from the previous step

You retrieved the request_id in Step 4: Remediating the issue via the Remote Action API call.

API Request

POST /api/v1/nql/execute

{
  "queryId": "#get_remote_action_result",
  "parameters": {
    "request_id": "f27efd0c-8cb2-4d00-aae0-261aa06729c7"
  }
}

Example API Response

Status 200

{
  "queryId": "#get_remote_action_result",
  "executedQuery": "...",
  "rows": 1,
  "executionDateTime": { ... },
  "headers": [
    "remote_action.execution.request_id",
    "device.name",
    "remote_action.name",
    "remote_action.execution.status",
    "remote_action.execution.status_details",
    "remote_action.execution.outputs"
  ],
  "data": [
    [
      "f27efd0c-8cb2-4d00-aae0-261aa06729c7",
      "XN1231242-2142",
      "Disk Cleanup",
      "success",
      "Disk cleanup successfully performed. \r\nPowerShell exited with code 0\n",
      "{\"CleanupSpace\":1324470272.0}"
    ]
  ]
}

remote_action.execution.status indicates the current execution status. Key statuses include:
- success when the remote action has been successfully executed.
- in_progress if the system has not yet completed the execution. Refer to the Remote Action documentation for more information about the various states of remote actions.
remote_action.execution.status_details contains remote action execution details that can help you in troubleshooting. By design, it is not exposed directly to employees.
remote_action.execution.outputs is a JSON map that includes the output values of the remote action. For the Disk Cleanup remote action, the returned value is the amount of space that has been freed (in bytes).

Notice:

If the status code is anything other than 200, then the request failed or the rate limit was reached. Refer to the Nexthink Developer documentation for more information.
If the system returned an empty list, then the query found no matching remote action execution. This happens if you call the API too soon after triggering the remote action, or if the execution is older than 24 hours, which is the timeframe specified in the query.
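
The follow-up can be sketched as a small polling loop. The 60-second initial delay reflects the recommendation above. The retry interval, the attempt limit, and the choice to treat any status other than in_progress as final are assumptions made for this example. The fetch argument is a hypothetical callable that runs the #get_remote_action_result query and returns the parsed JSON response.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    fetch: Callable[[], dict],
    initial_delay: float = 60.0,  # recommended minimum wait before the first call
    interval: float = 30.0,       # assumption: retry spacing
    max_attempts: int = 10,       # assumption: give up after this many calls
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the execution leaves in_progress, or return None on timeout."""
    sleep(initial_delay)
    for attempt in range(max_attempts):
        response = fetch()
        headers = response.get("headers", [])
        rows = [dict(zip(headers, row)) for row in response.get("data", [])]
        # An empty list means the execution is not visible yet: keep waiting.
        if rows and rows[0]["remote_action.execution.status"] != "in_progress":
            return rows[0]
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

Passing sleep as a parameter keeps the loop easy to test and lets a chatbot framework substitute its own scheduling mechanism.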

Ending the conversation

Display the remote action status and output retrieved in this step.

(Chatbot) “I am all done - I freed up 1.3 GB. Is there anything else I can do to help you?”
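
The closing message can be derived from the result row. The sketch below parses the outputs JSON string, converts the CleanupSpace value from bytes to gigabytes, and falls back to a neutral message when the execution did not succeed. The fallback wording is hypothetical. The status_details field is deliberately not shown to the employee.

```python
import json


def completion_message(row: dict) -> str:
    """Build the final chatbot message from one result row (header -> value)."""
    if row.get("remote_action.execution.status") != "success":
        # Hypothetical fallback wording for failed or unfinished executions.
        return "The cleanup did not complete. I will pass this on to the service desk."
    outputs = json.loads(row.get("remote_action.execution.outputs") or "{}")
    freed_bytes = outputs.get("CleanupSpace")
    if freed_bytes is None:
        return "I am all done. Is there anything else I can do to help you?"
    freed_gb = freed_bytes / 1_000_000_000
    return f"I am all done - I freed up {freed_gb:.1f} GB. Is there anything else I can do to help you?"
```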

Using Nexthink Flow to configure chatbot integration

If your company has purchased Nexthink Flow, you can configure a workflow that performs all the diagnostics and remediations described in the previous section. The workflow orchestrates the whole process, so you do not have to program the logic within the chatbot or other third-party solutions.

Use the Connector and Service API thinklets to call APIs and provide real-time feedback.
Use the API listener thinklet to wait until a given third-party API resumes.
Trigger a workflow with the Flow API.

Refer to the Workflows documentation page for more information.

Pre-built content

To help you get started with chatbot integration, find examples of topics that you can use in typical chatbot conversations. Each topic includes:

Prerequisite data collection remote actions. All remote actions referred to in this section are available in Nexthink Library and you must configure them beforehand:
- The data collection remote action must have the trigger type Schedule active and a collection schedule set to hourly or daily, depending on the required frequency.
- The remediation remote action must have the trigger type API active. Refer to the Manage Remote Actions documentation for more information.
A Diagnostic NQL query.
Logic on how to interpret the diagnostics query results.
Possible remediations for each diagnostic.

You can implement the examples using a similar flow to the end-to-end use case, assuming that you have identified the device name beforehand.

General queries

As shown in the end-to-end use case, you need to configure generic queries in order to perform two basic tasks that can be useful across all use cases:

Task

NQL queries

Retrieve device of a user

Query ID: #get_device_basic_infos - matching based on the username

devices during past 7d
| with session.events past 7d
| where user.name == $username
| list collector.uid, device.name, 
  operating_system.platform, operating_system.name, 
  hardware.type, hardware.manufacturer, last_seen
| sort last_seen desc

You can adjust the where clause | where user.name == $username for use with alternative approaches, for example:

If you know the user UPN: | where user.upn == $upn (requires the UPN to be activated at collector level)
If you know the user email address: | where user.ad.email_address == $email (requires the Azure AD connector to be activated and the email field synced)
If you know the device name: | where device.name == $device_name

Adjust the timeframe if you want to consider devices on which the user has been active during a period of time other than 7 days.

Replace both past 7d clauses with the desired timeframe, up to the maximum data retention period in your Nexthink tenant, which is by default 30 days.

Retrieve the status and outputs of remediation remote actions

Query ID: #get_remote_action_result

remote_action.executions past 24h
| where request_id == $request_id
| list request_id, device.name, remote_action.name, 
  status, status_details, outputs

Microsoft Outlook issues

Diagnose issues that cause Microsoft Outlook to malfunction for an employee.

Type

Content

Platforms

Microsoft Windows

Data collection remote actions

Schedule data collection remote actions:

Get Microsoft Outlook online
Get Microsoft Outlook plugin crash details

NQL query

Query ID: #diagnose_outlook_issues

devices
| where device.name == $device_name
| include package.installed_packages
| where package.name in ["Microsoft Office", "Microsoft 365"]
| where package.version != "16.*"
| compute outdated_office_packages = package.name.count()
| list collector.uid, device.name,
  remote_action.get_outlook_plugin_crash_details_windows.execution.outputs.CrashedPluginList,
  remote_action.get_outlook_online_windows.execution.outputs.IsOnline,
  remote_action.get_outlook_online_windows.execution.status,
  boot.days_since_last_full_boot, outdated_office_packages

Remediation list

IF remote_action.get_outlook_online_windows.execution.outputs.IsOnline = "No" and remote_action.get_outlook_online_windows.execution.status = "success" THEN Run remediation remote action Set Outlook online
IF remote_action.get_outlook_plugin_crash_details_windows.execution.outputs.CrashedPluginList != "["-"]" THEN Run remediation remote action Set Outlook plugins
IF device.boot.days_since_last_full_boot > 10 THEN Recommend the employee restart their machine.
IF outdated_office_packages = "1" THEN Run remediation remote action Repair Office 365
Additional generic remediation: Run remediation remote action Repair Outlook OST Problem

Microsoft OneDrive issues

Diagnose issues related to Microsoft OneDrive and repair OneDrive when the system detects issues.

Type

Content

Platforms

Microsoft Windows

Data collection remote action

Schedule data collection remote actions:

Get OneDrive status

NQL query

Query ID: #diagnose_onedrive_issues

devices
| where device.name == $device_name
| list collector.uid, device.name,
  remote_action.get_onedrive_status.execution.outputs.OneDriveStatus

Remediation list

IF remote_action.get_onedrive_status.execution.outputs.OneDriveStatus contains one of the following:
1. OneDrive is not installed
2. OneDrive is installed but not running
3. OneDrive environment variable does not exist
4. OneDrive folder is not present

THEN Run remediation remote action Repair OneDrive
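
The OneDrive rule above amounts to a substring check on the OneDriveStatus output. A minimal sketch follows, assuming the status text contains the phrases exactly as listed, which should be verified against real outputs of the Get OneDrive status remote action.

```python
from typing import Optional

# Phrases from the remediation list above that call for Repair OneDrive.
REPAIR_TRIGGERS = (
    "OneDrive is not installed",
    "OneDrive is installed but not running",
    "OneDrive environment variable does not exist",
    "OneDrive folder is not present",
)


def needs_onedrive_repair(status: Optional[str]) -> bool:
    """Return True when the status text contains one of the trigger phrases."""
    if not status:
        # Null or empty: the data collection remote action has not run yet.
        return False
    return any(trigger in status for trigger in REPAIR_TRIGGERS)
```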

Slow PC issues

Diagnose a wide range of typical issues resulting in slowness on the endpoint.

Type

Content

Platforms

Microsoft Windows

Data collection remote actions

Schedule data collection remote actions:

Get GPO startup impact
Get startup impact

NQL query

Query ID: #diagnose_slow_pc_issues

devices during past 7d
| where name == $device_name
| include device_performance.events during past 24h
| compute free_space_GB = system_drive_free_space.avg() / 1000000000
| include package.installed_packages
| where package.name in ["*Microsoft Office*", "*Microsoft 365*"]
| where package.version != "16.*"
| compute outdated_office_packages = package.name.count()
| list collector.uid, device.name, free_space_GB,
  boot.days_since_last_full_boot,
  outdated_office_packages,
  remote_action.get_startup_impact_windows.execution.outputs.HighImpactApplications,
  remote_action.get_startup_impact_windows.execution.outputs.HighImpactCount,
  remote_action.get_gpo_startup_impact_windows.execution.outputs.UserGpoAppliedTimeInSeconds,
  remote_action.get_gpo_startup_impact_windows.execution.outputs.UserGpoDCDiscoveryInSeconds

Remediation list

IF device.boot.days_since_last_full_boot > 10 THEN Recommend the employee restart their machine.
IF free_space_GB <= 6 THEN Run remediation remote action Disk Cleanup
IF remote_action.get_startup_impact_windows.execution.outputs.HighImpactCount > 0 THEN Run remediation remote action Disable Application from Startup menu using the value of remote_action.get_startup_impact_windows.execution.outputs.HighImpactApplications to disable high-impact applications
IF remote_action.get_gpo_startup_impact_windows.execution.outputs.UserGpoAppliedTimeInSeconds + remote_action.get_gpo_startup_impact_windows.execution.outputs.UserGpoDCDiscoveryInSeconds > 10 THEN Run remediation remote action Update Group Policy settings
IF outdated_office_packages = "1" THEN Run remediation remote action Repair Office 365
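
The slow PC remediation list can be sketched in the same way as the disk example. The thresholds are taken from the list above. The returned labels are hypothetical, the GPO rule is skipped unless both timings are present, and treating a count of 1 or more outdated Office packages as a trigger is a slightly broader reading of the rule than the literal value shown.

```python
from typing import Optional

STARTUP = "remote_action.get_startup_impact_windows.execution.outputs."
GPO = "remote_action.get_gpo_startup_impact_windows.execution.outputs."


def as_number(value) -> Optional[float]:
    """Return a float, or None when the field is null or empty."""
    if value is None or value == "":
        return None
    return float(value)


def slow_pc_actions(row: dict) -> list:
    """Map one #diagnose_slow_pc_issues row (header -> value) to actions."""
    actions = []
    boot_days = as_number(row.get("device.boot.days_since_last_full_boot"))
    free_gb = as_number(row.get("free_space_GB"))
    high_impact = as_number(row.get(STARTUP + "HighImpactCount"))
    gpo_applied = as_number(row.get(GPO + "UserGpoAppliedTimeInSeconds"))
    gpo_discovery = as_number(row.get(GPO + "UserGpoDCDiscoveryInSeconds"))
    outdated = as_number(row.get("outdated_office_packages"))

    if boot_days is not None and boot_days > 10:
        actions.append("recommend_restart")
    if free_gb is not None and free_gb <= 6:
        actions.append("disk_cleanup")
    if high_impact is not None and high_impact > 0:
        actions.append("disable_application_from_startup")
    if gpo_applied is not None and gpo_discovery is not None and gpo_applied + gpo_discovery > 10:
        actions.append("update_group_policy_settings")
    if outdated is not None and outdated >= 1:  # assumption: 1 or more
        actions.append("repair_office_365")
    return actions
```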