# Binary grouping

Modern applications are often made up of several interconnected binaries—like background services or embedded components, such as browsers. This design improves performance but makes it harder to understand which processes are part of which application.

To solve this, Nexthink tracks binary data in execution and connection events using two distinct fields:

* `binary` shows the application-level context for monitoring, attribution, and reporting.
* `real_binary` shows the actual executable that ran during the event or process.

This approach applies only during runtime—it is not reflected in the table of binaries in your Nexthink instance. The link between `binary` and `real_binary` is calculated when each event happens.

By separating the application context from the process execution, Nexthink provides clearer, more accurate insights. This helps IT teams better track resource usage, diagnose issues, and understand network behavior within the correct application context.

### **Binary grouping in practice**

To show how binary grouping works, this section uses Microsoft Teams as an example. This application includes multiple binaries—such as background services and shared components like WebView, its embedded browser.

When Microsoft Teams launches a helper process like `msedgewebview2.exe`, the execution data shows:

* `binary.name = ms-teams.exe`, the main or parent binary of the application
* `real_binary.name = msedgewebview2.exe`, the binary that was actually executed

The following figures show this binary hierarchy for macOS and Windows operating systems.

<figure><img src="https://268444917-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxJSUDk9NTtCHYPG5EWs3%2Fuploads%2Fgit-blob-d6b009988b9042a48087e1a8d63538a982a83060%2Fimage%20(19).png?alt=media" alt="Microsoft Teams in Application Monitor (macOS)"><figcaption><p>Microsoft Teams in Application Monitor (macOS)</p></figcaption></figure>

<figure><img src="https://268444917-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxJSUDk9NTtCHYPG5EWs3%2Fuploads%2Fgit-blob-ecd6065a87c6dc893cb4648995ee3fcc9fac5770%2Fhttps___files.gitbook.com_v0_b_gitbook-x-prod.appspot.com_o_spaces_2Fh7lLhrMjsSACP8jSdZ6s_2Fuploads_2FsysN2dHEJh449KxCLhHh_2Fteams_20macos_20task_20manager.png?alt=media" alt="Microsoft Teams in Task Manager (Windows)"><figcaption><p>Microsoft Teams in Task Manager (Windows)</p></figcaption></figure>

**NQL data model fields**

Binary grouping uses the following NQL data model fields:

<table><thead><tr><th width="185.46875">Field name</th><th>Description</th></tr></thead><tbody><tr><td><code>binary</code></td><td>This metric now refers to the application context responsible for the event. This is no longer necessarily the real binary that was executed.</td></tr><tr><td><code>memory</code></td><td><p>This metric represents the average memory used by all processes in the same execution tree—the main process and all sub-processes—weighted by their execution duration within the time bucket.</p><p>This value is available only for main processes and is <code>NULL</code> for subprocesses.</p><p>Legacy data reports the average memory usage of the <code>real_binary</code>.</p></td></tr><tr><td><code>real_binary</code></td><td>This metric identifies the actual binary executed in the process that triggered the event.</td></tr><tr><td><code>process_hierarchy</code></td><td><p>This metric indicates the runtime role of the process. Possible values:</p><ul><li><code>main_process</code></li><li><code>sub_process</code></li><li><code>NULL</code> for legacy data.</li></ul></td></tr><tr><td><code>real_memory</code></td><td>This metric reports the average memory used by all processes running the same <code>real_binary</code> during the time bucket. The value is weighted by the execution duration of each process.</td></tr></tbody></table>

{% hint style="info" %}
To ensure consistent analysis, Nexthink recommends using `binary` for high-level monitoring and attribution, and `real_binary` and `real_memory` for granular investigations at the process level.
{% endhint %}

**Understanding the hierarchy of processes**

The system identifies a main process during runtime to determine the correct application context. This logic sets the value of the `binary` field for all associated execution and connection events. The actual executable file responsible for the event is recorded in the `real_binary` field.

On Windows, a process qualifies as the main process if it meets at least one of the following criteria:

* The process opens a visible foreground window within 30 seconds of being launched by another process.
* The binary matches a predefined list of known application executables:

  ```
  ms-teams.exe, msteams.exe, outlook.exe, olk.exe, widgets.exe,
  widgetboard.exe, onedrive.exe, powerpnt.exe, excel.exe, onenote.exe,
  winword.exe, msedge.exe, pad.console.host.exe, searchapp.exe,
  pbidesktop.exe, bingwallpaper.exe, zoom.exe, acrobat.exe,
  firefox.exe, chrome.exe
  ```
* The process is started by a system launcher, such as `explorer.exe`, `svchost.exe`, or `wininit.exe`, or no valid parent process can be identified.

Detection begins three seconds after launch and continues for up to 30 seconds to collect visibility data. If none of these conditions apply, the process is treated as its own main process.
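The Windows criteria above can be read as a simple "any of three checks" rule. The following Python sketch is only an illustration of that reading, not Nexthink's implementation: the `ProcessInfo` record, its field names, and the `main_process_reason` function are hypothetical, and the three-second detection delay is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Known application executables, copied from the list above.
KNOWN_APP_BINARIES = {
    "ms-teams.exe", "msteams.exe", "outlook.exe", "olk.exe", "widgets.exe",
    "widgetboard.exe", "onedrive.exe", "powerpnt.exe", "excel.exe", "onenote.exe",
    "winword.exe", "msedge.exe", "pad.console.host.exe", "searchapp.exe",
    "pbidesktop.exe", "bingwallpaper.exe", "zoom.exe", "acrobat.exe",
    "firefox.exe", "chrome.exe",
}
# Launchers named above; the text says "such as", so this set may be incomplete.
SYSTEM_LAUNCHERS = {"explorer.exe", "svchost.exe", "wininit.exe"}
VISIBILITY_WINDOW_S = 30.0


@dataclass
class ProcessInfo:
    """Hypothetical record for illustration; not a Nexthink type."""
    binary_name: str
    parent_binary_name: Optional[str]               # None: no valid parent identified
    seconds_to_foreground_window: Optional[float]   # None: no visible window opened


def main_process_reason(p: ProcessInfo) -> Optional[str]:
    """Return the first criterion the process meets, or None if it meets none."""
    if (p.seconds_to_foreground_window is not None
            and p.seconds_to_foreground_window <= VISIBILITY_WINDOW_S):
        return "foreground_window"
    if p.binary_name.lower() in KNOWN_APP_BINARIES:
        return "known_application"
    if p.parent_binary_name is None or p.parent_binary_name.lower() in SYSTEM_LAUNCHERS:
        return "system_launcher_or_no_parent"
    return None


teams = ProcessInfo("ms-teams.exe", "explorer.exe", 2.5)
webview = ProcessInfo("msedgewebview2.exe", "ms-teams.exe", None)
print(main_process_reason(teams))    # foreground_window
print(main_process_reason(webview))  # None
```

In this sketch, a WebView helper launched by Teams meets none of the criteria, which matches the earlier example where it is reported under `binary.name = ms-teams.exe`.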

On macOS, the system designates a responsible process—the binary accountable for user interaction and permission handling. Nexthink uses this responsible process as the main process for attribution, regardless of which subprocess generates the event.

**Working with binary execution metrics**

Each `execution` and `connection` event includes two associations:

* **`binary`**, representing the application context, such as `teams.exe`
* **`real_binary`**, the actual executable that was run, such as `msedgewebview2.exe`

Binary-related metrics like `cpu_time`, `number_of_crashes`, `number_of_freezes`, `incoming_traffic`, or `outgoing_traffic` are recorded at the `real_binary` level and can be aggregated using standard functions by either `binary` or `real_binary`, depending on the desired scope.
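To make the difference in scope concrete, the following Python sketch sums the same metric once per application context and once per executable. The event records and values are invented for illustration; only the `binary`, `real_binary`, and `cpu_time` names come from the data model described here.

```python
from collections import defaultdict

# Hypothetical execution events; values are made up for illustration.
events = [
    {"binary": "ms-teams.exe", "real_binary": "ms-teams.exe", "cpu_time": 40.0},
    {"binary": "ms-teams.exe", "real_binary": "msedgewebview2.exe", "cpu_time": 25.0},
    {"binary": "outlook.exe", "real_binary": "msedgewebview2.exe", "cpu_time": 10.0},
]


def sum_cpu_time(events, key):
    """Sum cpu_time grouped by the given association ('binary' or 'real_binary')."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key]] += event["cpu_time"]
    return dict(totals)


by_application = sum_cpu_time(events, "binary")
by_executable = sum_cpu_time(events, "real_binary")
print(by_application)  # {'ms-teams.exe': 65.0, 'outlook.exe': 10.0}
print(by_executable)   # {'ms-teams.exe': 40.0, 'msedgewebview2.exe': 35.0}
```

Grouping by `binary` attributes the WebView CPU time to the application that launched it, while grouping by `real_binary` pools all WebView activity regardless of the parent application.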

Memory metrics are handled differently due to their complexity. Instead of being summed or averaged per process, memory usage is calculated as a weighted average within each time bucket. Each bucket groups all processes running the same binary, and the memory values are averaged using the execution duration of each process as the weight.

To provide both process-level and application-level visibility, two distinct fields are available:

* **`real_memory`** reflects the memory used by each process, grouped under the same `real_binary` during the time bucket. It provides a precise, process-level view of resource usage.
* **`memory`** captures the total memory footprint of the entire execution tree, and is available only when `process_hierarchy == main_process`. It is calculated as the weighted average across all relevant processes within the same execution tree.
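As a rough illustration of the duration-weighted average described above, the following Python sketch computes both values for one time bucket. It takes the description literally; the exact formula Nexthink applies is not specified in this page, and the process records, field names, and numbers are hypothetical.

```python
def weighted_average(samples):
    """Duration-weighted mean of (memory_mb, duration_s) pairs; None if no duration."""
    total_duration = sum(duration for _, duration in samples)
    if total_duration == 0:
        return None
    return sum(memory * duration for memory, duration in samples) / total_duration


# Hypothetical processes of one execution tree within a single time bucket.
processes = [
    {"real_binary": "ms-teams.exe", "role": "main_process", "memory_mb": 400.0, "duration_s": 600},
    {"real_binary": "msedgewebview2.exe", "role": "sub_process", "memory_mb": 250.0, "duration_s": 600},
    {"real_binary": "msedgewebview2.exe", "role": "sub_process", "memory_mb": 100.0, "duration_s": 300},
]

# real_memory: only the processes that run the same real_binary.
webview_real_memory = weighted_average(
    [(p["memory_mb"], p["duration_s"]) for p in processes
     if p["real_binary"] == "msedgewebview2.exe"]
)

# memory: every process in the execution tree, reported on the main process only.
tree_memory = weighted_average([(p["memory_mb"], p["duration_s"]) for p in processes])

print(webview_real_memory)  # 200.0
print(tree_memory)          # 280.0
```

The short-lived WebView process contributes less to both averages because its weight is its execution duration within the bucket.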

#### Interpreting data in the new model - Microsoft Teams

This section provides ready-to-use NQL queries to help you investigate execution, connection, and resource usage data for Microsoft Teams.

<details>

<summary>Which devices crash and freeze while running Microsoft Teams?</summary>

This query returns the number of crashes and freezes associated with Microsoft Teams on both Windows and macOS, based on the application context (`binary`). You can use it either in a dashboard or to define a custom trend. See the [Custom trends management](https://docs.nexthink.com/platform/user-guide/administration/content-management/custom-trends-management) documentation.

```sql
devices
| include execution.crashes during past 24h
| where binary.name in ["ms-teams.exe", "msteams"]
| compute number_of_crashes_ = crash.number_of_crashes.sum()
| include execution.events during past 24h
| where binary.name in ["ms-teams.exe", "msteams"]
| compute number_of_freezes_ = number_of_freezes.sum()
| where number_of_crashes_ > 0 or number_of_freezes_ > 0
| list device.name, number_of_crashes_, number_of_freezes_
```

</details>

<details>

<summary>Which binary is responsible for Microsoft Teams freezing?</summary>

This query shows which real binaries used by Microsoft Teams reported freezes in the past 7 days. It summarizes the total number of freezes per `real_binary` and version, helping you identify which components, such as subprocesses, contribute to application instability.

```sql
execution.events during past 7d
| where binary.name in ["ms-teams.exe", "msteams"]
| summarize number_of_freezes_ = number_of_freezes.sum() by binary.name, real_binary.name, real_binary.version
| where number_of_freezes_ > 0
| list binary.name, real_binary.name, real_binary.version, number_of_freezes_
| sort real_binary.name asc
```

The query returns the following table:

<figure><img src="https://268444917-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxJSUDk9NTtCHYPG5EWs3%2Fuploads%2Fgit-blob-43ffbb4acc085ecb4ee05fa2720494dc147e98d2%2Fhttps___files.gitbook.com_v0_b_gitbook-x-prod.appspot.com_o_spaces_2Fh7lLhrMjsSACP8jSdZ6s_2Fuploads_2FELB1oXFfdBsXgUYoMnBs_2Fimage.avif?alt=media" alt=""><figcaption></figcaption></figure>

</details>

<details>

<summary>What applications are affected when <code>msedgewebview2.exe</code> freezes?</summary>

This query identifies which applications (`binary`) are launching `msedgewebview2.exe` and how many freezes that subprocess generated over the past 7 days. It helps you trace freeze events back to their parent applications using improved binary grouping.

```sql
execution.events during past 7d
| where real_binary.name in ["msedgewebview2.exe"]
| summarize number_of_freezes_ = number_of_freezes.sum() by binary.name, real_binary.name
| list binary.name, real_binary.name, number_of_freezes_
| sort number_of_freezes_ desc
```

</details>

<details>

<summary>What is the incoming and outgoing traffic generated by Microsoft Teams on each device?</summary>

This query lists devices with Microsoft Teams traffic over the past 7 days. It shows the total incoming and outgoing network traffic per device, helping you identify which endpoints are generating the highest data usage for Teams.

```sql
devices
| include connection.events during past 7d
| where binary.name in ["ms-teams.exe", "msteams"]
| compute incoming_traffic_ = incoming_traffic.sum(), outgoing_traffic_ = outgoing_traffic.sum()
| list device.name, incoming_traffic_, outgoing_traffic_
| sort outgoing_traffic_ desc
```

</details>

<details>

<summary>What are the NQL query limitations for <code>with</code> and <code>include</code>?</summary>

Each `execution` and `connection` event includes two associations:

* `binary`, representing the application context, such as `teams.exe`
* `real_binary`, the actual executable that was run, such as `msedgewebview2.exe`

When joining the `binaries` table with event tables, such as `execution.events` or `connection.events`, NQL uses the **`binary`** association only. NQL supports a single join path between two tables, and that path is based on the `binary` field—not `real_binary`.

See the following examples for explanations using real-life scenarios.

**Example 1**

Syntactically valid query that returns misleading results:

```sql
binaries
| with execution.events during past 1h
| where real_binary.name == "msedgewebview2.exe"
| compute cpu_time_ = cpu_time.sum()
```

The query returns the following table:

<figure><img src="https://268444917-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxJSUDk9NTtCHYPG5EWs3%2Fuploads%2Fgit-blob-1a007b28781a2b4c909bfc74629e66c22d23589f%2Fhttps___files.gitbook.com_v0_b_gitbook-x-prod.appspot.com_o_spaces_2Fh7lLhrMjsSACP8jSdZ6s_2Fuploads_2FgsFDOySfLpGAKvdAG7WO_2Fimage.png?alt=media" alt="Table with example"><figcaption></figcaption></figure>

While the syntax is valid, the results may be misleading:

1. The `where` clause filters for events where the `real_binary` is `msedgewebview2.exe`.
2. However, the join between `binaries` and `execution.events` is still made using the `binary` association.
3. As a result, the computed `cpu_time_` reflects only the resource usage from `msedgewebview2.exe`, but it is grouped under the parent binary, such as `teams.exe` or `outlook.exe`, which can be confusing.

Using this type of partial attribution in dashboards or reports can lead to incorrect conclusions. When filtering on `real_binary`, avoid aggregating or displaying results using the `binary` association unless you intend to summarize `real_binary` activity under the parent context.
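The effect can be reproduced outside NQL. The following Python sketch mimics the filter and the join key separately, using invented event records; it is an analogy for how the rows end up grouped, not a description of how NQL is executed.

```python
from collections import defaultdict

# Hypothetical execution events; values are made up for illustration.
events = [
    {"binary": "teams.exe", "real_binary": "teams.exe", "cpu_time": 50.0},
    {"binary": "teams.exe", "real_binary": "msedgewebview2.exe", "cpu_time": 12.0},
    {"binary": "outlook.exe", "real_binary": "msedgewebview2.exe", "cpu_time": 8.0},
]


def cpu_time_per_binary(events, real_binary_filter):
    """Filter on real_binary, but group on binary, as the join in Example 1 does."""
    totals = defaultdict(float)
    for event in events:
        if event["real_binary"] == real_binary_filter:      # plays the role of `where`
            totals[event["binary"]] += event["cpu_time"]     # join key is `binary`
    return dict(totals)


result = cpu_time_per_binary(events, "msedgewebview2.exe")
print(result)  # {'teams.exe': 12.0, 'outlook.exe': 8.0}
```

The rows are labeled `teams.exe` and `outlook.exe`, yet each value contains only WebView CPU time, and the 50.0 seconds used by `teams.exe` itself does not appear anywhere.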

**Example 2**

Syntactically invalid query:

<pre class="language-sql"><code class="lang-sql"><strong>binaries
</strong>| with execution.events during past 1h
| where real_binary.name == "msedgewebview2.exe"
| compute cpu_time_ = cpu_time.sum()
| list binary.name, real_binary.name, cpu_time_
</code></pre>

<figure><img src="https://268444917-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxJSUDk9NTtCHYPG5EWs3%2Fuploads%2Fgit-blob-930dd6e63db540c6e50f2a579cf1dc350d03d4ee%2Fhttps___files.gitbook.com_v0_b_gitbook-x-prod.appspot.com_o_spaces_2Fh7lLhrMjsSACP8jSdZ6s_2Fuploads_2FXaGT2buIITYig3RJOqKd_2Fimage.png?alt=media" alt="Query example"><figcaption></figcaption></figure>

In NQL, when joining from the `binaries` collection to `execution.events`, the query uses the `binary` association defined in the data model. This means that `binaries` can access only the `execution.events` that reference it through the `binary` field.

The `real_binary` association is not available in this join context. Because `binaries` is linked to `execution.events` only through the `binary` field, trying to access `real_binary.name` results in an error.

</details>
