4.1 Network Condition Alert Rules

Configure alert rules based on network conditions, such as TCP protocol behavior, congestion, error counters, performance, throughput, state of the BGP routing table, Internet Insights, MPLS, VPN, NetFlow, SNMP, and syslog.

Alert rules can be configured in ThousandEyes to monitor various network conditions and metrics from Cloud and Enterprise Agents, Endpoint Agents, BGP, devices, and Internet Insights. Alerts can be set up to notify when thresholds are exceeded for metrics like packet loss, latency, jitter, page load time, throughput, BGP reachability, device interface status, and more.

Key Concepts

ThousandEyes Alert Rules

  • All alert rules have four sections: Description, Settings, Notifications, and Alert Conditions.
  • Settings configure the "big picture" of what test data will trigger the alert.
  • Notifications determine who/what systems get notified when an alert triggers.
  • Alert Conditions specify when the alert should trigger based on global and location criteria.
  • Default alert rules are automatically added to new tests but can be disabled.
  • Custom alert rules are recommended to match specific requirements and reduce noise.

Alert Structure

All alert rules have four sections:

  1. Description: The alert type (data source/test type) and the alert name.
  2. Settings: Selection of tests that will trigger this alert.
  3. Notifications: Recipients and systems to be notified when an alert is triggered.
  4. Alert Conditions: Criteria for when this alert should trigger.

Figure 4.1-1: Alert Structure

Settings

Alert settings options change depending on the alert category selected. For example, Cloud and Enterprise Agents alerts have different settings than Endpoint Agents to reflect the supported test types and collected conditions.

There are 5 alert categories available:

Category | Settings
Cloud and Enterprise Agents | Agents, Severity, Tests
Endpoint Agents | Agents, Severity, Visited Sites
BGP Routing | Monitors, Prefix Length
Devices | Devices
Internet Insights | Affected Tests, Catalog Providers, Severity

Alert Conditions

Alert conditions have two sections:

  • Global condition
  • Location condition

When the global condition is met, any agent that meets the location condition in a test round will be included in the alert as "active".

Global Conditions

This section describes how to apply the global section of the Alert Conditions.

Figure 4.1-2: Alert Rule Conditions

The Global section has the following format:

<All>/<Any> conditions are met by <any of>/<the same> ### <monitor>/<% of monitors>/<agent>/<% of agents> # of # times in a row

The <All>/<Any> conditions option sets how many individual location alert conditions are required to continue evaluating the Global section. For example, if "All" is selected, the alert will only trigger when all conditions are met. If "Any" is selected, the alert will trigger if any condition is met.
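
As a rough illustration of the All/Any choice, the Python sketch below evaluates a list of location conditions against one agent's measurements for a single test round. The function name, metric keys, and thresholds are hypothetical and are not part of any ThousandEyes API.

```python
from typing import Callable, Dict, List

# One agent's measurements for a single test round (hypothetical field names).
Sample = Dict[str, float]
Condition = Callable[[Sample], bool]


def meets_location_conditions(sample: Sample, conditions: List[Condition], mode: str) -> bool:
    """Return True if the sample satisfies the conditions under 'all' or 'any'."""
    results = [condition(sample) for condition in conditions]
    if mode == "all":
        return all(results)
    if mode == "any":
        return any(results)
    raise ValueError(f"unknown mode: {mode!r}")


# Two illustrative conditions: packet loss >= 5% and latency >= 180 ms.
conditions = [
    lambda s: s["packet_loss_pct"] >= 5.0,
    lambda s: s["latency_ms"] >= 180.0,
]
sample = {"packet_loss_pct": 0.0, "latency_ms": 210.0}

print(meets_location_conditions(sample, conditions, "all"))  # False: loss is below 5%
print(meets_location_conditions(sample, conditions, "any"))  # True: latency is above 180 ms
```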

For alert rules that need more than one test round to trigger, the <any of>/<the same> option sets whether the agents or monitors being evaluated must be the same in each impacted test round. Selecting <the same> lets you catch problems that persist at specific agents or monitors, rather than unrelated issues that move between them from round to round.

The <monitor>/<% of monitors>/<agent>/<% of agents> option lets you choose the count or percentage of agents or monitors needed for the alert to trigger. A percentage works best when the same rule applies to multiple tests with varying numbers of agents or monitors.

When using a percentage, the computed percentage of agents or monitors is truncated, not rounded. For example, if 14.7% of agents meet the alert conditions and "% of agents" is set to 15%, the value is truncated to 14% and the alert will not trigger.
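
The truncation behavior can be checked with a few lines of Python. This is a minimal sketch that assumes the percentage is truncated to a whole number and then compared with "greater than or equal to". The function names and the 5-of-34 agent count (about 14.7%) are illustrative only.

```python
def percent_meeting(active: int, total: int) -> int:
    """Whole-number percentage of agents meeting the conditions, truncated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (active * 100) // total  # integer division drops the fraction


def percent_threshold_met(active: int, total: int, threshold_pct: int) -> bool:
    """Assumed comparison: truncated percentage >= configured threshold."""
    return percent_meeting(active, total) >= threshold_pct


# 5 of 34 agents is about 14.7%.
print(percent_meeting(5, 34))            # 14 (truncated)
print(round(5 * 100 / 34))               # 15 (what rounding would have given)
print(percent_threshold_met(5, 34, 15))  # False: 14 < 15, so no alert
```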

The # of # times in a row option sets how many test rounds the alert rule evaluates. The second number is the size of a sliding window of consecutive test rounds, and the first number is how many rounds within that window must meet the location conditions. When the two numbers are the same (1 of 1, 4 of 4, etc.), every round in the window must meet the location conditions for the alert to trigger.
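
The sliding-window idea, together with the <any of>/<the same> choice, can be modeled roughly as follows. This is a simplified sketch of the semantics described above, not the platform's actual evaluation logic. The function name, its parameters, and the agent names are all hypothetical.

```python
from collections import Counter
from typing import List, Set


def global_condition_met(rounds: List[Set[str]], min_agents: int,
                         required: int, window: int,
                         same_agents: bool = False) -> bool:
    """rounds: per test round (oldest first), the agents that met the location conditions."""
    if not 1 <= required <= window:
        raise ValueError("expected 1 <= required <= window")
    recent = rounds[-window:]  # the sliding window of most recent rounds
    if same_agents:
        # "the same": count how often each individual agent met the conditions.
        hits = Counter(agent for active in recent for agent in active)
        persistent = [agent for agent, n in hits.items() if n >= required]
        return len(persistent) >= min_agents
    # "any of": a round qualifies if enough agents met the conditions, whoever they are.
    qualifying = sum(1 for active in recent if len(active) >= min_agents)
    return qualifying >= required


moving = [{"tokyo"}, {"sydney"}, {"singapore"}]  # a different agent each round
sticky = [{"tokyo"}, set(), {"tokyo"}]           # the same agent, two rounds out of three

print(global_condition_met(moving, 1, 2, 3))                    # True
print(global_condition_met(moving, 1, 2, 3, same_agents=True))  # False
print(global_condition_met(sticky, 1, 2, 3, same_agents=True))  # True
print(global_condition_met(sticky, 1, 3, 3))                    # False: only 2 of 3 rounds
```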

Location Conditions

Location alert conditions are where you set the specific metrics on which an alert becomes active. You can set any number of metrics for an alert, though bear in mind that when all conditions must be met, each additional metric makes it less likely that the alert will activate.

Location alert conditions are configured by choosing at least one metric (the test characteristic against which you're measuring change) and one operator (the type of measure). Depending on the metric, other configurable options include threshold values and units.
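
One way to picture a location condition is as a small record holding a metric, an operator, and a threshold with a unit. The sketch below is only an illustration. The class name, field names, and operator strings are assumptions and do not reflect how ThousandEyes stores alert rules.

```python
import operator
from dataclasses import dataclass
from typing import Dict

# Map operator symbols to comparison functions from the standard library.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


@dataclass(frozen=True)
class LocationCondition:
    metric: str       # e.g. "latency"
    op: str           # one of the keys in OPERATORS
    threshold: float  # value to compare against
    unit: str = ""    # informational only, e.g. "ms"

    def matches(self, measurements: Dict[str, float]) -> bool:
        """Compare the measured value of this metric with the threshold."""
        return OPERATORS[self.op](measurements[self.metric], self.threshold)


high_latency = LocationCondition("latency", ">=", 180, "ms")
print(high_latency.matches({"latency": 195.0}))  # True
print(high_latency.matches({"latency": 120.0}))  # False
```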

A location alert is included within a global alert when a single alert trigger meets the location alert conditions for at least one round, regardless of the thresholds set for the global alert.

It's important to note that location alerts trigger and clear independently of the global alert. If you see multiple location alerts listed under a global alert, you cannot assume that every listed location alert met the initial alert criteria on a per-round basis.

For more on global and location alert conditions, see the ThousandEyes documentation.

Example Alert Conditions

This section summarizes common events to alert on for each alert category. In each table, the first column lists the event type and the second gives the alert condition configuration.

Network Tests Alert Conditions

Use this configuration to monitor network metrics such as packet loss, latency, jitter, bandwidth, and throughput.

Event | Condition
High Latency in Asia-Pacific | Latency ≥ 180 ms
High Network Packet Loss | Packet Loss ≥ threshold_%
High Network Jitter | Jitter ≥ threshold_ms
QoS Marking Change | Any hop not in DSCP dscp_value
Network Loop Detected | Path length > max_path_length
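
To make the last two rows concrete, the sketch below applies the same logic to a hypothetical list of per-hop DSCP values from a path trace. The function names and sample data are illustrative assumptions, and a long path is treated only as a hint of a possible loop.

```python
from typing import List


def qos_marking_changed(hop_dscp: List[int], expected_dscp: int) -> bool:
    """True if any hop reports a DSCP value other than the expected one."""
    return any(value != expected_dscp for value in hop_dscp)


def path_too_long(hop_count: int, max_path_length: int) -> bool:
    """True if the path has more hops than the configured maximum."""
    return hop_count > max_path_length


# DSCP 46 (Expedited Forwarding) is re-marked to 0 from the third hop onward.
hops = [46, 46, 0, 0]
print(qos_marking_changed(hops, expected_dscp=46))   # True
print(path_too_long(len(hops), max_path_length=30))  # False
```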

BGP Routing Alert Conditions

Use this configuration to monitor AS path, route reachability, and route updates.

Event | Condition
Route Flaps | Path changes > 1 & reachability < 100%
Prefix Hijack | BGP ASN not in expected_asn_list
DDoS Mitigation Activated | BGP ASN in mitigation_asn_list or prefix not in expected_prefix_list
Upstream Provider Change | BGP HOP# from origin not in expected_hop_list
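
The Prefix Hijack row boils down to checking whether the origin AS of an observed path is in an expected list. The following sketch assumes the AS path is given as a list of AS numbers with the origin last, which is the usual convention. The function names are hypothetical, and the AS numbers come from the range reserved for documentation.

```python
from typing import List, Set


def origin_asn(as_path: List[int]) -> int:
    """The origin AS is the last (rightmost) entry in the AS path."""
    if not as_path:
        raise ValueError("empty AS path")
    return as_path[-1]


def possible_hijack(as_path: List[int], expected_origins: Set[int]) -> bool:
    """True if the prefix is originated by an AS outside the expected list."""
    return origin_asn(as_path) not in expected_origins


expected_asn_list = {64500}
print(possible_hijack([64496, 64499, 64500], expected_asn_list))  # False: expected origin
print(possible_hijack([64496, 64511], expected_asn_list))         # True: unexpected origin
```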

DNS & Web Tests Alert Conditions

Use this configuration to monitor web server response time, wait time, load time, transaction duration, and/or DNS response.

Event | Condition
Slow DNS Resolution | Response time > 20 ms
DNS Mapping Change/Spoofing | Mapping not in expected_ip_address
Slow Transaction | Duration > threshold_ms
Embed URL Not Working | Any component domain in domain_list & component load incomplete
Slow Throughput | Throughput < threshold_kbps Kbps
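
The DNS Mapping Change/Spoofing row compares the addresses a name resolves to against an expected set. A minimal sketch of that comparison is shown below. It uses Python's standard ipaddress module so that equivalent address spellings compare equal. The function name and the documentation-range addresses are illustrative assumptions.

```python
import ipaddress
from typing import Iterable, List


def unexpected_mappings(resolved: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return resolved addresses that are not in the expected set, sorted as strings."""
    allowed = {ipaddress.ip_address(addr) for addr in expected}
    seen = {ipaddress.ip_address(addr) for addr in resolved}
    return sorted(str(ip) for ip in seen - allowed)


expected_ips = ["192.0.2.10", "192.0.2.11"]
print(unexpected_mappings(["192.0.2.10"], expected_ips))                 # []
print(unexpected_mappings(["192.0.2.10", "203.0.113.7"], expected_ips))  # ['203.0.113.7']
```

An empty result means every resolved address was expected. Any returned address would correspond to the condition in the table being met.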

Internet Insights Alert Conditions

Use this configuration to monitor affected applications, outage error types, locations for application outages, affected domains, ASNs, locations, and interfaces for network outages.

Event | Condition
Google Workspace App Outage | Affected App in Google Workspace
Application Outage due to DNS | Affected app in app_list & Outage Error Type in DNS
CDN Network Outage in US | Locations in United States & affected domain in domain_list
Network Outage Services Impact | Affected tests count & location in location_list

Sample Questions

4.1 Question 1

Which of the following metrics can be used to configure an alert rule for Endpoint Agent HTTP Server tests? (Choose two)

  • A) Response Time
  • B) BGP Reachability
  • C) Error Type
  • D) Interface Throughput

4.1 Question 2

The alert shown in the exhibit is designed to detect which of the following network security issues?

  • A) Route poisoning
  • B) DNS poisoning
  • C) BGP hijacking
  • D) DNS hijacking

Exhibit 4.1-1

4.1 Question 3

Refer to the exhibit. The alert rule is set up as shown, but didn't trigger. Why?

  • A) Alert conditions weren't met and won't trigger with current setup
  • B) Alert needs two consecutive agent failures to trigger
  • C) Response code is set up incorrectly
  • D) All of the above

Exhibit 4.1-2

Exhibit 4.1-3

4.1 Question 4

Refer to the exhibit. A network engineer is tasked with configuring an alert that will trigger if the HTTP server responds with a server error. What alert conditions should be configured to meet the specified requirements?

  • A) Error type is any
  • B) Wait Time is Dynamic (New) with Medium sensitivity
  • C) Response Time ≥ Static 500ms
  • D) Response Code is server error(5XX)

Exhibit 4.1-4
