Domain 3 Answer Key

Network Issues

3.1 Question 1

Users at a remote corporate site (identified as s30 in the exhibit) are experiencing issues with a critical Enterprise Application hosted in the Data Center. The site connects to the central campus through an MPLS network.

The following exhibits show the network status before and after the issue began. Based on the information presented, what is the most likely cause of the problem and what actions would you take next as a Network Operations Engineer?

Exhibit 3.1-1: Before Issue

Exhibit 3.1-2: After Issue


A) Escalate to the transmission media team and have the optic fiber between 10.84.30.1 and 10.87.16.53 checked.
B) Review the bandwidth utilization at this site.
C) Reach out to the team that owns the Enterprise Application and have the server reviewed.
D) Check the routing tables on the MPLS network devices for any recent changes.

Explanation

From the exhibits, we can observe that the latency spike is introduced on the link between devices 10.84.30.1 and 10.87.10.253. Comparing the discovered network path before and during the incident confirms that no routing change occurred, because traffic traverses the same nodes in both cases. The most likely cause is therefore network congestion or heavy traffic load on that link.

A valid next step is to review the bandwidth utilization and QoS settings at this site to identify any congestion, which corresponds to option B.
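The comparison described in this explanation can be sketched in a few lines of Python. This is a minimal illustration, not a ThousandEyes API: the hop lists, the `agent-s30` label, and all latency values are hypothetical, and each hop is assumed to carry a cumulative round-trip latency in milliseconds.

```python
from typing import Dict, List, Optional, Tuple

Hop = Tuple[str, float]  # (node, cumulative round-trip latency in ms)
Link = Tuple[str, str]


def link_delays(path: List[Hop]) -> Dict[Link, float]:
    """Return the latency added by each link, keyed by (from_node, to_node)."""
    delays = {}
    for (prev_node, prev_ms), (node, ms) in zip(path, path[1:]):
        delays[(prev_node, node)] = ms - prev_ms
    return delays


def compare_paths(before: List[Hop], after: List[Hop]) -> Tuple[bool, Optional[Link], float]:
    """Report whether the route is unchanged and which link gained the most latency."""
    same_route = [node for node, _ in before] == [node for node, _ in after]
    d_before, d_after = link_delays(before), link_delays(after)
    # Only links present in both snapshots can be compared.
    increase = {link: d_after[link] - d_before[link] for link in d_after if link in d_before}
    if not increase:
        return same_route, None, 0.0
    worst = max(increase, key=increase.get)
    return same_route, worst, increase[worst]


# Hypothetical measurements; only the two device addresses come from the explanation.
before = [("agent-s30", 0.0), ("10.84.30.1", 2.0), ("10.87.10.253", 12.0)]
after = [("agent-s30", 0.0), ("10.84.30.1", 2.0), ("10.87.10.253", 92.0)]

same_route, worst_link, delta_ms = compare_paths(before, after)
print(same_route, worst_link, delta_ms)
```

An unchanged route combined with a large latency increase on a single link is the pattern that points toward congestion rather than a routing change.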

3.1 Question 2

Users at remote sites are reporting voice issues. Can you identify possible causes and next steps from the following exhibits?

Exhibit 3.1-3: Before Incident #1

Exhibit 3.1-4: Before Incident #2

Exhibit 3.1-5: During Incident #1

Exhibit 3.1-6: During Incident #2


A) Involve the Voice team as the RTP test does not return any relevant results for the agent located at site 20 (identified as s20 in the exhibit)
B) Verify the routing changes on device 10.87.7.51
C) Verify the docker host 10.84.50.53 and ensure the agent container is running.
D) Analyze the jitter and latency trends on the affected voice paths to identify potential network congestion.

Explanation

From the figures, we can observe that in normal conditions, traffic is forwarded from node 10.87.7.51 to 10.84.50.53. During the incident, there is forwarding loss observed at node 10.87.7.51. A Cisco ThousandEyes Enterprise Agent will display the "-" character in the Table view when it is unable to complete measurements for a test.

This doesn't invalidate the test; it shows only that data collection for one target agent wasn't completed. Instead of discarding the test, we should focus on other layers. Every other agent measures the expected MOS to this target; the agent at site 20 is the only exception, so there may be issues specific to the target agent at site 20 that need attention.

End-device Issues

3.2 Question 1

Refer to the exhibits. The endpoint has the following IP settings:

192.168.100.9/24, DNS: 8.8.8.8,8.8.4.4, GW: 192.168.100.1

Based on the views presented in the exhibits, what led to the error occurring on Sun, May 5 23:27 GMT +2?

A) The test target stopped responding.
B) The FQDN of the test target is non-existent.
C) The DNS servers assigned to the endpoint are unreachable.
D) The DNS settings on the endpoint are incorrect.

Explanation

Let's break down why the other options can be ruled out and why the correct answer best explains the DNS error:

A) The test target stopped responding - incorrect. The error reads "The host name could not be resolved", and the test target can only respond after its name has been resolved.
B) The FQDN of the test target is non-existent - incorrect, because the same FQDN resolved successfully in the previous test round.
D) The DNS settings on the endpoint are incorrect - incorrect, because the endpoint used the same DNS settings in the previous round and the issue did not occur.
Correct answer: C) The DNS servers assigned to the endpoint are unreachable. The endpoint uses external DNS servers (8.8.8.8 and 8.8.4.4), and the exhibit shows no external tracing, which indicates that external resources, including those DNS servers, are unreachable.
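To make the distinction between options B and C concrete, the sketch below separates "no reply from the DNS server" from "the server replied that the name does not exist". It is a minimal illustration using only the Python standard library and a hand-built DNS query; the function names are hypothetical, and the reply labels are simplified (a real resolver also handles truncation, retries, and TCP fallback).

```python
import socket
import struct
from typing import Optional


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def classify(response: Optional[bytes]) -> str:
    """Map a raw DNS reply (or None for no reply) to a coarse diagnosis."""
    if response is None:
        return "DNS server unreachable (no reply before timeout)"
    if len(response) < 12:
        return "malformed reply"
    rcode = response[3] & 0x0F  # RCODE is the low 4 bits of the flags field
    if rcode == 0:
        return "server answered (NOERROR)"
    if rcode == 3:
        return "name does not exist (NXDOMAIN)"
    return f"server error (rcode {rcode})"


def query(server: str, hostname: str, timeout: float = 2.0) -> Optional[bytes]:
    """Send one UDP query to the server; return None if nothing comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server, 53))
            data, _ = sock.recvfrom(512)
            return data
        except OSError:  # covers timeouts and "network unreachable"
            return None
```

For example, `classify(query("8.8.8.8", "example.com"))` would report an unreachable server when the endpoint has no external connectivity, whereas a non-existent FQDN would still produce a reply with an NXDOMAIN code.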

3.2 Question 2

The Endpoint stopped appearing online after it was moved to another network.

The customer reviewed the endpoint logs but did not identify anything suspicious.

The customer also confirmed that the endpoint was online on the old network, and the new network is fully operational. Other endpoints that were moved to the new network are also online. Since the new network is small, the admin is using static IP assignment. What is the best way to bring the endpoint online?

A) It may be an issue with the lack of space in the new network. The endpoint should be moved back to the old network.
B) The endpoint agent should be reinstalled to come online. This always helps.
C) The endpoint will automatically come online in 10-15 minutes, no action is needed.
D) Endpoint IP settings must be checked along with connectivity to c1.eb.thousandeyes.com.

Explanation

According to the log message, there was a timeout when attempting to connect to c1.eb.thousandeyes.com:

2024-05-07 13:20:28.498 DEBUG [3068.2148] net.WinHttpClient@WinHttpClient.cpp:2115 - Request to: wss://c1.eb.thousandeyes.com/relay/connect timed out: 12002: The operation timed out
2024-05-07 13:20:28.499 DEBUG [3068.8240] net.WinHttpClient@WinHttpClient.cpp:2115 - Request to: https://c1.eb.thousandeyes.com/status.json timed out: 12002: The operation timed out

Since the Endpoint was relocated to a new network that uses static IP assignment, troubleshooting should start with verifying the IP settings and confirming connectivity to ThousandEyes.

Option A is incorrect because the endpoint should function properly in the new network, just like the other endpoints that were moved there.
Option B is incorrect because reinstalling the software does not address IP configuration issues.
Option C is incorrect because manually entered, incorrect IP settings will not correct themselves.
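The two checks named in the correct answer can be sketched as follows. This is a minimal illustration using the Python standard library; the function names and the sample addresses are hypothetical, and port 443 is assumed because the log entries show `wss://` and `https://` URLs.

```python
import ipaddress
import socket


def gateway_on_link(interface_cidr: str, gateway: str) -> bool:
    """True if the gateway lies inside the subnet of the static address."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(gateway) in iface.network


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refusal, and timeout
        return False


# Hypothetical static settings: a mistyped gateway sits outside the subnet.
print(gateway_on_link("10.20.30.40/24", "10.20.30.1"))  # gateway inside the /24
print(gateway_on_link("10.20.30.40/24", "10.20.31.1"))  # gateway outside the /24
```

Once the static settings look consistent, calling `can_connect("c1.eb.thousandeyes.com")` from the endpoint would show whether the timeout seen in the log persists.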

Web Application Performance

3.3 Question 1

Review the exhibits. Based on the evidence, which action is most likely to solve the issue?

A) Modify the firewall rules to allow connections to the target domain
B) Modify the authentication credentials
C) Change the HTTP request method to PATCH
D) Modify the target URL to an available API endpoint

Explanation

This one is tricky, as it requires a basic understanding of HTTP response codes (see Resources). Let's have a look at each potential answer.

A) Modify the firewall rules to allow connections to the target domain is incorrect because the exhibit shows that we are getting a response from the target server.
C) Change the HTTP request method to PATCH is incorrect because nothing in the response indicates that the request method is wrong (that would be code 405 Method Not Allowed).
D) Modify the target URL to an available API endpoint is incorrect because nothing in the response points to an unavailable API endpoint.

Finally, if we check the response code we are getting from the server, 401, we will find that it means Unauthorized: the request lacks valid authentication credentials. Further inspection of the request headers confirms the issue, as no Authorization header is being sent. Thus, B) Modify the authentication credentials is the answer.
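The elimination logic above can be summarized in a small helper. This is a minimal sketch using Python's standard `http.HTTPStatus`; the `diagnose` function and its messages are hypothetical and cover only the codes discussed in this question.

```python
from http import HTTPStatus
from typing import Mapping


def diagnose(status: int, request_headers: Mapping[str, str]) -> str:
    """Suggest a next step from the response code and the request headers."""
    names = {name.lower() for name in request_headers}  # header names are case-insensitive
    phrase = HTTPStatus(status).phrase
    if status == HTTPStatus.UNAUTHORIZED:
        if "authorization" not in names:
            return f"{status} {phrase}: no Authorization header sent; add credentials"
        return f"{status} {phrase}: credentials sent but rejected; verify them"
    if status == HTTPStatus.METHOD_NOT_ALLOWED:
        return f"{status} {phrase}: change the request method"
    if status == HTTPStatus.NOT_FOUND:
        return f"{status} {phrase}: check the target URL"
    return f"{status} {phrase}: no specific hint in this sketch"


print(diagnose(401, {"Accept": "application/json"}))
```

Any response code at all also rules out a firewall block, because a code can only be returned if the request reached the server.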

3.3 Question 2

Review the exhibits. Based on the evidence, what seems to be the underlying issue?

A) There is a network connectivity problem preventing us from reaching the target URL
B) One of the DOM elements cannot be found in the server
C) The request timed out waiting for the server to respond
D) There is a misconfiguration in the application server

Explanation

This one will also leverage your knowledge of HTTP response codes, albeit with a twist.

A) There is a network connectivity problem preventing us from reaching the target URL is incorrect because the exhibit shows that we are getting a response from the target server.
B) One of the DOM elements cannot be found in the server is incorrect because, even though the waterfall chart marks component 81 with an issue, we don't see any 404 Not Found response.
C) The request timed out waiting for the server to respond is incorrect because the exhibit shows the server answering promptly to almost all component requests.

Finally, if we check the response code we are getting from the server, we will find it to be a 302 Found redirect. The fact that each redirect leads to another redirect (multiple 302 responses in a row) points to a misconfiguration on the server that is causing a loop within the app. Thus, the last option is correct: D) There is a misconfiguration in the application server.
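A redirect loop of this kind can be recognized by following the Location headers and watching for a URL that repeats. The sketch below is a minimal illustration on recorded responses rather than live requests; the `follow` function, the hop limit, and the sample paths are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Response = Tuple[int, Optional[str]]  # (status code, Location header or None)
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow(start: str, responses: Dict[str, Response], max_hops: int = 10) -> Tuple[str, List[str]]:
    """Walk the redirect chain and report 'final', 'loop', or 'too many redirects'."""
    seen: List[str] = []
    url = start
    while len(seen) < max_hops:
        if url in seen:
            return "loop", seen + [url]  # the same URL was requested twice
        seen.append(url)
        status, location = responses[url]
        if status not in REDIRECT_CODES or location is None:
            return "final", seen
        url = location
    return "too many redirects", seen


# Hypothetical recorded responses: two pages that redirect to each other.
looping = {"/app": (302, "/login"), "/login": (302, "/app")}
print(follow("/app", looping))
```

A healthy chain ends on a non-redirect response after a few hops, while a misconfigured application keeps returning 302 responses that lead back to a URL already visited.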

Security Issues

3.4 Question 1

In real-life applications using ThousandEyes, you can switch between various views. However, for the exam, you will be limited to up to three exhibits. When reviewing answer options, remember to:

Analyze using only the provided exhibits.
Choose the answer that can be confirmed with the information given.

Carefully review the exhibits. Which detail indicates the network issue might be caused by a BGP Hijack?

Exhibit 3.4-1: Los Angeles before the Outage (ThousandEyes API test view)

A) Availability Drop
B) AS 16509 change to AS 10297
C) HTTP Server response delay
D) Packet Loss

Hint

Analyze the details and contrast the provided exhibits to accurately identify potential network issues. Note any changes in Autonomous System (AS) numbers, which are crucial for determining the cause of network problems.
If there are multiple agents visible in the path visualization view showing packet or forwarding loss, focus on one agent and compare its path against subsequent exhibits to determine the root cause.

Explanation

While the exhibits clearly depict a network issue with significant packet loss, pinpointing the exact cause as a BGP hijack requires careful analysis. Let's break down why the correct answer is the most indicative of a BGP hijack:

Availability Drop: Although a drop in availability is a symptom of the problem, it doesn't specifically point to BGP hijacking. Various network issues could cause availability drops.
HTTP Server response delay: Similar to availability drop, this is a symptom of the problem, likely caused by the packet loss, but it doesn't explicitly indicate BGP hijacking.
Packet Loss: Again, this is a clear symptom shown in the exhibits but doesn't directly confirm BGP hijacking as the cause.

However, the change in AS path from AS 16509 to AS 10297 is a strong indicator of BGP hijacking. This suggests that the route to the destination was illegitimately taken over by another AS, causing traffic to be misdirected and resulting in packet loss.

The shift in the AS path provides the most concrete evidence supporting the possibility of a BGP hijack in this scenario.
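The check for an unexpected AS change can be sketched as a comparison of the AS that serves the destination before and during the incident. This is a minimal illustration: the `origin_change` function is hypothetical, AS 64500 is a documentation-range placeholder for a transit network, and the paths are assumed to be listed from the agent toward the destination so that the last entry is the origin AS.

```python
from typing import List, Optional, Set


def origin_change(before: List[int], after: List[int], expected: Set[int]) -> Optional[str]:
    """Return a warning if the origin AS changed to one that is not expected."""
    old_origin, new_origin = before[-1], after[-1]
    if new_origin == old_origin:
        return None
    if new_origin in expected:
        return None  # a change between known-good origins is not suspicious here
    return f"origin changed from AS {old_origin} to AS {new_origin}: possible BGP hijack"


# Hypothetical AS paths; only AS 16509 and AS 10297 come from the question.
path_before = [64500, 16509]
path_during = [64500, 10297]
print(origin_change(path_before, path_during, expected={16509}))
```

An origin change alone does not prove a hijack, since legitimate migrations also change the origin, which is why the explanation describes it as a strong indicator rather than proof.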

3.4 Question 2

Considering the observed network behavior and the information in the exhibits, which action would be the most appropriate next step for the network administrator to take?

A) Contact the internal network team to investigate potential misconfigurations on the local routers
B) Reach out to the Internet Service Provider (ISP) to report the suspected BGP hijacking incident
C) Implement traffic filtering rules on the firewall to block traffic originating from AS 10297
D) Restart the DNS server to refresh its cache and potentially resolve the observed issue

Explanation

The exhibits show a significant packet loss issue occurring at a specific point in the network path. The Path Visualization highlights a node within AS 10297 as the source of 100% forwarding loss for multiple agent locations. This suggests a problem beyond the local network and points towards a potential BGP routing issue, specifically a BGP hijack.

Thus, the correct answer is B) Reach out to the ISP to report the suspected BGP hijacking incident.

Here's why the other options are not the best next steps:

Contact the internal network team to investigate potential misconfigurations on the local routers is incorrect because the issue appears to be external to the local network, as multiple geographically dispersed agents are affected, and the packet loss originates from AS 10297.
Implement traffic filtering rules on the firewall to block traffic originating from AS 10297 is incorrect as implementing firewall rules would not address the root cause, which is likely a routing issue outside of the local network's control.
Restart the DNS server to refresh its cache and potentially resolve the observed issue is incorrect because restarting the DNS server is unlikely to resolve a BGP hijacking issue, as the problem lies within the routing of traffic rather than the DNS server itself.