3.3 Web Application Performance Issues

Diagnose web application performance issues using collected data such as browser waterfalls

This section explores concepts and tools that provide insights into web application performance problems.

Key Concepts

To troubleshoot effectively, first distinguish application-related issues from network-related ones. Knowing which layer is at fault tells you whether to tune the network path or the application itself, and both matter for a seamless user experience.

Waterfall Charts

A waterfall chart provides a visual representation of how a browser interacts with web page objects during the loading process. It displays the timeline of each object's download, including HTML, CSS, JavaScript files, images, and other resources.

The chart starts with the initial request for the page and shows each object loading over time; objects may load one after another or in parallel. The horizontal axis represents time, and the length of each object's bar represents how long that object took to load.

By analyzing a waterfall chart, developers can identify bottlenecks, such as slow-loading resources or dependencies, and optimize the webpage's performance. This insight helps improve user experience by reducing page load times and enhancing overall responsiveness.

Indicators for Analysis

Key indicators to look out for when analyzing performance:

Response Time: The time it takes for a web server to respond to a user's request.
Load Time: The time it takes for all elements of a web page to fully load and display to the user.
Availability: Whether the web application is accessible to users or experiencing downtime.
HTTP Status Codes: These codes provide insights into the success or failure of web requests, helping to pinpoint potential issues (e.g., server errors, user errors).

ThousandEyes views let you correlate these indicators with network data, giving a comprehensive picture of where web application performance bottlenecks lie.

Document Object Model (DOM)

The DOM represents an HTML document's structure as a tree of objects called nodes. Nodes have parent-child relationships, with some embedded in the main HTML (e.g., text, scripts) and others referencing external resources.

Learn more about waterfall charts and the DOM in the ThousandEyes documentation.

Using Waterfall Chart Metrics for Troubleshooting

Among the various test types offered by ThousandEyes, only Page Load and Transaction tests generate a waterfall chart, providing a detailed visualization of how a browser interacts with web page objects during the loading process.

This section provides a simplified approach to troubleshooting web performance issues using waterfall chart metrics from ThousandEyes Page Load and Transaction tests. By understanding these metrics and following the proposed troubleshooting path, you can gather evidence to identify the problem and find a solution.

Start with the metric that shows an abnormal change, since that change is what degrades the user experience, and work through the questions listed for it. Begin at the network layer and move up to the application layer. Once you have answers, decide on the next steps, keeping in mind that each metric is directly affected by a set of common issues, listed under Possible Causes.

Connect

Metric meaning: The time taken to complete the TCP three-way handshake with the target server.

Questions to Ask:

Is the TCP handshake established correctly?
What do the network metrics collected by the agent indicate?

Possible Causes:

Routing problems (path updates, router misconfiguration)
Packet loss
Internet outage
Application server outage
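
As a rough way to reproduce the Connect measurement outside ThousandEyes, the sketch below times a TCP handshake with Python's standard socket module. The function name time_tcp_connect and the default port and timeout are illustrative assumptions; this is not how the agent itself measures Connect.

```python
import socket
import time


def time_tcp_connect(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Return the time in milliseconds taken to open a TCP connection.

    Raises OSError (including a timeout) if the handshake fails, which is
    itself a useful signal: refused, unreachable, or timed out.
    """
    start = time.perf_counter()
    # create_connection resolves the name before connecting, so pass an
    # IP address if DNS time should be excluded from the measurement.
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0
```

A refused connection usually points at the server or a firewall, while a timeout is more consistent with packet loss or a routing problem along the path.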

DNS

Metric meaning: The time taken to resolve a domain name to an IP address. By default, BrowserBot does not cache DNS records at startup.

Questions to Ask:

Is DNS resolving properly?
How long is it taking to resolve?

Steps to Take:

Identify the DNS servers configured for the agent and who manages them.
Test whether the DNS server is responding by querying it directly with dig or nslookup.
Test whether the DNS server is reachable over the network.
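
The resolution check above can also be scripted. The following is a minimal sketch that times a lookup through the system resolver using Python's socket.getaddrinfo. Unlike dig or nslookup, it cannot target a specific DNS server, so it only shows what the local resolver configuration returns. The function name time_dns_lookup and the resolver parameter are assumptions made for illustration.

```python
import socket
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Time one name lookup and report the addresses returned.

    `resolver` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    start = time.perf_counter()
    try:
        results = resolver(hostname, None)
        error = None
    except socket.gaierror as exc:
        results, error = [], str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Each getaddrinfo result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    addresses = sorted({item[4][0] for item in results})
    return {"ok": error is None, "ms": elapsed_ms,
            "addresses": addresses, "error": error}
```

Comparing the returned addresses against the records you expect is one simple way to notice an unexpected answer, such as one caused by hijacking or a misconfigured zone.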

Possible Causes:

DNS server outage
DNS server configuration problem
Network path to the DNS server affected
DNS hijacking

SSL

Metric meaning: The duration of SSL/TLS negotiation.

Questions to Ask:

Is SSL/TLS negotiation completing successfully?
Are there any SSL/TLS errors?

Steps to Take:

Collect information about the SSL/TLS handshake errors.
Determine whether trust is established between server and client: does the agent trust the CA that issued the certificate the server presents?
Check whether the server presents the full certificate chain, including any intermediate certificates.
Verify whether the server is using a self-signed certificate.
If the certificate was not issued by a well-known CA, ensure the CA certificate is installed on the agent.
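
The trust and expiry checks above can be approximated with Python's standard ssl module. This is a sketch under stated assumptions: check_tls and days_until_expiry are hypothetical helper names, and the trust store used is the one on the machine running the script, which may differ from the agent's.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter field."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Handshake using the default trust store; report timing and expiry.

    ssl.SSLCertVerificationError is raised for untrusted, self-signed,
    expired, or hostname-mismatched certificates.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        start = time.perf_counter()
        # wrap_socket performs the TLS handshake on the connected socket.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            handshake_ms = (time.perf_counter() - start) * 1000.0
            cert = tls.getpeercert()
            return {
                "handshake_ms": handshake_ms,
                "tls_version": tls.version(),
                "days_left": days_until_expiry(cert["notAfter"], time.time()),
            }
```

If check_tls raises a verification error on a machine with a standard trust store, the message normally states whether the cause is an unknown issuer, an expired certificate, or a hostname mismatch.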

Possible Causes:

Application server configuration problem
Expired or invalid certificates

Send

Metric meaning: The time the browser takes to send the request to the server. The first byte of the response arrives only at the end of the Wait phase, so Send on its own is not the time to first byte.

Questions to Ask:

Is the browser sending the request correctly?
How much time did it take?

Steps to Take:

Collect the HTTP status codes returned to determine whether this is a client error (4xx codes) or a server error (5xx codes).
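
To make the client-versus-server distinction concrete, here is a small sketch that maps a status code to its standard class. The function name classify_status is an illustrative assumption.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to its standard response class."""
    if 100 <= code <= 199:
        return "informational"
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"  # the request itself was rejected
    if 500 <= code <= 599:
        return "server error"  # the server failed to handle a valid request
    raise ValueError(f"not a valid HTTP status code: {code}")
```

For example, a 404 falls in the client error class (the requested resource was not found), while a 503 falls in the server error class (the service is unavailable).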

Possible Causes:

Incorrect proxy settings or misconfigured network settings
Malformed browser requests
Cross-Origin Resource Sharing (CORS) Errors
Ad Blockers or Browser Extensions

Wait

Metric meaning: The duration between the completion of a browser's SEND request and receipt of the first byte of a server's response.

Questions to Ask:

How much time did it take to hear back from the server?
What is the server's performance?

Steps to Take:

Correlate the web metrics to the network metrics such as latency and packet loss.
Identify the nodes in the path to the destination (e.g., CDNs, load balancers, firewalls).
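
One simple way to correlate Wait with network latency is to subtract the measured round-trip time from the Wait value. The request has to reach the server and the first response byte has to travel back, so roughly one round trip of the Wait time is network delay and the remainder is time spent on the server side. The sketch below encodes that approximation; the function name and the idea of a single representative round-trip figure are simplifying assumptions.

```python
def estimate_server_time_ms(wait_ms: float, rtt_ms: float) -> float:
    """Approximate server-side time as Wait minus one network round trip.

    This is only an estimate: it ignores queuing in intermediate devices
    such as load balancers and proxies, and never returns less than zero.
    """
    return max(wait_ms - rtt_ms, 0.0)
```

With a Wait of 250 ms and a round-trip latency of 40 ms, the estimate is 210 ms, which suggests the delay is mostly on the server side rather than on the network.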

Possible Causes:

Server processing time
Misconfigured or suboptimal server settings
Resource starvation (high CPU, memory, or disk I/O usage)
Network latency (physical distance between client and server)
Bandwidth throttling or limitations
CDN issues
Ineffective server-side caching
Client-side issues (slow DNS resolution, misconfigured network settings, outdated hardware)

Receive

Metric meaning: The time from the first byte of the server response to the last byte of the data payload.

Questions to Ask:

How long did it take to download the response once it started arriving?
What is the time to last byte or content download time?

Steps to Take:

Analyze payload sizes using browser development tools or network analysis tools.
Collect a HAR file replicating the problem.
Monitor server metrics for CPU, memory, and I/O to identify any resource bottlenecks.
Profile application performance with APM tools to find slow-running code, especially the code that generates the response. This is outside the scope of ThousandEyes; an APM product such as AppDynamics is better suited.
Optimize content delivery by implementing or improving the use of a CDN and ensuring effective caching.
Review network performance and correlate with waterfall metrics using ThousandEyes path visualization.
Test across different networks to identify if the problem resides on the internet, a specific ISP AS, or the hosting network.
Enable compression (gzip or Brotli) on the server.
Investigate third-party services if the application relies on external APIs. ThousandEyes API tests can replicate the application's flow and its interactions with those APIs, which helps identify the root cause.
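
Once a HAR file has been collected, its per-request timings can be summarized programmatically. The sketch below assumes the HAR 1.2 layout, where each entry carries a timings object with blocked, dns, connect, send, wait, and receive values in milliseconds, -1 marks a phase that does not apply, and ssl time is already included in connect. The function names are illustrative.

```python
PHASES = ("blocked", "dns", "connect", "send", "wait", "receive")


def phase_durations(entry):
    """Return the phase timings of one HAR entry, treating -1 as zero."""
    timings = entry.get("timings", {})
    return {phase: max(timings.get(phase, -1), 0) for phase in PHASES}


def summarize_har(har, top=5):
    """List the slowest requests as (url, total_ms, slowest_phase, phase_ms)."""
    rows = []
    for entry in har["log"]["entries"]:
        durations = phase_durations(entry)
        slowest = max(durations, key=durations.get)
        # ssl is left out of the total because HAR 1.2 counts it in connect.
        rows.append((entry["request"]["url"], sum(durations.values()),
                     slowest, durations[slowest]))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

A HAR file is JSON, so it can be loaded with json.load and passed straight to summarize_har. Requests whose slowest phase is receive point toward payload size or bandwidth, while those dominated by wait point toward server-side processing.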

Possible Causes:

Large payloads (not compressed or optimized)
Server performance problems (slow content generation)
Limited network bandwidth
Network congestion
CDN performance issues
Server resource limitations

Blocked

Metric meaning: The time that a browser waits for an already established connection to become available. Web browsers are designed to allow a maximum number of concurrent connections per domain. Blocking time means that the browser is waiting for other requests to complete and represents the time that is spent before a request is sent because other requests are being handled.
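
The effect of a per-domain connection limit can be illustrated with a small simulation. The sketch below assumes that all requests are queued at time zero and that the browser allows six concurrent connections per domain, a common default for HTTP/1.1 used here purely as an assumption; HTTP/2 multiplexing behaves differently. The function name is hypothetical.

```python
import heapq


def simulate_blocked_time(durations_ms, max_connections=6):
    """Return the blocked time of each request when all are queued at t=0.

    Requests are served in order; each one takes the connection that
    becomes free earliest and waits until that moment before being sent.
    """
    free_at = [0.0] * max_connections  # time at which each connection is free
    heapq.heapify(free_at)
    blocked = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available connection
        blocked.append(start)           # time spent waiting since t=0
        heapq.heappush(free_at, start + duration)
    return blocked
```

With four 100 ms requests and a limit of two connections, the first two requests are sent immediately and the last two are each blocked for 100 ms. This is why reducing the number of initial concurrent requests shortens blocked time.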

Questions to Ask:

Are there any requests in a blocked state?
How are requests being queued?

Steps to Take:

Use browser developer tools to see how requests are being queued and to identify any patterns in the blocked time (e.g., a specific file or domain).
Use ThousandEyes path visualization and layered views to correlate web metrics to the network.
Review rate limiting configurations to ensure they are appropriate for your traffic levels.
Optimize page load by reducing the number of initial concurrent requests (combine files, use sprites, defer non-critical requests, implement lazy loading).

Possible Causes:

Server overload
Too many concurrent requests
Rate limiting
DDoS protection
Browser throttling
Limited browser resources

Aggregate Metrics

DOM Load Time: Transaction time from the beginning of the first object load to the end of the final object load.
Page Load Time: The time from the initial request to when the page is fully rendered. Redirect time is taken into account when determining total page load time.

Sample Questions

Your decision should be based exclusively on the exhibits presented.

3.3 Question 1

Review the exhibits. Based on the evidence, which action is most likely to solve the issue?

A) Modify the firewall rules to allow connections to the target domain
B) Modify the authentication credentials
C) Change the HTTP request method to PATCH
D) Modify the target URL to an available API endpoint

3.3 Question 2

Review the exhibits. Based on the evidence, what seems to be the underlying issue?

A) There is a network connectivity problem preventing us from reaching the target URL
B) One of the DOM elements cannot be found on the server
C) The request timed out waiting for the server to respond
D) There is a misconfiguration in the application server