Revolutionizing Performance Management

Dan Kuebrich

Top Stories by Dan Kuebrich

Our fundamental unit of performance data is the trace, an incredibly rich view into the performance of an individual request moving through your web application. Given all this data and the diversity of the contents of any individual trace, it’s important to have an interface for understanding what exactly was going on when a request was served. How did it get handled? What parts were slow, and what parts were anomalous? Over the past year, the TraceView team has been listening to your thoughts on this topic as well as hatching some of our own. Today we get to share the fruit of our labors: Trace Details, redesigned. RUM, meet trace details. Trace details and RUM are old friends, so it’s no surprise they’re here together now. But there are a few details that might be surprising to you: Using full-page caching (e.g. Varnish, WP Super Cache, …)? Now you can measure ... (more)

Performing Under Pressure | Part 1

Many types of performance problems can result from the load created by concurrent users of web applications, and all too often these scalability bottlenecks go undetected until the application has been deployed in production. Load testing, the generation of simulated user requests, is a great way to catch these types of issues before they get out of hand. I recently presented on load testing with Canonical's Corey Goldberg at the Boston Python Meetup and thought the topic deserved blog discussion as well. In this two-part series, I'll walk through generating lo... (more)

Performing Under Pressure | Part 2

In part 1 of this article, we covered writing web app load tests using multi-mechanize. This post picks up where the other left off and will discuss how to gather interesting and actionable performance data from a load test, using (of course) TraceView as an example. The big problem we had after writing load tests was that timing data gathered by multi-mechanize is inherently external to the application. This means it can tell us the response times of requests when the app is under load but doesn't identify bottlenecks or configuration problems. So we need to be gathering a bi... (more)

The Taming of the Queue

A few weeks back webserver request queueing came under heightened scrutiny as rapgenius blasted Heroku for not using as much autotune as promised in their “intelligent load balancing”. If you somehow missed the write-up (or response), check it out for its great simulations of load balancing strategies on Heroku. What if you’re not running on Heroku? Well, the same wisdom still applies – know your application’s load balancing and concurrency and measure its performance. Let’s explore how request queueing affects applications in the non-PaaS world and what you can do about it. Fu... (more)

Amazon Outage

You don’t have to be a pre-cog to find and deal with infrastructure and application problems; you just need good monitoring.  We had quite a day Monday during the EC2 EBS availability incident.  Thanks to some early alerts - which started coming in about 2.5 hours before AWS started reporting problems - our ops team was able to intervene and make sure that our customers’ data was safe and sound. I’ll start with screenshots of what we saw and experienced, then get into what metrics to watch and alert on in your environment, as well as how to do so in TraceView. 10:30 AM EST: Incr... (more)