Security Laboratory

Auditing for Availability With a Web Based Service

By Stephen Northcutt

Availability is the most important IT business requirement.

Many business leaders feel that technical security people do not "get it" when it comes to the needs of the business. To some extent this is fair criticism; as a community, this is something we need to work on and it is one of the primary goals of the Leadership Laboratory. However, availability, the most important business requirement for IT, is something every information security student is taught as lesson one and is part of the security triad along with confidentiality and integrity. In this article we will open with a famous example of availability failure, the 1999 Victoria's Secret webcast, consider the business ramifications, look at resources for auditing for availability and end with a brief discussion of autonomic computing, which may well be the future of IT availability.

Death by Success

"In January 1999, Victoria's Secret aired its first-ever Super Bowl commercial, announcing a live Webcast of its Spring Fashion Show. The ad generated millions of hits on [the site] within minutes, and the live Webcast drew a record-breaking 1.5 million visitors worldwide. Success on this scale posed significant technical challenges, and many potential customers were unable to participate due to the site's inability to support the traffic bursts."[1] I remember this event like it was yesterday; similar things on a smaller scale had happened to SANS. On one of our most popular vendor webcasts, also in 1999, we ran out of file handles, frustrating several thousand users. In Victoria's Secret's case it was worse. More than 1 million people attempted to log on to the show, swamping the site, slowing response time, and leaving many frustrated visitors viewing only error messages. "We had no idea that many people would leave the Super Bowl last year and go to the Internet," says Timothy Plzak, director of advanced technology for Intimate Brands Inc., the parent company of Victoria's Secret in Columbus, Ohio.[2] They did not give up, but rather enlisted the best in the industry, including Akamai, to develop the capacity to make this work on an unprecedented scale. The famous story serves to remind leaders to press their people to audit for availability.

GIAC GSNA Practical by John Soltys

There are actually very few references on auditing for availability; a search of Google Books did not return any solid hits.[3] One reference on the subject is a GIAC computer security certification research paper by John Soltys.[4] The paper uses the example of a newspaper classified ad server. "During times of high load each of these impacts can result in a loss of the reader's trust. In a time when the newspaper industry is constantly under siege by a dizzying array of new voices, maintaining a close relationship with the reader is crucial to the survival of the company. If readers can't get their news from [the paper's site], they're only a few keystrokes away from another news site. How many times will it take before a reader starts at that other news site before even trying to visit [the paper's site], and that business is essentially lost? In order to maintain the newspaper's readers and truly serve the community each of these impacts must be fully understood and the vulnerabilities mitigated as appropriate."[4] He considers three primary cases:
  • Reduction of capacity of core content
  • Denial of Service due to non-throttled programs
  • Impact on other company applications

This really is a good point: Internet users are fickle and will go elsewhere. The acceptable time for a web page to load is down to four seconds or less. "Eight seconds was kind of the threshold, but that's a ridiculous notion now," says Forrester Research senior analyst Joe Butt. "The real expectations are around four seconds, no matter what kind of connection you're on."[2] Because this was early in the history of webcasts the public was pretty forgiving, but the stakes were high. "The company's Class A stock spiked 10 percent the day of the Webcast. Full-page print ads featuring models in underwear slunk into daily and weekly publications, and a curtain-raiser Web page featuring Tyra Banks reaped 400,000 registrations for the event. Victoria's Secret itself did its usual PR blitz--one model even rang the bell for the opening of the New York Stock Exchange the day of the show. The ultimate goal was to drive traffic to the site, an e-commerce venture that launched late last year."[5]

Optimize your most popular content

One of the truths of the publishing business is that a few titles are very successful and they help float the remainder of the titles. The Pareto principle, or 80/20 rule, applies to the content on your web site as well: 80 percent of your shoppers will probably focus on 20 percent of your content. There are technical tricks, including static pages, proxying and caching, and web and proxy server settings, that can increase your capacity. You may be able to achieve a huge performance win if you can serve your highest-demand pages as static pages; consider this section from the Soltys paper. "The site is built almost exclusively from static files. In other words, pages are assembled before they are made available through the newspaper's web server. This is in stark contrast to the concept of a dynamically generated website that must run a program for each page it serves. Static pages are inherently faster and less resource-intensive than dynamic pages. The proxy servers bring a measure of dynamism by combining static files before serving the results to the end user. While a purely static website promises extremely high capacity there are some business functions that cannot be accomplished with unchanging files. To fulfill these requirements a small number of programs run on the newspaper's web servers."[4] In addition to static files, a proxy can help increase performance; in cases like the Victoria's Secret webcast this becomes mandatory, and the first reference [1] is an Akamai case study. Victoria's Secret's second webcast was Akamai-enabled (among other things) and was successful. However, you don't just want to display content, you want to sell product, and that requires PHP or ASP.NET and interactivity with the user. Now the overall web experience is composed of items that load at different speeds: very fast static items and much slower dynamic items. This is also discussed in the Soltys paper.
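The pre-rendering idea in the static-files quote above can be sketched in a few lines of Python: pages are assembled once, written to disk, and thereafter served as flat files, so each request costs a file read rather than a template run. This is only an illustration; the `render_article` function and the file layout are invented for the example.

```python
import os
import tempfile

def render_article(article):
    """Hypothetical template step: turn structured data into HTML once."""
    return "<html><body><h1>{title}</h1><p>{body}</p></body></html>".format(**article)

def prerender(articles, out_dir):
    """Assemble pages ahead of time; the web server then serves flat files.

    Each request now costs only a file read instead of a template run,
    which is the capacity win behind a mostly-static news site.
    """
    for slug, article in articles.items():
        path = os.path.join(out_dir, slug + ".html")
        with open(path, "w") as fh:
            fh.write(render_article(article))
        yield path

articles = {"webcast": {"title": "Fashion Show Webcast", "body": "Live at 9pm."}}
out_dir = tempfile.mkdtemp()
paths = list(prerender(articles, out_dir))
```

In practice the pre-render step runs on a publishing schedule or on content change, not per request; the web server never touches the template code at all.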
"This feature allowed an unprecedented ability to collaborate between the two sites and soon its use was widespread. Unfortunately, such collaboration introduced a vulnerability previously unknown. If the page that contained the proxy:include statement had a capacity of 1,000 pages/second, but the included file had a capacity of only 100 pages/second the effective capacity of the overall page was reduced to 100 pages/second. All the work that had gone into increasing the capacity of the core news content had suddenly been tied to the capacity of a site never meant to handle high traffic volumes."[4] To mitigate the problem, the paper goes into the technical options related to web page and proxy settings.
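The capacity arithmetic in that example is worth making explicit: a page assembled from several includes can serve no faster than its slowest component, so its effective capacity is the minimum across the parts. A small sketch, with illustrative names and numbers:

```python
def effective_capacity(component_capacities):
    """A page built from several includes is limited by its slowest part.

    component_capacities: mapping of component name -> pages/second.
    Returns the bottleneck component and the resulting page capacity.
    """
    bottleneck = min(component_capacities, key=component_capacities.get)
    return bottleneck, component_capacities[bottleneck]

# The scenario from the Soltys paper: a 1,000 pages/second core page
# that proxy-includes a 100 pages/second fragment serves at only
# 100 pages/second overall.
page = {"core_news": 1000, "included_fragment": 100}
bottleneck, capacity = effective_capacity(page)
```

The audit takeaway is that capacity must be measured per component, not just for the page as delivered, or a low-traffic include can silently cap a high-traffic page.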

Denial of Service

Intentional denial of service is a cause of loss of availability that both management and technical security people should understand. It is the hardest case to manage. If you are trying to convince management this is something they should take seriously, one of the most compelling stories was run by CSO magazine. Here is how the article begins: "The e-mail began, "Your site is under attack," and it gave Mickey Richardson two choices: "You can send us $40K by Western Union [and] your site will be protected not just this weekend but for the next 12 months," or, "If you choose not to, [you] will be under attack each weekend for the next 20 weeks, or until you close your doors."[6] Another great story was written by Steve Gibson and is available on his website; here is a teaser: "Within a minute of the start of the first attack it was clear that we were experiencing a "packet flooding" attack of some sort. A quick query of our Cisco router showed that both of our two T1 trunk interfaces to the Internet were receiving some sort of traffic at their maximum 1.54 megabit rate, while our outbound traffic had fallen to nearly zero, presumably because valid inbound traffic was no longer able to reach our server. We found ourselves in the situation that coined the term: Our site's users were being denied our services."[7]

Tools to consider:

At SANS we run Nagios, an enterprise-class monitoring solution for hosts, services, and networks released under an Open Source license, and we are very happy with it.
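Nagios discovers outages through small plugins that print a single status line and report state via exit code (0 = OK, 1 = WARNING, 2 = CRITICAL). A bare-bones HTTP response-time check in that convention might look like the following; the URL and the two- and four-second thresholds are placeholders for this sketch, not taken from any shipped plugin:

```python
import time
import urllib.request

def check_http(url, warn_secs=2.0, crit_secs=4.0, timeout=10.0):
    """Nagios-style check: return a (status line, exit code) pair.

    Exit codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)  # force at least the first byte of the body
    except Exception as exc:
        return ("CRITICAL - %s unreachable: %s" % (url, exc), 2)
    elapsed = time.monotonic() - start
    if elapsed >= crit_secs:
        return ("CRITICAL - %.2fs response time" % elapsed, 2)
    if elapsed >= warn_secs:
        return ("WARNING - %.2fs response time" % elapsed, 1)
    return ("OK - %.2fs response time" % elapsed, 0)

# An unreachable port should come back CRITICAL rather than hang.
status_line, exit_code = check_http("http://127.0.0.1:1/", timeout=1.0)
```

In a real deployment the function would sit behind a small main() that prints the status line and calls sys.exit with the code, which is how Nagios reads the result.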

I don't have personal experience with the tools below; if you do, give me a shout. If you have other tools, drop me a note!

AlertSite is a leading provider of web performance measurement, systems monitoring and security vulnerability scanning products that ensure a customer's critical web-based services are always available and running at peak performance.

Do you become aware of problems with your site only from the reaction of indignant clients or an extreme drop in sales? With HostTracker you are the first to know about any fault that may occur!

We monitor the response time of your business, from the major applications, to your websites, email, database, and FTP.

Could the future for availability be autonomic computing?

Autonomic Computing is an initiative started by IBM in 2001. Its ultimate aim is to create self-managing computer systems that overcome their rapidly growing complexity and enable further growth. "In 'The Vision of Autonomic Computing', Kephart and Chess warn that the dream of interconnectivity of computing systems and devices could become the "nightmare of pervasive computing" in which architects are unable to anticipate, design and maintain the complexity of interactions. They state the essence of autonomic computing is system self-management, freeing administrators of low-level task management whilst delivering an optimized system."[9] Pervasive computing, also called ubiquitous computing, means computing woven so deeply into daily life that we are totally dependent on it. If pervasive systems crashed and malfunctioned as often as today's computers do, the consequences would be disastrous; that risk is the driver behind autonomic computing. According to IBM there are eight primary characteristics of an autonomic computing system[10]:
  • Needs to "know itself"
  • Must configure and reconfigure itself under varying (and in the future, even unpredictable) conditions.
  • Looks for ways to optimize its workings.
  • Must be able to recover from routine and extraordinary events that might cause some of its parts to malfunction.
  • Must be an expert in self-protection. It must detect, identify and protect itself against various types of attacks to maintain overall system security and integrity.
  • Must know its environment and the context surrounding its activity, and act accordingly.
  • While independent in its ability to manage itself, it must function in a heterogeneous world and implement open standards.
  • Will anticipate the optimized resources needed while keeping its complexity hidden. It must marshal I/T resources to shrink the gap between the business or personal goals of the user, and the I/T implementation necessary to achieve those goals.
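Several of those characteristics, notably self-configuration and self-healing, reduce to a control loop: monitor the system, compare it against the desired state, and act to restore that state. A toy sketch of such a loop, with a hypothetical in-memory Service object standing in for a real process or health endpoint:

```python
class Service:
    """Stand-in for a managed component; a real system would probe a
    process, a port, or a health-check URL instead."""
    def __init__(self):
        self.healthy = True
        self.restarts = 0

    def check(self):
        return self.healthy

    def restart(self):
        self.restarts += 1
        self.healthy = True

def autonomic_loop(service, events):
    """Monitor-analyze-act cycle: each event either leaves the service
    alone or marks it failed; the loop restores the desired state."""
    log = []
    for event in events:
        if event == "fault":
            service.healthy = False
        if not service.check():          # monitor + analyze
            service.restart()            # act: self-heal
            log.append("restarted")
        else:
            log.append("ok")
    return log

svc = Service()
history = autonomic_loop(svc, ["ok", "fault", "ok", "fault"])
```

The point of the sketch is the shape, not the mechanics: an autonomic system keeps this loop running continuously and without an administrator in the path, which is what frees operators from low-level task management.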

Both Intel and IBM have purchased Google AdWords for "autonomic computing"; however, Intel calls it essential computing, and availability is not one of its primary threads. IBM, on the other hand, puts availability up front and manages some of the largest web events in the world, including the Tony Awards. In fact, one of its white papers is titled "IBM delivers zero downtime for high-profile events with a virtualized, self-managing autonomic infrastructure."[11] We need your help: if you have personal experience with techniques that work to test for availability, or links to valuable information, please drop me a line.