How to verify, test and improve the performance of a web site ahead of an expected traffic peak
The Swedish Emergency Management Agency, SEMA (Krisberedskapsmyndigheten, KBM, http://www.krisberedskapsmyndigheten.se/), is a government agency tasked with coordinating crisis preparedness in Swedish society.
During the spring of 2008, SEMA carried out a large communications exercise called SAMÖ2008. The purpose of the exercise was to test the preparedness of financial systems against a possible IT-based attack. The exercise drew approx. 5000 participants from the Swedish Government, other central government bodies, the counties, organizations and companies, as well as a large network from across society.
The subject of the exercise was organized IT attacks that would threaten or bring down the financial systems and could lead to a crisis of confidence in society; in such a case, the normal functions of the traditional payment systems are severely disrupted.
The exercise was a simulated crisis scenario played out with the participants and headed by SEMA. Approx. 100 participants were active in the so-called counter scenario, which drove the exercise forward.
"Large communications exercises are an important means of strengthening the capacity to handle crises and of drilling coordination between the different sectors that work with crisis preparedness in society", said Helena Lindberg, leader of the SAMÖ2008 exercise and Director General of SEMA, in a press release.
Technical challenge
The exercise had a central web site, www.samö2008.se, which delivered continuously updated information to all participants during the exercise. The performance of this site was critical, because thousands of participants periodically logged in within the same few minutes to check their exercise status.
The questions facing SEMA included “can the site handle 5000 logged-in users within one and the same minute?” and “how should we best scale up our web application and server environment to avoid bottlenecks?”.
SEMA's technical supervisor, Per Söderström, worked with Apica to design a load test of the production environment to verify the maximum performance of the site. The test simulated a scenario in which between 1,000 and 10,000 users log on to the site within one and the same minute and retrieve information from a number of sub-pages. Test results were collected for:
- Number of active users
- Data from the server CPU and the web server
- Response times for the scenario as a whole and for individual URLs
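To make the scale of such a test concrete, the sketch below turns the scenario above into log-on rates. It is a minimal illustration, not the test plan used in the project: the step size of 1,000 users, the evenly spaced log-on times and the names LoadStep and ramp are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadStep:
    """One step of a load test: `users` log on within `window_s` seconds."""
    users: int
    window_s: float

    @property
    def logins_per_second(self) -> float:
        # Average log-on rate the site has to absorb during the window.
        return self.users / self.window_s

    def start_offsets(self) -> list[float]:
        # Assumption: log-ons are spread evenly over the window.
        interval = self.window_s / self.users
        return [i * interval for i in range(self.users)]


def ramp(start_users: int, end_users: int, step_users: int,
         window_s: float = 60.0) -> list[LoadStep]:
    """Build the sequence of steps from start_users up to end_users."""
    return [LoadStep(users, window_s)
            for users in range(start_users, end_users + 1, step_users)]


if __name__ == "__main__":
    # Hypothetical step size of 1,000 users; the source only gives the range.
    for step in ramp(1000, 10000, 1000):
        print(f"{step.users:>6} users -> {step.logins_per_second:6.1f} log-ons/s")
```

Under these assumptions, 5000 users within one minute corresponds to an average of roughly 83 log-ons per second, before counting the sub-page requests each user makes afterwards.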
Test results and actions
The first test showed that the site, without any tuning, could cope with approx. 4000 concurrently logged-in users. The response times, however, were too long.
This result did not confirm that the site could handle the expected traffic volume. One action that had an impact was reducing the size of the landing page, but this was not sufficient.
The solution was to put a separate front-end cache in front of the SAMÖ web site. This relieved the ordinary web site of all static traffic. For larger web sites, this role is usually filled by a CDN (Content Delivery Network).
But why can't the ordinary cache on a typical web server handle this?
The answer is that it depends on which kind of cache is implemented in the web application. A separate front-end cache typically delivers much better performance than the cache built into an ordinary web server. Just as important, the number of requests per second reaching the web server drops drastically, since the cache serves all static content. The CPU load on the web server is therefore drastically lowered.
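The effect on the request count can be illustrated with a toy model. The sketch below is not how Varnish works internally; it is a deliberately simple in-memory cache with no expiry or invalidation, and the file suffixes, the page layout (one dynamic page plus three static files) and the names FrontEndCache, fake_origin and simulate are all hypothetical.

```python
from typing import Callable, Dict

# Assumption: these suffixes identify static content in this toy model.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


class FrontEndCache:
    """Toy front-end cache: serves static paths from memory after the
    first fetch and passes everything else through to the origin."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self._origin = origin
        self._store: Dict[str, bytes] = {}
        self.origin_hits = 0   # requests that reached the web server
        self.cache_hits = 0    # requests answered by the cache alone

    def get(self, path: str) -> bytes:
        if path.endswith(STATIC_SUFFIXES):
            if path in self._store:
                self.cache_hits += 1
                return self._store[path]
            body = self._fetch(path)
            self._store[path] = body
            return body
        # Dynamic content always goes to the origin.
        return self._fetch(path)

    def _fetch(self, path: str) -> bytes:
        self.origin_hits += 1
        return self._origin(path)


def fake_origin(path: str) -> bytes:
    """Stand-in for the real web server."""
    return f"content of {path}".encode()


def simulate(users: int) -> FrontEndCache:
    """Each user loads one dynamic page and three static files (hypothetical)."""
    cache = FrontEndCache(fake_origin)
    page = ["/status", "/style.css", "/logo.png", "/app.js"]
    for _ in range(users):
        for path in page:
            cache.get(path)
    return cache


if __name__ == "__main__":
    result = simulate(1000)
    print("requests reaching the web server:", result.origin_hits)
    print("requests served from the cache:  ", result.cache_hits)
```

In this model, 1000 users generate 4000 requests, but only 1003 of them reach the web server: the 1000 dynamic page loads plus one initial fetch of each static file. The real ratio depends entirely on how many static objects each page references.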
It is worth mentioning, however, that even when content is flagged as cacheable on a web server, the sheer number of hits per second remains a separate problem once request volumes get high.
It is impossible to say in general how a web cluster or web server will handle high load. The only way to be certain is to load test the production environment and analyze how the separate components of the site react.
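One simple way to see how individual parts of a site react is to summarize the measured response times per URL. The sketch below assumes the load test yields (URL, seconds) pairs; that input format, the nearest-rank percentile method and the function names are assumptions for the example, not a description of any particular tool's output.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group response times by URL and report count, median, p95 and max."""
    by_url: Dict[str, List[float]] = defaultdict(list)
    for url, seconds in samples:
        by_url[url].append(seconds)
    return {
        url: {
            "count": len(times),
            "median": percentile(times, 50),
            "p95": percentile(times, 95),
            "max": max(times),
        }
        for url, times in by_url.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    samples = [("/login", 0.2), ("/login", 0.4), ("/login", 0.3),
               ("/login", 0.5), ("/login", 2.0), ("/logo.png", 0.05)]
    for url, stats in summarize(samples).items():
        print(url, stats)
```

Looking at the 95th percentile and the maximum alongside the median helps separate URLs that are slow for everyone from URLs that only degrade for some users under load.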
A front-end cache based on Varnish gives much better throughput for static content than an equivalent web server. The simplest explanation is that the design and structure of its code are optimized specifically for delivering static content such as images, not for generating complex web pages or providing all the other functionality that comes with a modern web server.
At the source code level, Varnish is optimized to deliver maximum data per instruction. That type of optimization is impossible to achieve on a conventional web server.