Effects of Linux Context Switching on High Performance Web Applications

By Robert Dominy – Engineering Director – ADP Cobalt Display Advertising Platform

If you are writing a high performance web application and evaluating its performance, one thing you should be aware of is the effect of Linux context switching.  Under Linux, the scheduler allocates time slices to runnable processes, and a process that exceeds its slice can be interrupted to give time to other processes.  These context switches can be expensive when you are measuring performance in the tens of milliseconds or less.

A series of timing tests was conducted on a moderately busy production server (handling about 100 HTTP requests/sec).  The server is a virtualized system running CentOS 6.2 with 4 CPUs and 32GB of RAM allocated to the virtual machine.  It runs at load averages ranging from about 0.5 to 1.5.

A simple loop was implemented in PHP where the timed task was basically this:

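$limit = 7000;
$iterations = 0;
while ($iterations < $limit) {
    // The loop body is not shown in this post; at a minimum it must
    // advance the counter (assumed here so the loop terminates).
    $iterations++;
}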

Here is a histogram of the timed results (in milliseconds):
0-10 ms : 0
11-18 ms : 1294
19-24 ms : 373
25-50 ms : 152
51-75 ms : 6
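
For readers who want to produce a similar histogram, here is a minimal sketch. The names bucketTimings and timedTaskMs are hypothetical, the bucket bounds are simply copied from the ranges above, and the loop body is a stand-in because the real workload is not shown in this post. Timings are rounded to whole milliseconds before bucketing, which is also an assumption.

```php
<?php
// Hypothetical helper: counts timings (in ms) into buckets defined by
// inclusive upper bounds. The final slot collects anything above the last bound.
function bucketTimings(array $timingsMs, array $upperBounds = [10, 18, 24, 50, 75]): array
{
    $counts = array_fill(0, count($upperBounds) + 1, 0);
    foreach ($timingsMs as $ms) {
        $rounded = (int) round($ms);   // assumption: whole-millisecond buckets
        $slot = count($upperBounds);   // overflow slot unless a bound fits
        foreach ($upperBounds as $i => $bound) {
            if ($rounded <= $bound) {
                $slot = $i;
                break;
            }
        }
        $counts[$slot]++;
    }
    return $counts;
}

// Stand-in for the timed task; returns elapsed wall time in milliseconds.
function timedTaskMs(int $limit = 7000): float
{
    $start = microtime(true);
    $iterations = 0;
    while ($iterations < $limit) {
        $iterations++;
    }
    return (microtime(true) - $start) * 1000.0;
}

$timings = [];
for ($run = 0; $run < 100; $run++) {
    $timings[] = timedTaskMs();
}
print_r(bucketTimings($timings));
```

With a trivial loop body like this one, most runs will land in the first bucket; the spread only becomes interesting with a real workload on a busy machine.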

The test nominally completes in about 14ms, so why are there cases as high as 65ms?

Measuring Context Switches
The Linux function getrusage returns several metrics about process resource usage.  The items we are interested in are user CPU time (ru_utime), system CPU time (ru_stime) and involuntary context switches (ru_nivcsw).

Here are the stats for a sample that took 65ms:
[elapsed] => 65ms
[userCPUTime] => 8.999ms
[systemCPUTime] => 9.999ms
[switches] => 7

Combining the user and system CPU time, the process consumed about 19ms.  During this time there were 7 context switches where the process was interrupted and other processes were allowed to run.  Those interruptions added another 46ms to the completion of the test.
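
The calculation above can be sketched in PHP as follows. This is not the class from the gist: measureTask and rusageMs are hypothetical names, the output keys are chosen to mirror the sample stats, and the workload is a stand-in. It assumes a Linux host, where PHP's getrusage() exposes the ru_utime, ru_stime and ru_nivcsw fields.

```php
<?php
// Convert the seconds + microseconds pair reported by getrusage() to milliseconds.
function rusageMs(array $usage, string $field): float
{
    return $usage[$field . '.tv_sec'] * 1000.0 + $usage[$field . '.tv_usec'] / 1000.0;
}

// Hypothetical wrapper: runs $task and reports wall time, CPU time and
// involuntary context switches for the current process.
function measureTask(callable $task): array
{
    $before = getrusage();
    $start = microtime(true);
    $task();
    $elapsedMs = (microtime(true) - $start) * 1000.0;
    $after = getrusage();

    $userMs = rusageMs($after, 'ru_utime') - rusageMs($before, 'ru_utime');
    $sysMs = rusageMs($after, 'ru_stime') - rusageMs($before, 'ru_stime');

    return [
        'elapsed' => $elapsedMs,
        'userCPUTime' => $userMs,
        'systemCPUTime' => $sysMs,
        'switches' => $after['ru_nivcsw'] - $before['ru_nivcsw'],
        // Wall time not accounted for by this process's own CPU use.
        'offCPU' => $elapsedMs - ($userMs + $sysMs),
    ];
}

$stats = measureTask(function () {
    // Stand-in workload; the actual loop body is not shown in this post.
    for ($i = 0; $i < 7000; $i++) {
    }
});
print_r($stats);
```

For the 65ms sample, the same arithmetic gives 65 - (8.999 + 9.999) = 46.002ms spent off the CPU. CPU time is accounted more coarsely than wall time, so on very short tasks the offCPU figure can come out slightly negative.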

Another important stat to monitor is voluntary context switches (ru_nvcsw).  Voluntary context switches can occur when your code calls various operating system functions, such as the sleep function or I/O functions.
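
To see voluntary switches in isolation, here is a small sketch comparing a task that sleeps with one that only spins. The helper name voluntarySwitches is hypothetical, and the sketch again assumes Linux, where getrusage() reports ru_nvcsw.

```php
<?php
// Count voluntary context switches (ru_nvcsw) incurred while running $task.
function voluntarySwitches(callable $task): int
{
    $before = getrusage();
    $task();
    $after = getrusage();
    return $after['ru_nvcsw'] - $before['ru_nvcsw'];
}

$sleeping = voluntarySwitches(function () {
    for ($i = 0; $i < 5; $i++) {
        usleep(1000); // blocks, so the process gives up the CPU
    }
});

$spinning = voluntarySwitches(function () {
    for ($i = 0; $i < 100000; $i++) {
        // pure CPU work, no system calls that block
    }
});

echo "sleeping: $sleeping, spinning: $spinning\n";
```

On a typical Linux system the sleeping task should report roughly one voluntary switch per usleep call, while the spinning task should report close to zero. Exact counts depend on the kernel and the environment.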

A basic PHP class that runs as a command line script can be found here: https://gist.github.com/rdominy/6557280

Server Activity
Running on a machine that has less activity will result in significantly fewer context switches.  This often shows up when code is tested in a development environment and then later tested in production under real load.  As an example, compare the results of a production server vs. a development laptop:

Time Range    Development    Production
11 – 18 ms           0.0%         90.6%
19 – 24 ms           0.4%          7.6%
25 – 50 ms          99.6%          1.8%

Although the production server is significantly faster, it shows much more variation in times due to context switching.  The CPU load on the two systems is similar, but the production server is handling hundreds of requests per second, causing NIC interrupts and other CPU activity.

Memory Footprint
The amount of memory a process uses can also affect the cost of context switching.  Here are three tests running with footprints of 1MB, 500MB and 900MB:

Time Range      1MB     500MB    900MB
11 – 18 ms     84.6%     0.0%     0.0%
19 – 24 ms     12.1%    68.5%    70.2%
25 – 50 ms      3.2%    30.5%    29.3%
51 – 75 ms      0.1%     0.9%     0.4%
76 – 100 ms     0.0%     0.2%     0.1%

Simply having a larger process size increased the time of the same test almost twofold.

The elapsed time for these tests is measured with the PHP microtime function.  It is subject to the limitations of floating point accuracy, clock drift caused by NTP corrections and probably several other things I am not considering.  I monitored the NTP logs and saw on average about 1 NTP correction per hour, so NTP corrections were not likely to have influenced the test results.
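
If wall-clock adjustments are a concern, one alternative is a monotonic timer. The sketch below is not what these tests used: it swaps microtime for PHP's hrtime function, which assumes PHP 7.3 or later, and elapsedMsMonotonic is a hypothetical helper name.

```php
<?php
// Time $task in milliseconds using the monotonic high-resolution clock.
// Assumes PHP 7.3 or later, where hrtime() is available.
function elapsedMsMonotonic(callable $task): float
{
    $start = hrtime(true); // nanoseconds counted from an arbitrary origin
    $task();
    return (hrtime(true) - $start) / 1e6;
}

$ms = elapsedMsMonotonic(function () {
    usleep(2000); // stand-in task: sleep for about 2ms
});
printf("%.3f ms\n", $ms);
```

Because the value counts from an arbitrary starting point, it is only meaningful as a difference between two readings, which is all an elapsed-time measurement needs.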

Additional tests of varying complexity, typically running a real application algorithm, yielded similar results.  Ultimately, the slower your application is and the more memory it uses, the more costly context switching becomes.  Super fast sub-millisecond operations in processes with a small footprint will see far fewer interruptions than functions that take tens of milliseconds to complete.

Future posts will look at ways of mitigating these costs.
