TechTips: Designing high-performance server applications

In today's world, it's quite common to have hundreds of users on the
LAN, or maybe thousands of users on the internet, all trying to
zoom in on -your- application service all at the same time.  The tried
and true methods that worked so well in "single user" applications start
to crumble under those titanic loads.

The key to understanding "what's causing it to crumble" is to focus on
those words, "all at the same time."  Very interesting -- and awful --
things start to happen when an application starts to exhibit the
behavior we call "thrashing."  And, applications begin to exhibit this
behavior, NOT "gradually," but "all at once!"  Let me explain.

Let's make up some numbers.  Let's say that whatever your application is
doing requires 1/1000th of a second, one millisecond, to complete under
ideal conditions.  This means that, if you had 1,000 of these operations
to do one after another, it would take one second of clock time to
finish them all.

Now, if you look more closely at that millisecond, you may find that the
computer issues a disk-read request to the hard drive (which takes some
fraction of that millisecond to complete), then does some amount of
processing (a few microseconds) and delivers the answer to you.  

The CPU is actually idle -- with nothing productive to do -- while the
disk drive is servicing its part of the request.  But the disk drive
cannot service more than one request at a time.  The CPU can be diverted
to other more-productive things (and Windows will do that automatically
by dispatching another thread), but it won't do any good if the CPU is
diverted to doing something that will involve another I/O request.

If the CPU spends its part of the millisecond flitting among 100
different threads, and those threads all issue an I/O request to the
disk drive and wait for it to complete ... THEN you have a serious
problem that will cause workload to pile up in a great big hurry, and
will cause all of those workloads to receive very erratic, very poor
service.

The problem is that you now have 100 threads "waiting for I/O," and you
can do the math.  Each one of them will wait for between 1 and 100
milliseconds and they are all waiting on the same physical resource:
the disk drive.  Their actual wait-times vary across an enormous range,
depending on exactly when Windows decided to dispatch the thread.  And
meanwhile, the CPU is spinning its wheels.
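
To "do the math" concretely, here is a minimal sketch in Python.  It
assumes a hypothetical disk that serves requests strictly one at a time,
that every request costs the made-up 1 millisecond from above, and that
all 100 requests arrive at the same instant.  The name completion_times
is invented for this illustration.

```python
# Hypothetical model: one disk, strictly serial, all requests arrive at t=0.
SERVICE_MS = 1.0  # assumed cost of one disk request (the made-up number above)

def completion_times(n_requests, service_ms=SERVICE_MS):
    """Finish time in ms of each request when they are served one after another."""
    return [service_ms * (position + 1) for position in range(n_requests)]

times = completion_times(100)
best, worst = min(times), max(times)
average = sum(times) / len(times)
print(best, worst, average)  # 1.0 100.0 50.5
```

The luckiest thread is done in 1 ms, the unluckiest in 100 ms, and which
one you get depends only on where you landed in line.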

The whole application is said to be "thrashing" now because it's
spending more time waiting for things to happen than doing useful work
-- in what appears to be a very random and unpredictable fashion --
because too many (or "all") of the requests are waiting for the one
thing that cannot run any faster no matter how much you want it to:
the disk.

In the design of what appears to be a high "performance" application,
the most important issue really isn't the "performance" of the system
(the time it takes to complete a request under ideal conditions) but the
"consistency" of the application's "throughput" as things become more
hectic.  If various service-requests =consistently= receive enough
attention from the system to complete in a =predictable= amount of time
(albeit slower than it would perform if the competing load did not
exist, of course), then the system is still "high performance."  But if
the system starts to crumble .. if the time required to finish a given
service-request starts to become very unpredictable .. performance is
perceived as falling apart, and your Web users start to click elsewhere.

To control this problem, your application needs something that every
Briggs & Stratton lawnmower engine you ever saw had:  "a throttle."
:-O  Or perhaps, in steam-engine terminology, a "governor."

A throttling or governor mechanism is a software "feedback loop" that
monitors the system for "hot spots" -- bottlenecks -- and exercises
control upon the system by limiting the workload that, at any one time,
the entire system *attempts* to do.  Instead of throwing 100
simultaneous work-requests at the disk drive, the throttle may force
the system to issue no more than 10 work-requests at a time against that
particular device, holding the other 90 back until one of the active
requests completes.
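
One simple way to build such a gate is a counting semaphore.  The sketch
below is only an illustration, in Python: the Throttle class and
fake_disk_read are hypothetical names, the sleep stands in for a real
disk request, and the limit of 10 is the example figure from above, not
a recommendation.  The 100 threads exist only to simulate 100 requests
arriving at once.

```python
import threading
import time

class Throttle:
    """Let at most `limit` callers into the guarded section at any one time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0   # requests currently inside the gate
        self.peak = 0     # highest value `active` ever reached

    def run(self, work):
        with self._slots:  # blocks while `limit` requests are already active
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return work()
            finally:
                with self._lock:
                    self.active -= 1

def fake_disk_read():
    time.sleep(0.001)  # stand-in for a real I/O request
    return "ok"

disk_throttle = Throttle(limit=10)
requests = [threading.Thread(target=disk_throttle.run, args=(fake_disk_read,))
            for _ in range(100)]
for r in requests:
    r.start()
for r in requests:
    r.join()
print(disk_throttle.peak)  # never exceeds 10
```

Every request still gets served; the gate only decides how many are
allowed to lean on the device at the same moment.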

A more sophisticated throttle may, like a good manager at a McDonald's
restaurant, gather statistics about the performance of other parts of
the system and bring more servers (employees) on-line or take them
off-line to match resources to demand.  None of this activity will cause
any single request to be completed faster (that is usually determined by
fixed physical characteristics of the hardware involved), but it will
cause the behavior of the system to remain =consistent= under varying
load.
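
As a rough illustration of the "gather statistics, then adjust" idea,
the sketch below adjusts a concurrency limit from recently measured
request latencies.  The rule used here (add one slot while latency is
healthy, halve the limit when it is not) is an additive-increase,
multiplicative-decrease scheme chosen for this example; it is one
possible policy, not the only one.  The function name, the target
latency and the bounds are all assumptions.

```python
def adjust_limit(limit, recent_latencies_ms, target_ms, lowest=1, highest=64):
    """Return a new concurrency limit based on recently observed latencies."""
    if not recent_latencies_ms:
        return limit  # nothing measured yet, so leave the limit alone
    average = sum(recent_latencies_ms) / len(recent_latencies_ms)
    if average > target_ms:
        proposed = limit // 2  # back off quickly when the device is saturated
    else:
        proposed = limit + 1   # probe upward slowly while latency is healthy
    return max(lowest, min(highest, proposed))

print(adjust_limit(10, [30, 50], target_ms=20))  # 5
print(adjust_limit(10, [5, 6], target_ms=20))    # 11
```

A monitoring loop would call something like this every few seconds and
hand the result to whatever gate caps the number of active requests.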

An effective load-management or throttling system must also clearly
distinguish between "the employees of the restaurant" and "the orders
that are coming in."  The two are not the same.  Many simple-minded
servers simply spawn a new thread each time a new user-request comes in,
and rely upon Windows to let those various threads "duke it out."  Such
systems invariably crumble under load.

Threads should correspond to =workers= or =employees= .. not the orders.
If too many orders (requests for work) flood into the system, the
throttle should be there to keep the system from falling into erratic
chaos.  "Beyond a certain number of orders per second, -you-, dude, are
simply going to have to wait."  The capacity of the system to perform
work under maximum-load conditions can be calculated, and everything
beyond that is "just simple math" ... =IF= the load is regulated so that
it cannot exceed that agreed-upon maximum, and =IF= you know that all
the other work must wait and will, by the throttle, be forced to wait.
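
Here is one way the "workers, not orders" arrangement might look, as a
Python sketch.  A fixed crew of threads pulls orders from a bounded
line; when the line is full, the caller submitting the next order simply
waits.  NUM_WORKERS, MAX_PENDING, submit and the doubling "work" are all
placeholders invented for this example.

```python
import queue
import threading

NUM_WORKERS = 4   # the "employees": a fixed number, chosen up front
MAX_PENDING = 8   # how many orders may stand in line (assumed figure)

orders = queue.Queue(maxsize=MAX_PENDING)
results = []
results_lock = threading.Lock()

def worker():
    while True:
        order = orders.get()
        if order is None:  # sentinel: tells this worker to go home
            orders.task_done()
            return
        with results_lock:
            results.append(order * 2)  # placeholder for the real work
        orders.task_done()

def submit(order, timeout=None):
    """Wait while the line is full; return False if `timeout` seconds pass."""
    try:
        orders.put(order, timeout=timeout)
        return True
    except queue.Full:
        return False

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for n in range(100):  # 100 orders arrive, but only 4 are ever in progress
    submit(n)
orders.join()         # wait until every order has been handled

for _ in workers:     # one sentinel per worker shuts the crew down
    orders.put(None)
for w in workers:
    w.join()
print(len(results))   # 100
```

The thread count never grows with the number of orders, which is what
makes the maximum load something you can actually calculate.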

The throttling and load-balancing mechanism that is needed in any
high-performance server stands as a gatekeeper to regulate how much work
the server =attempts= to execute =simultaneously= at any point in time.
It should be both adjustable and self-adjusting.

It is the systems that rely upon "an unlimited number of threads, just
duking it out" that tend to fail .. spectacularly, in-production, at the
worst possible time, with nothing-to-be-done-about-it ...

Sundial Services :: Scottsdale, AZ (USA) :: (480) 946-8259  (PGP public key available.)
