Re: Thread Pool versus Dedicated Threads

From: "Chris M. Thomasson" <no@spam.invalid>
Newsgroups: comp.lang.c++
Date: Sat, 16 Aug 2008 20:10:17 -0700
Message-ID: <lAMpk.5327$dB6.2303@newsfe01.iad>

"gpderetta" <gpderetta@gmail.com> wrote in message
news:5bd344ed-1934-46fa-911e-395d8313bd21@59g2000hsb.googlegroups.com...

On Aug 16, 7:47 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

"Ian Collins" <ian-n...@hotmail.com> wrote in message
news:6gn25jFgme99U12@mid.individual.net...

Chris M. Thomasson wrote:

"Chris Becke" <chris.be...@gmail.com> wrote:

Microsoft Windows needs to allocate stack space for each thread
created. On the 32-bit version of the OS, this means an immediate
scalability problem: with only 2 GB of address space per process,
this implies a hard limit of 2048 connections (threads) per server.
Even on a 64-bit OS, the working set added to the process for each
thread means that physical hardware limits will be reached that much
faster than on a system that uses asynchronous IO to keep lots of
connections on one thread.
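
The 2048 figure above appears to come from dividing the address space by
the stack reserved per thread. A minimal sketch of that arithmetic,
assuming a 1 MB stack reservation per thread (the quoted text does not
state the stack size; max_threads_by_stack is a hypothetical helper, not
a Windows API):

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t KiB = 1024;
const std::uint64_t MiB = 1024 * KiB;
const std::uint64_t GiB = 1024 * MiB;

// Upper bound on the thread count when every thread reserves a fixed
// amount of address space for its stack. Code, heap and DLLs compete for
// the same address space, so the real limit is lower than this.
inline std::uint64_t max_threads_by_stack(std::uint64_t address_space_bytes,
                                          std::uint64_t stack_reserve_bytes) {
  assert(stack_reserve_bytes != 0);
  return address_space_bytes / stack_reserve_bytes;
}
```

With 2 GB of address space and the assumed 1 MB reservation this gives
2048; a smaller reservation per thread raises the bound, but only the
bound.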

I have personally created IOCP servers on Windows which can handle
__well__ over 40,000 connections; want some tips?

But I'd bet several gallons for my favourite beer that you didn't
create 40,000 threads!

I only created around 2 * N threads for the IOCP thread pool, where N
is the number of processors in the system. I did create a couple more
threads whose only job was to perform some resource maintenance tasks...
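
A rough sketch of that sizing rule in portable C++. The helper names are
made up for illustration, and std::thread::hardware_concurrency() is
just one way to obtain N (it is allowed to return 0 when the count is
unknown, hence the fallback):

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: number of workers for the completion-port pool,
// following the 2 * N rule of thumb from the post.
inline unsigned iocp_pool_size(unsigned processors) {
  if (processors == 0) processors = 1;  // processor count unknown
  return 2 * processors;
}

// Same rule, with N taken from the standard library's hint.
inline unsigned default_iocp_pool_size() {
  return iocp_pool_size(std::thread::hardware_concurrency());
}
```

The maintenance threads mentioned above would be created in addition to
this count; they are not part of the pool.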

The one thread per connection model simply isn't scalable beyond a
handful of threads per core.

Right. Well, I guess you could use one user-thread (e.g. fiber)
per-connection and implement your own scheduler. The question is why in
the world would you do that on Windows when there is the wonderful and
scalable IOCP mechanism to work with...

You can of course use user-threads on top of IOCP and get the best of
both worlds.

Sure. I guess you would use an IOCP thread as the actual scheduler for the
fibers within it. When an IO completion is encountered, you extract the
fiber context from the completion key and simply switch to that fiber. When
the fiber does its thing, it switches back to the IOCP thread. Something
like:

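// pseudo-code (GQCS is shorthand for GetQueuedCompletionStatus)

struct per_io {
  OVERLAPPED ol;  // kept first so the LPOVERLAPPED maps back to per_io*
  char buf[1024];
  DWORD bytes;
  int action;
  BOOL status;
};

struct per_socket {
  SOCKET sck;
  void* fiber_socket_context;  // fiber running per_socket_entry
  void* fiber_iocp_context;    // IOCP thread's fiber to switch back to
  struct per_io* active_io;    // completion currently being handled
};

DWORD WINAPI iocp_entry(LPVOID state) {
  // a thread has to become a fiber before it may call SwitchToFiber
  void* const iocp_fiber = ConvertThreadToFiber(NULL);
  for (;;) {
    struct per_io* pio = NULL;
    struct per_socket* psck = NULL;
    DWORD bytes = 0;
    // argument order: port, byte count, completion key, overlapped, timeout
    BOOL status = GQCS(...,
                       &bytes,
                       (PULONG_PTR)&psck,
                       (LPOVERLAPPED*)&pio,
                       INFINITE);
    if (! pio) break;  // nothing was dequeued (e.g. the port was closed)
    pio->status = status;
    pio->bytes = bytes;
    psck->active_io = pio;
    psck->fiber_iocp_context = iocp_fiber;
    SwitchToFiber(psck->fiber_socket_context);
  }
  return 0;
}

VOID WINAPI per_socket_entry(LPVOID state) {
  struct per_socket* const _this = (struct per_socket*)state;
  for (;;) {
    struct per_io* const pio = _this->active_io;
    switch (pio->action) {
      case ACTION_RECV:
        [...];
        break;
      case ACTION_SEND:
        [...];
        break;

      [whatever...];
    }
    // done with this completion; hand control back to the IOCP thread
    SwitchToFiber(_this->fiber_iocp_context);
  }
}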
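
The routing idea in the pseudo-code can be tried out without Windows. The
following is only a simulation under stated assumptions: a std::deque
stands in for the completion port, an integer index stands in for the
completion key, and an ordinary member function call stands in for the
fiber switch (returning from it plays the role of switching back to the
IOCP thread). Completion, Connection and dispatch are names made up for
this sketch, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

enum Action { ACTION_RECV, ACTION_SEND };

// Stand-in for a dequeued completion packet.
struct Completion {
  int key;        // plays the role of the completion key
  Action action;  // which operation finished
  bool status;    // whether it succeeded
};

// Stand-in for per_socket; the log records what the handler saw.
struct Connection {
  std::vector<std::string> log;

  // Handle one completion, then return to the dispatcher.
  void resume(const Completion& c) {
    if (!c.status) {
      log.push_back("error");
      return;
    }
    switch (c.action) {
      case ACTION_RECV: log.push_back("recv"); break;
      case ACTION_SEND: log.push_back("send"); break;
    }
  }
};

// Dispatcher loop: drain the queue, routing each completion by its key.
inline void dispatch(std::deque<Completion>& queue,
                     std::vector<Connection>& conns) {
  while (!queue.empty()) {
    const Completion c = queue.front();
    queue.pop_front();
    assert(c.key >= 0 && static_cast<std::size_t>(c.key) < conns.size());
    conns[static_cast<std::size_t>(c.key)].resume(c);
  }
}
```

Each connection only ever sees its own completions, in queue order, and
no connection needs a thread of its own; with real fibers the handler
could additionally keep its local state on its own stack between
completions, which this simulation does not show.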

BTW, a good reference on the topic of (web) server scalability:

http://www.kegel.com/c10k.html

(I guess many here know this page).

Indeed.