Re: Millions of Threads ?

From:

Thomas Hawtin <usenet@tackline.plus.com>

Newsgroups:

comp.lang.java.programmer

Date:

Sat, 26 Aug 2006 14:17:14 +0100

Message-ID:

<44f049c8$0$3211$ed2619ec@ptn-nntp-reader01.plus.net>

frankgerlach@gmail.com wrote:

I am thinking about a telecom application, which would potentially
handle millions of mobile
phones (J2ME) as clients. Of course, I need a server (J2SE), too.
The "easy" implementation uses TCP connections for the client/server
communication. Problem is that there are only 65000 sockets per IP
address of the server. I think I could solve that by configuring
multiple IP addresses per network card.

That may be an OS problem, but it shouldn't be a problem for TCP or UDP.
Each connection is identified by four numbers: client IP address, client
port, server IP address and server port. Even with the last two
constant, you should get at least 32768 connections per client. Should be
enough.
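
To make the four-number point concrete, here is a minimal Java sketch. The class name ConnectionKey, the server address 192.0.2.1 and port 5000 are all made up for illustration; only the idea (a connection is keyed by all four values, so one server address and port can serve many clients) comes from the paragraph above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical key: the four numbers that identify one TCP connection. */
public final class ConnectionKey {
    public final String clientIp;
    public final int clientPort;
    public final String serverIp;
    public final int serverPort;

    public ConnectionKey(String clientIp, int clientPort, String serverIp, int serverPort) {
        this.clientIp = clientIp;
        this.clientPort = clientPort;
        this.serverIp = serverIp;
        this.serverPort = serverPort;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ConnectionKey)) return false;
        ConnectionKey k = (ConnectionKey) o;
        return clientPort == k.clientPort && serverPort == k.serverPort
                && clientIp.equals(k.clientIp) && serverIp.equals(k.serverIp);
    }

    @Override
    public int hashCode() {
        return Objects.hash(clientIp, clientPort, serverIp, serverPort);
    }

    /**
     * Counts distinct connections to ONE server address and port when each
     * client IP opens portsPerClient connections from different local ports.
     */
    public static int countDistinct(String[] clientIps, int portsPerClient) {
        Set<ConnectionKey> seen = new HashSet<>();
        for (String ip : clientIps) {
            for (int p = 0; p < portsPerClient; p++) {
                // Server side stays constant; only the client half varies.
                seen.add(new ConnectionKey(ip, 1024 + p, "192.0.2.1", 5000));
            }
        }
        return seen.size();
    }
}
```

The count grows with the number of client addresses, not with the number of server ports, which is why the 65000 figure is not a per-server limit.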

Having said that, if all the connections go through the same opaque
proxy (or you try to do a 1:1 mapping to a back-end app server or
database), then you could cluster onto a single client IP address. I
really don't know how mobile gateways operate.

Still, two problems remain: Memory used by each TCP connection and by
the enormous number of threads (each client would have a server thread
for the "easy" implementation)

If you want a million simultaneous connections, then ouch.

It's an area that has moved on a great deal over the last few years. So
a lot of what you read will be out of date.

The killer for threads is the amount of virtual address space used.
Therefore stick to 64-bit operating systems (shouldn't be a problem
these days). To handle large numbers of threads, operating systems have
moved to scalable algorithms. On Linux use a 2.6 rather than 2.4 kernel. I would
check what Solaris 10 x64 can do, but my Ultra 20 is on the blink.
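
On the address-space point: Java lets you request a stack size per thread through the four-argument Thread constructor (and a default for all threads with -Xss). A rough sketch follows; the class name SmallStackThreads and the sizes are my own, and the stack size is only a hint that the JVM may round up or ignore, so measure on your target platform.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackThreads {
    /** Starts count threads with a requested stack size; returns how many ran. */
    public static int runAll(int count, long stackBytes) {
        AtomicInteger done = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            // Last argument is a suggestion only; the JVM may adjust or ignore it.
            threads[i] = new Thread(null, () -> done.incrementAndGet(),
                    "client-" + i, stackBytes);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return done.get();
    }
}
```

For example, SmallStackThreads.runAll(1000, 256 * 1024) asks for 256 KB stacks. Each thread still reserves its stack in virtual address space, which is where 64-bit helps.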

I guess large numbers of TCP connections will consume a lot of memory.
Perhaps tuning the window size will help (a large window helps
throughput with long latencies, but costs buffer memory per connection).
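
In Java the knob for this is the socket buffer size, which bounds the receive window the OS can advertise. A small sketch, assuming a plain ServerSocket; the class and method names are mine, and the OS is free to adjust the value you ask for, so the method returns what was actually granted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class SmallBuffers {
    /** Requests a receive buffer size and returns what the OS reports back. */
    public static int effectiveReceiveBuffer(int requestedBytes) {
        try (ServerSocket server = new ServerSocket()) {
            // Set before bind: this becomes the proposed default for
            // sockets later returned by accept().
            server.setReceiveBufferSize(requestedBytes);
            // Port 0 = any free port; loopback only, since this is a demo.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return server.getReceiveBufferSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether a smaller buffer is a net win depends on the traffic: it saves memory per connection but limits throughput on high-latency links.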

Because of all those issues I am considering the use of datagram
sockets and state machines (one per client) instead of one thread per
client. On the other hand, what is the difference between a state
machine called "Thread" and a "hand-crafted" state machine ? Both
consume memory, and maybe I could configure the JVM to allocate very
little memory per Thread.....

UDP does give you advantages in terms of not having to hold connections
and the ability to work around TCP latency issues. But with UDP you will
have to reinvent a lot of TCP (retransmission, ordering, congestion
control). Perhaps TCP and NIO would be better.
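
For what "TCP and NIO" looks like in practice, here is a minimal sketch of one thread serving connections through a java.nio Selector. It is an echo server on a loopback port plus a throwaway client so the thing can be run end to end; the class name SelectorEcho and the roundTrip helper are hypothetical, and a real server would keep per-client state (the "hand-crafted state machine") as the key attachment and handle partial writes with OP_WRITE rather than looping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorEcho {
    /** One thread, one Selector, any number of connections. */
    public static void serve(Selector selector) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (selector.isOpen()) {
            selector.select(); // blocks until some channel is ready
            if (!selector.isOpen()) break;
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = ((ServerSocketChannel) key.channel()).accept();
                    if (ch != null) {
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    buf.clear();
                    if (ch.read(buf) < 0) { ch.close(); continue; } // peer closed
                    buf.flip();
                    // Sketch only: spins if the send buffer is full.
                    while (buf.hasRemaining()) ch.write(buf);
                }
            }
        }
    }

    /** Starts the echo loop on a free loopback port, sends message, returns the echo. */
    public static String roundTrip(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

            Thread loop = new Thread(() -> {
                try {
                    serve(selector);
                } catch (IOException | RuntimeException e) {
                    // Selector closed underneath us: treat as shutdown.
                }
            });
            loop.setDaemon(true);
            loop.start();

            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port)) {
                client.getOutputStream().write(out);
                client.getOutputStream().flush();
                byte[] in = new byte[out.length];
                int off = 0;
                while (off < in.length) {
                    int n = client.getInputStream().read(in, off, in.length - off);
                    if (n < 0) break;
                    off += n;
                }
                return new String(in, 0, off, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The memory cost per client here is whatever you attach to the key plus the kernel's socket buffers, rather than a thread stack.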

Probably you are best off using someone else's infrastructure. Here's
where I don't know much about what is available. IIRC, Grizzly and Ember
are two abstractions of NIO. Sun's Project Darkstar is a server that
handles problems such as fail-over. If you've got millions of customers,
you are probably going to want to make it more reliable than can
reasonably be achieved with one machine.

Tom Hawtin
--
Unemployed English Java programmer
http://jroller.com/page/tackline/