Re: Pattern suggestion
On 4/15/2012 10:11 AM, FrenKy wrote:
I have a huge file (~10GB) which I'm reading line by line. Each line has
to be analyzed by a number of different analyzers. The problem I have
is that, to get reasonable performance despite sometimes
time-consuming processing (usually because of delays due to external
interfaces), I would need to make it heavily multithreaded.
File should be read only once to reduce IO on disks.
So I need "1 driver to many workers" pattern where workers are
multithreaded.
I have a solution now based on Observable/Observer that I use (and it
works) but I'm not sure if it is the best way.
As I see it, you need 3 things:
* A single reader thread. That is relatively simple; just be sure to
read big chunks of data.
* N threads doing M analyses. There are various ways of doing this:
manually started threads or a thread pool. I think the best choice
between those will depend on the solution for the next bullet.
* A way of moving data from the reader to the M analyzers.
The first two solutions that come to my mind are:
A1) Use a single java.util.concurrent blocking queue, use
a custom thread pool, use command pattern, have
the reader put M commands on the queue containing the
same data and the analysis to perform, the N threads
read the commands from the queue and analyze as instructed.
A2) Use the standard ExecutorService thread pool, use command
pattern, have the reader submit M commands that are also tasks
to the executor containing the same data and the analysis
to perform, the N threads read the commands from the queue
and analyze as instructed.
(A1 and A2 are really the same solution, just with slightly different
implementations)
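A rough sketch of what A2 could look like. The Analyzer interface and
the LineFanOut class are hypothetical names made up for illustration,
and the Semaphore is an extra assumption on my part:
Executors.newFixedThreadPool uses an unbounded work queue, so without
some cap a fast reader could queue up much of the file in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical command interface: one implementation per analysis (M in total).
interface Analyzer {
    void analyze(String line);
}

final class LineFanOut {
    // Reads the input once and submits one task per (line, analyzer) pair.
    static void run(Reader input, List<Analyzer> analyzers, int nThreads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        // Assumption: cap pending work so slow analyzers cannot make the
        // executor's unbounded queue hold most of the file in memory.
        Semaphore inFlight = new Semaphore(nThreads * 100);
        try (BufferedReader reader = new BufferedReader(input, 1 << 20)) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String data = line;
                for (Analyzer analyzer : analyzers) {
                    inFlight.acquire(); // blocks the reader when workers fall behind
                    pool.execute(() -> {
                        try {
                            analyzer.analyze(data);
                        } finally {
                            inFlight.release();
                        }
                    });
                }
            }
        } finally {
            // Stop accepting tasks and wait for the queued ones to finish.
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

    // Tiny demo with two toy analyzers on an in-memory "file".
    public static void main(String[] args) throws Exception {
        AtomicInteger lines = new AtomicInteger();
        AtomicInteger chars = new AtomicInteger();
        List<Analyzer> analyzers = List.of(
                line -> lines.incrementAndGet(),
                line -> chars.addAndGet(line.length()));
        run(new StringReader("a\nbb\nccc\n"), analyzers, 4);
        System.out.println(lines.get() + " lines, " + chars.get() + " chars");
    }
}
```

For the real file you would pass a FileReader (or similar) instead of
the StringReader. A1 would look much the same, except that you manage
the blocking queue and the worker threads yourself.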
B) Use a non-persistent message queue and JMS, use the publish/subscribe
pattern, have the reader publish the data to the queue, have a
multiple of M custom threads, each implementing a single analysis,
subscribing to the queue, reading and analyzing.
A has less overhead than B. A is more efficient than B if some
analyses take longer than others.
But B can be used in a clustered approach.
(I guess you could do A3 with commands on a message queue and
a thread pool on each cluster member as well)
Arne