Re: Advice/Help with Multithreading
DyslexicAnaboko wrote:
I wrote a method that will take a URL, and return its page in String
form.
How long it takes to download a page's contents depends on which
webpage is being visited. There is a difference between getting the
contents of Google vs. Yahoo, since the page sizes obviously differ.
Since I would have many pages to download, downloading them one at a
time takes forever. I just want to speed things up. I figured that
multithreading would be my answer since I could create several threads
to download pages simultaneously. I am inexperienced with
multithreading though, so I was just hoping that anyone could give me
some pointers or advice on where to begin.
Basically I want to do the following:
1. I want to create X threads, let's just say 10 for argument's sake.
2. I want each thread to get its own assigned URL. Will there be a
problem with more than one thread accessing the same method?
3. After downloading the contents of the page I intend to put the
strings into a list. Will there be a problem with more than one thread
accessing the same object? If so, should I use semaphores?
I'm not asking anyone to write this for me, I just don't know where to
begin. If anyone can spare an example or any advice I am all ears.
Thanks,
Eli
Look at the java.util.concurrent package; it has helpful classes for
almost everything you're asking about.
<http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html>
Specifically, ThreadPoolExecutor and BlockingQueue.
You can submit download requests to the executor, and have them stuff
the results into the blocking queue. You would have one or more
separate threads reading from the blocking queue and processing the
results. If you want all the results to end up in one List, then you
either need to synchronize on that list, or have only one thread
reading from the BlockingQueue and writing to the list.
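Here's a rough sketch of that setup. The download() method is just a
stand-in for the one you already wrote, and the class/variable names
are made up; adjust to taste:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class PageFetcher {

    public static void main(String[] args) throws Exception {
        String[] urls = { "http://www.google.com/", "http://www.yahoo.com/" };

        // Pool of 10 worker threads; each task downloads one URL.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        final BlockingQueue<String> results =
                new LinkedBlockingQueue<String>();

        for (int i = 0; i < urls.length; i++) {
            final String url = urls[i];
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        results.put(download(url));
                    } catch (Exception e) {
                        e.printStackTrace();
                        // Put a placeholder so the consumer below
                        // isn't left waiting forever for this page.
                        try { results.put(""); }
                        catch (InterruptedException ignored) { }
                    }
                }
            });
        }

        // Only this thread takes from the queue and writes to the
        // List, so the List itself never needs synchronization.
        List<String> pages = new ArrayList<String>();
        for (int i = 0; i < urls.length; i++) {
            pages.add(results.take()); // blocks until a page arrives
        }

        pool.shutdown();
        pool.awaitTermination(60, TimeUnit.SECONDS);
        System.out.println("Downloaded " + pages.size() + " pages");
    }

    // Reads a whole page into a String (like the method you have).
    static String download(String url) throws Exception {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()));
        StringBuffer sb = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        in.close();
        return sb.toString();
    }
}

This also answers your question about multiple threads calling the
same method: that's fine as long as the method only touches its
parameters and local variables, as download() does here. The
BlockingQueue handles all the cross-thread handoff for you.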
If you are writing a Spider (or Robot, or whatever)... Be sure to
follow good netiquette and respect robots.txt
<http://www.robotstxt.org/>