Re: xpath, dom and multi threading

From:

Tom Anderson <twic@urchin.earth.li>

Newsgroups:

comp.lang.java.programmer

Date:

Tue, 18 May 2010 22:17:27 +0100

Message-ID:

<alpine.DEB.1.10.1005182204370.27344@urchin.earth.li>

On Tue, 18 May 2010, Daniel Pitts wrote:

> On 5/16/2010 6:36 AM, FrenKy wrote:
>
>> can someone please suggest thread safe DOM implementation with support
>> for Xpath for reading XML files?
>>
>> Or if someone has a good source for hints how to make some dom
>> implementation thread safe...
>>
>> Thanks in advance!
>
> XPath itself isn't multithread safe.

XPath is a language - threadsafety is not a property it can have or lack.
I presume what you mean is that the XPath implementation is not threadsafe
- but since we don't know what the implementation in use here is, that's
an interesting statement. I imagine you mean one or more of (a) the XPath
implementation is not required to be threadsafe, so one shouldn't build
software that requires it to be, (b) you (Daniel) know or strongly suspect
which implementation is in use, and know it isn't thread safe, (c) there
are no XPath implementations which are threadsafe, or (d) it is impossible
for there to be an XPath implementation which is threadsafe. Could you
elaborate?

If you don't have any concrete information about the threadsafety of your
XPath implementation, it might be worth doing some basic stuff to ward off
threading bugs. Make sure that there are memory barriers between the last
write to the DOM tree by any thread and the reads that all the worker
threads are doing. One way to do this would be for the workers to queue up
by calling await() on a CountDownLatch set up with a count of 1, which the
parser thread then releases by calling countDown() on the latch. If you do
that and still get problems, then you know that the XPath implementation
is mutating the heap even when doing read-only operations, at which point
it's probably safe to conclude that this XPath implementation isn't going
to cut it for you.
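
A minimal sketch of that latch handoff, in case it helps. The class and
method names (LatchedXPathDemo, run) are made up for illustration, and the
calling thread plays the parser thread. Each worker creates its own XPath
object, since those are documented as not thread-safe. Switching off
deferred node expansion is an extra precaution of mine, not part of the
suggestion above, and the feature URI is Xerces-specific.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not from any library.
final class LatchedXPathDemo {

    static List<String> run(String xml, String expr, int workers) throws Exception {
        CountDownLatch parsed = new CountDownLatch(1);
        // Plain (non-volatile) slot: the latch supplies the happens-before edge.
        Document[] shared = new Document[1];

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                pending.add(pool.submit(() -> {
                    parsed.await();  // queue up until the tree is finished
                    // One XPath per worker; XPath objects are not thread-safe.
                    XPath xpath = XPathFactory.newInstance().newXPath();
                    return xpath.evaluate(expr, shared[0]);
                }));
            }

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            try {
                // Assumption beyond the post: ask a Xerces-based parser to build
                // the whole tree eagerly, so that reads do not expand nodes lazily.
                factory.setFeature(
                    "http://apache.org/xml/features/dom/defer-node-expansion", false);
            } catch (ParserConfigurationException ignored) {
                // Not a Xerces-based parser; nothing to switch off.
            }

            // Last write to the tree happens here, before the latch opens.
            shared[0] = factory.newDocumentBuilder()
                               .parse(new InputSource(new StringReader(xml)));
            parsed.countDown();  // release every waiting worker

            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            // Interrupts workers still blocked in await() if parsing failed.
            pool.shutdownNow();
        }
    }
}
```

Something like LatchedXPathDemo.run(xml, "/root/item", 8) would then run
the same query from eight workers against one shared tree. If that
misbehaves under load, the implementation is writing during reads.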

> Are you sure you need multithreading for your use-case? If you have
> something that is that performance intensive, perhaps a different
> approach is called for

Presumably, if he's throwing >100 CPUs at it, it's because doing it
singlethreaded would take too long.

But ...

> (StAX/SAX based parsing of the XML file, Building
> a domain object graph instead of a DOM, etc...)

This sounds like a good idea to me. A problem big enough to need >100 CPUs
working on it is big enough to be worth expressing in an efficient form -
I believe DOM implementations are generally deeply inefficient internally.
Lots of linked lists and other pessimicity. Your own model could be more
efficient, and also threadsafe (which after all is not hard to achieve for
read-only data).
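
A rough sketch of what such a model might look like, using StAX to build
it. The catalog/book/title element names, the isbn attribute, and the Book
and CatalogReader classes are all invented for the example; the real
schema would dictate the real classes. The objects have only final fields
and the list is unmodifiable, so workers can read them without locking
once the list has been handed over safely.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical immutable domain object: final fields, no setters.
final class Book {
    final String isbn;
    final String title;

    Book(String isbn, String title) {
        this.isbn = isbn;
        this.title = title;
    }
}

// Hypothetical reader for <catalog><book isbn="..."><title>...</title></book></catalog>.
final class CatalogReader {

    static List<Book> read(String xml) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance()
                                            .createXMLStreamReader(new StringReader(xml));
        List<Book> books = new ArrayList<>();
        String isbn = null;
        try {
            while (in.hasNext()) {
                if (in.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;  // only element starts matter for this schema
                }
                String name = in.getLocalName();
                if (name.equals("book")) {
                    isbn = in.getAttributeValue(null, "isbn");
                } else if (name.equals("title")) {
                    // getElementText() consumes the text up to the end tag.
                    books.add(new Book(isbn, in.getElementText()));
                }
            }
        } finally {
            in.close();
        }
        // Read-only view; nothing mutates the backing list after this point.
        return Collections.unmodifiableList(books);
    }
}
```

No DOM tree is ever built here, so there is nothing for a query engine to
mutate behind your back; lookups become ordinary field reads.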

tom

--
I KNOW WAHT IM TALKING ABOUT SO LISTAN UP AND LISTEN GOOD BECUASE ITS
TIEM TO DROP SOME SCIENTISTS ON YUO!!! -- Jeff K