Re: The future of Java
On Sun, 22 Nov 2009, Martin Gregorie wrote:
On Sun, 22 Nov 2009 15:10:32 -0800, Roedy Green wrote:
On Sat, 21 Nov 2009 18:59:04 +0000 (UTC), Martin Gregorie
<martin@address-in-sig.invalid> wrote, quoted or indirectly quoted
someone who said :
How do you rate H2 against Derby and HSQL?
For a comparison of PosGreSQL and MySQL see
http://mindprod.com/jgloss/postgresql.html
The information is a few years old.
I understand that Derby is fast, but ram resident only. It is for small
databases only.
I'm interested finding out if Derby, H2 or HSQL could be used as a
lighter weight, portable alternative to PostgreSQL, i.e. minimal changes
to SQL queries and schema, full ACID capability, capable of dealing with
fairly large tables on disk.
One thing that leaped out at me when reading the H2 docs is that it uses
table-level rather than row-level locking. For read-heavy tables, this is
fine, because read locks can be shared, but it means that if you have
multiple threads writing to a single table, even if they're touching
entirely different rows, they'll serialise, which will kill concurrency
and throw its body in a ditch.
I have had serious problems with table-level locking in real systems. We
have an app which occasionally loads huge (well, moderate) amounts of data
into a database, doing so via a process with quite a bit of CPU overhead.
The machines running the app and databases have multiple cores, and
there's fast disk (actually a SAN) at the backend, so we use multiple
threads to get the most out of the hardware. We developed with Oracle as
our RDBMS, and it all went swimmingly. We ran it up on a client's test
farm with MS SQL Server as the database, and it fell flat on its face.
Turns out that unlike Oracle, MSSQL uses table-level locking by default.
The work around was to drop the number of threads used to one, and endure
amazingly long load times. A later solution was to enable row-level
locking in MSSQL, although to be honest, even then it crawled compared to
Oracle. We never (or haven't yet, more optimistically) got to the bottom
of that.
Hmm. A bit of googling suggests that MSSQL uses row-level locking by
default, which doesn't fit this story. It's possible an overenthusiastic
DBA had changed this somehow. Or that we were running into lock escalation
- each transaction inserted many rows, so it's possible each one started
out with row locks, but then decided to escalate to a whole-table lock.
As an aside, part of the moral here is to develop or at least
integration-test on the same infrastructure as you'll be deploying to. My
defence here is that we originally developed the app for client A, who
used Oracle, and only got involved with client B's MSSQL setup, where the
above story happened, later.
Coming back to H2, it does also support an entirely lock-free concurrency
strategy in the shape of multi-version concurrency control (as made famous
by Interbase/Firebird), but this is experimental, and currently
single-threaded, which means it's not a solution in situations like the
above.
A bit of reading reveals that other databases which use MVCC, beside H2
and Firebird, are Oracle (which explains why it worked where MSSQL
didn't!), PostgreSQL, MS SQL Server since 2005 (as an option, though?),
and even MySQL with the right storage engine. I've seen Sybase mentioned
as having MVCC, but googling turned up strong indications that it uses
orthodox locking. So, MVCC is used by everything except that, the
lightweight java databases, and, er, DB2!
A final caveat on that postscript to a caveat: MVCC is not a cure-all,
because it introduces certain kinds of overhead that lock-based systems
don't have. For instance, i have read that SELECT COUNT(*) FROM SOME_TABLE
is O(n) in an MVCC system, because it has to look at version numbers on
the rows, whereas a locking system can do it in O(1) by looking at the
metadata.
tom
--
1 p4WN 3v3Ry+h1n G!!!