Re: Data Storage Issue (Basic Issue)
On Fri, 9 May 2008, Srini wrote:
> Which is more efficient, and why: storing and reading a file in a
> database, or on disk, in a web server environment? Assume all other
> things, like network, connections and memory, are ideal.
> My understanding is that using a database is more efficient, but a few
> of my peers argue that storing on disk is more efficient when
> concurrent users are accessing it...
Databases typically store their data in a file [1]. That means that a
file-based solution can always be at least as fast as using a database,
because it can just do what the database does.
The problem is that to make a file-based solution that's as fast as a good
database and also provides things like transactionality, you may have to
write code that's as complex as a database. Which is not good.
If you're mostly reading your data, so you don't have to worry about
concurrency and transactionality, and you have a straightforward
organisation (like having fixed-size records which you can refer to by
index in a sequence), then you can write a simple file-based
implementation that should be faster than a database, because it avoids
the overhead and complexity.
There's nothing in the java libraries, that i'm aware of, for doing this
kind of non-database structured file access. There are things for some
textual formats, like XML and properties files (remember those?!), but
nothing like DBM or COBOL's record-oriented files. There are third-party
libraries, though - see Berkeley DB, Java Edition:
http://www.oracle.com/technology/products/berkeley-db/je/index.html
and JDBM:
http://jdbm.sourceforge.net/
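For a feel of the Berkeley DB JE route, storing a record looks roughly
like this (a sketch from memory, so check the com.sleepycat.je javadocs
for the exact API; names like "je-data" and "records" are made up):

    import com.sleepycat.je.*;
    import java.io.File;

    public class JeSketch {
        public static void main(String[] args) throws Exception {
            // Open an environment in an existing directory, creating
            // the database files if they aren't there yet.
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            Environment env = new Environment(new File("je-data"), envConfig);

            DatabaseConfig dbConfig = new DatabaseConfig();
            dbConfig.setAllowCreate(true);
            Database db = env.openDatabase(null, "records", dbConfig);

            // Keys and values are just byte arrays wrapped in DatabaseEntry.
            DatabaseEntry key = new DatabaseEntry("record-42".getBytes("UTF-8"));
            DatabaseEntry value = new DatabaseEntry("some payload".getBytes("UTF-8"));
            db.put(null, key, value); // null = no explicit transaction

            db.close();
            env.close();
        }
    }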
It's also not that hard to write your own fixed-size record manager, and
not that hard to layer variable-sized records on top of such a thing.
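To illustrate, a minimal fixed-size record manager over
RandomAccessFile might look like this (just the idea, not the code i
benchmarked below):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Fixed-size records addressed by index: record i lives at byte
    // offset i * recordSize, so seek arithmetic replaces any lookup
    // structure.
    public class RecordFile {
        private final RandomAccessFile file;
        private final int recordSize;

        public RecordFile(String path, int recordSize) throws IOException {
            this.file = new RandomAccessFile(path, "rw");
            this.recordSize = recordSize;
        }

        public void write(long index, byte[] record) throws IOException {
            if (record.length != recordSize)
                throw new IllegalArgumentException("wrong record size");
            file.seek(index * recordSize);
            file.write(record);
        }

        public byte[] read(long index) throws IOException {
            byte[] record = new byte[recordSize];
            file.seek(index * recordSize);
            file.readFully(record);
            return record;
        }

        public void close() throws IOException {
            file.close();
        }
    }

Variable-sized records can then be layered on top by, say, keeping
(offset, length) pairs in the fixed-size records and the actual bytes
in a second heap file.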
There's also an excellent trick for using the unix filesystem as a
database by storing data in symbolic links: the target of a symlink is
actually an arbitrary text string, so you can store information, rather
than an actual path, in it. Gives you hierarchically organised,
string-keyed records of up to a kilobyte (YMMV) without any actual file
IO!
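If you want to play with that from Java, the java.nio.file API (Java 7
and later) can create and read links directly; a rough sketch, where
the link "target" is our payload rather than a real path:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SymlinkStore {
        // Store a value by making the key a symlink whose "target" is
        // the value. The target needn't exist as a file; it's just a
        // string the filesystem keeps in the link for us (no nulls,
        // non-empty, length limited by the filesystem).
        static void put(Path dir, String key, String value) throws Exception {
            Path link = dir.resolve(key);
            Files.deleteIfExists(link);
            Files.createSymbolicLink(link, Paths.get(value));
        }

        static String get(Path dir, String key) throws Exception {
            return Files.readSymbolicLink(dir.resolve(key)).toString();
        }

        public static void main(String[] args) throws Exception {
            Path dir = Files.createTempDirectory("symdb");
            put(dir, "greeting", "hello, world");
            System.out.println(get(dir, "greeting")); // prints: hello, world
        }
    }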
A performance question for the wise: i hacked up a little fixed-size
record manager, and wrote two backends, one using RandomAccessFile, and
one using a NIO MappedByteBuffer. For both, i provided a way to flush to
disk after each write - with RandomAccessFile, via getFD().sync(), and
with MappedByteBuffer with force(). Timings to do a batch of reads and
writes (100 000 operations, 75% reads, on 10 000 records of 256 bytes
each; a different random pattern each time, on a machine doing nothing but
this and playing MP3s):
Implementation    Flush?   Time (ms)
RandomAccessFile  no             733
RandomAccessFile  yes          20659
MappedByteBuffer  no              63
MappedByteBuffer  yes          33087
The mapped file is an order of magnitude faster without flushing, but
about 60% slower with. Any idea why?
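For concreteness, the two write-and-flush paths look essentially like
this (a simplified sketch, not the actual benchmark harness; file
names and sizes are made up):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class FlushSketch {
        public static void main(String[] args) throws Exception {
            byte[] record = new byte[256];
            int offset = 42 * 256;

            // Backend 1: RandomAccessFile, flushed via the file descriptor.
            RandomAccessFile raf = new RandomAccessFile("records.raf", "rw");
            raf.seek(offset);
            raf.write(record);
            raf.getFD().sync(); // force data (and metadata) to disk
            raf.close();

            // Backend 2: NIO mapped file, flushed with force().
            RandomAccessFile backing = new RandomAccessFile("records.map", "rw");
            FileChannel channel = backing.getChannel();
            MappedByteBuffer map =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, 10000L * 256);
            map.position(offset);
            map.put(record);
            map.force(); // flush the mapping's dirty pages
            backing.close();
        }
    }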
tom
[1] Okay, so seriously heavyweight ones use disk extents/partitions and
bypass the filesystem; how much of a difference does that make?
--
.... but when you spin it it looks like a dancing foetus!