Ping Jim Janney - sizing your snippet store

Tom Anderson
Fri, 2 Apr 2010 19:27:45 +0100

Since i spouted off on cljp about how key-value stores were better than
RDBMSs, i thought i ought to put my money where my mouth is. I'm going to
put together a simple demo/benchmark for storing key-value data,
implemented on top of Tokyo Cabinet and JDBC, which will hopefully
demonstrate that TC is easier to use and faster than a database (i can try
Derby, H2 and a leading commercial database whose license prohibits the
posting of performance comparisons, and so which will have to go nameless,
lest the bearded, megalomaniacal CEO of its manufacturer get upset).
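
For a fair comparison, both backends would presumably sit behind one small
interface that the benchmark drives. What follows is only a rough sketch of
that idea: the names SnippetStore and MapSnippetStore are made up for
illustration, and the in-memory implementation is a stand-in for checking
the harness, not a Tokyo Cabinet or JDBC binding.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: each backend under test (Tokyo Cabinet, JDBC)
// would get its own implementation, so the benchmark code stays identical.
interface SnippetStore {
    void put(String key, String value);   // insert or overwrite
    String get(String key);               // returns null if the key is absent
    void delete(String key);              // no-op if the key is absent
}

// In-memory stand-in, useful only for exercising the harness itself.
class MapSnippetStore implements SnippetStore {
    private final Map<String, String> map = new ConcurrentHashMap<String, String>();

    public void put(String key, String value) { map.put(key, value); }
    public String get(String key) { return map.get(key); }
    public void delete(String key) { map.remove(key); }
}
```

Keeping the interface down to put/get/delete matches the workload as
described so far; anything richer (range scans, transactions) would be a
different benchmark.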

So, it would be good to know what sort of problem i'm actually trying to
show that KV stores are good for. Jim, you said:

> I need to maintain a data base of small text snippets keyed by arbitrary
> strings, without the overhead of a full SQL relational database. We
> will have several people putting data into it so it needs to support
> concurrent access over a network.

Could i trouble you to expand a little on sizing? In particular:

- How many entries are there?
- How big are the keys? What sort of things are they?
- How big are the values?
- What's the workload mix? Read vs write vs delete?
- Are writes mostly of new entries, or overwrites of existing ones?
- How skewed is the access towards a few hot entries?
- How many users are using it in parallel?
- What is the request rate like?
- How much RAM can be used? How much disk space?

Answers to any of those, no matter how rough, would be useful.
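
To show why even rough numbers help, here is a sketch of how the answers
might be turned into benchmark parameters and a back-of-envelope size
estimate. The class WorkloadSpec and its fields are hypothetical, and every
figure in the usage note below is a placeholder, not anything Jim has said.

```java
// Hypothetical holder for the sizing answers; all names are illustrative.
final class WorkloadSpec {
    final long entries;          // number of key-value pairs
    final int avgKeyBytes;       // average key size in bytes
    final int avgValueBytes;     // average value size in bytes
    final double readFraction;   // fraction of requests that are reads, 0.0 to 1.0
    final int concurrentUsers;   // clients hitting the store in parallel

    WorkloadSpec(long entries, int avgKeyBytes, int avgValueBytes,
                 double readFraction, int concurrentUsers) {
        this.entries = entries;
        this.avgKeyBytes = avgKeyBytes;
        this.avgValueBytes = avgValueBytes;
        this.readFraction = readFraction;
        this.concurrentUsers = concurrentUsers;
    }

    // Raw payload only: ignores index, per-record and filesystem overhead.
    long rawDataBytes() {
        return entries * ((long) avgKeyBytes + avgValueBytes);
    }

    // Rough check of whether the raw payload could sit entirely in RAM.
    boolean fitsInRam(long ramBytes) {
        return rawDataBytes() <= ramBytes;
    }
}
```

With placeholder values of a million entries, 32-byte keys and 512-byte
values, new WorkloadSpec(1000000L, 32, 512, 0.9, 5).rawDataBytes() comes to
544,000,000 bytes of raw payload. Whether that fits in memory or has to live
on disk is exactly the sort of thing that changes which store looks good.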

I'll post to cljp if/when i get this done.


