Re: Smartest way of compressing numbers over a webservice?
"Casper" <casper@jbr.dk> wrote in message
news:qNXTh.28730$O_6.273105@weber.videotron.net...
[Eric Sosman wrote:]
GZIP will almost certainly do better. Huffman encoding is
only an encoding, and does not in itself embody a probability
model, a.k.a. a "predictor" of the future stream.
[Snipped example of GZIP doing better than Huffman.]
Interesting. I hardly remember the Huffman algorithm but it was
suggested to me on comp.compression (after making this initial post).
Huffman was one of a couple of suggestions you got there. I think the
Huffman recommendation was meant as a starting point from which you could
further build your own scheme.
After reading your description of the data (temperature readings over
a 24-hour period, and thus like a sine-wave, with occasional spikes), I
think an effective compression scheme would be to encode the sine wave
itself (just encode the amplitude, x-offset and y-offset; no need to
encode frequency, as it can be assumed to be 1/24 hours), and then encode
any deviations that the real data has, as compared to this ideal sine
wave.
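To make that concrete, here is a rough sketch of the "fit a sine wave, keep
the deviations" step. It rests on assumptions of mine, not on anything you
posted: readings are stored as integer tenths of a degree, and the samples
are evenly spaced over exactly one 24-hour period (at least 3 of them), so
a plain least-squares fit reduces to three sums. The class and method names
are made up for illustration. The coefficients a and b carry the same
information as amplitude and x-offset (amplitude = hypot(a, b), phase =
atan2(b, a)), and c is the y-offset.

```java
class SineResiduals {
    /**
     * Least-squares fit of y[i] ~ a*sin(w*i) + b*cos(w*i) + c with w = 2*pi/n.
     * Assumes n >= 3 evenly spaced samples covering exactly one period,
     * which makes the three basis functions orthogonal. Returns {a, b, c}.
     */
    static double[] fit(int[] y) {
        int n = y.length;
        double a = 0, b = 0, c = 0;
        for (int i = 0; i < n; i++) {
            double wt = 2 * Math.PI * i / n;
            a += y[i] * StrictMath.sin(wt);
            b += y[i] * StrictMath.cos(wt);
            c += y[i];
        }
        return new double[] { 2 * a / n, 2 * b / n, c / n };
    }

    /** Fitted wave at sample i, rounded to whole tenths of a degree. */
    static int predict(double[] p, int i, int n) {
        double wt = 2 * Math.PI * i / n;
        double v = p[0] * StrictMath.sin(wt) + p[1] * StrictMath.cos(wt) + p[2];
        return (int) Math.round(v);
    }

    /** Deviation of each real reading from the fitted wave. */
    static int[] residuals(int[] y, double[] p) {
        int[] r = new int[y.length];
        for (int i = 0; i < y.length; i++) {
            r[i] = y[i] - predict(p, i, y.length);
        }
        return r;
    }

    /** Decoder side: rebuild the readings from the wave and the deviations. */
    static int[] reconstruct(int[] r, double[] p) {
        int[] y = new int[r.length];
        for (int i = 0; i < r.length; i++) {
            y[i] = predict(p, i, r.length) + r[i];
        }
        return y;
    }

    public static void main(String[] args) {
        int n = 24;
        int[] y = new int[n];
        for (int i = 0; i < n; i++) {
            y[i] = (int) Math.round(200 + 100 * Math.sin(2 * Math.PI * i / n));
        }
        y[7] += 30; // an occasional spike
        double[] p = fit(y);
        System.out.println(java.util.Arrays.toString(residuals(y, p)));
    }
}
```

The decoder only reproduces the readings exactly if it sees the same three
coefficients the encoder used, so whatever precision you transmit them at,
the encoder should compute its deviations from those transmitted values.
I used StrictMath so both ends get bit-identical sines and cosines.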
A simple scheme: for encoding the deviations, you'd have your
predictor such that 0 is the most likely deviation (and thus takes the
least bits to encode), and as you go further away from 0, these results
are less likely, and thus take more bits to encode.
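One off-the-shelf way to get "0 is cheapest, bigger deviations cost more"
is to fold the signed deviation onto the non-negative integers (0, -1, +1,
-2, +2, ...) and write the result with an Exp-Golomb code. That is my
choice for the sketch below, not the only option; a Huffman table built
from real deviation statistics could well do better. The class name is
invented, and the bits are kept in a String purely for readability (a real
encoder would pack them into bytes). It assumes deviations are small enough
that the folded value does not overflow an int.

```java
class DeviationCode {
    /** Folds signed onto unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4. */
    static int zigzag(int d) {
        return (d << 1) ^ (d >> 31);
    }

    /** Inverse of zigzag. */
    static int unzigzag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Exp-Golomb (order 0): binary of (z + 1), preceded by length-1 zeros. */
    static String encode(int[] deviations) {
        StringBuilder bits = new StringBuilder();
        for (int d : deviations) {
            String bin = Integer.toBinaryString(zigzag(d) + 1);
            for (int i = 1; i < bin.length(); i++) {
                bits.append('0');
            }
            bits.append(bin);
        }
        return bits.toString();
    }

    /** Reads back 'count' deviations from a bit string made by encode(). */
    static int[] decode(String bits, int count) {
        int[] out = new int[count];
        int pos = 0;
        for (int k = 0; k < count; k++) {
            int zeros = 0;
            while (bits.charAt(pos) == '0') { // count the leading zeros
                zeros++;
                pos++;
            }
            int value = Integer.parseInt(bits.substring(pos, pos + zeros + 1), 2);
            pos += zeros + 1;
            out[k] = unzigzag(value - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(encode(new int[] { 0, -1, 1, 5 }));
    }
}
```

With this code a deviation of 0 costs 1 bit, -1 and +1 cost 3 bits each,
and +5 costs 7 bits, so the main method prints 10100110001011.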
A more complex scheme: You can take advantage of the assumption that
the temperatures will never fall outside the [-50,50] range. If your sine
wave predicts a certain value to be at 49.8, you need not worry about
encoding a deviation of +0.3, because that would push you outside the allowed
range. How exactly to assign your bits to the remaining deviations, I'll leave
as an exercise to the reader.
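For anyone who wants a head start on that exercise, here is one possible
answer, certainly not the only one. Number the deviations that are still
possible in order of distance from 0 (0, -1, +1, -2, +2, ...), skipping any
that would leave the allowed range, and encode that rank instead of the raw
deviation. Everything here is an assumption for illustration: values are in
integer tenths of a degree, so the range is [-500, 500], the prediction has
already been clamped into that range, and the names are invented.

```java
class BoundedDeviation {
    /** Rank of deviation d among the deviations possible at this prediction. */
    static int rank(int d, int pred, int lo, int hi) {
        int below = pred - lo; // largest possible downward deviation
        int above = hi - pred; // largest possible upward deviation
        if (d < -below || d > above) {
            throw new IllegalArgumentException("deviation leaves the allowed range");
        }
        int s = Math.min(below, above); // both signs are possible up to s
        int m = Math.abs(d);
        if (m <= s) {
            return d == 0 ? 0 : (d < 0 ? 2 * m - 1 : 2 * m);
        }
        return s + m; // beyond s only one sign remains, so ranks are dense
    }

    /** Inverse of rank(): recovers the deviation from its rank. */
    static int unrank(int r, int pred, int lo, int hi) {
        int below = pred - lo;
        int above = hi - pred;
        int s = Math.min(below, above);
        if (r <= 2 * s) {
            return (r % 2 == 0) ? r / 2 : -(r + 1) / 2;
        }
        int m = r - s;
        return below > above ? -m : m;
    }

    public static void main(String[] args) {
        // Prediction of 49.8 degrees: only +0.1 and +0.2 are possible upward.
        for (int d = 2; d >= -4; d--) {
            System.out.println(d + " -> rank " + rank(d, 498, -500, 500));
        }
    }
}
```

At a prediction of 49.8, a deviation of -0.4 gets rank 6 rather than the 7
it would get if +0.3 and +0.4 still had to be accounted for. The ranks can
then be fed to whatever variable-length code you use for the deviations.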
There are newsgroups devoted to compression topics where you
might find more information. It's been years since I followed
them, but I recall their FAQs as being highly informative.
Yes, I have already been directed there, but wavelets, Fourier transforms, etc. are too
complicated for me. All I really wanted was to hear whether the Java
community knew of alternative compression mechanisms readily at hand.
Ah, in that case, comp.compression isn't the right place. I was under
the impression you wanted to design your own custom compression scheme.
There's no reason to suspect the regulars there are familiar with Java,
and they'd probably be more interested in the puzzle-like fun of coming up
with new compression schemes than in the boring task of finding existing
libraries to suit a particular task.
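For what it's worth, the GZIP option mentioned at the top of the thread is
readily at hand: it ships with the JDK in java.util.zip. Below is a minimal
round-trip sketch. GZIPOutputStream and GZIPInputStream are the real
library classes; the GzipDemo wrapper, its method names, and the sample
payload are just made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    /** Compresses a byte array in memory with GZIP. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZIP trailer has been written.
        return buf.toByteArray();
    }

    /** Restores the original bytes from compress() output. */
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("21.5,21.7,21.9,");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```

On very short payloads the fixed GZIP header and trailer can outweigh the
savings, so it is worth measuring on your real messages.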
- Oliver