Re: HashSet keeps all nonidentical equal objects in memory
On 7/20/2011 5:43 AM, Frederik wrote:
Hi,
I've been doing Java programming for over 10 years, but now I've
encountered a phenomenon that I wasn't aware of at all.
I had an application in which I have a HashSet<String>. I added a lot
of different String objects to this HashSet, but many of the String
objects are equal to each other. Now, after a while my application ran
out of memory, even with -Xmx1500M. This happened when there were only
about 7000 different Strings in the set! I didn't understand this,
until I started adding the "intern()" of every String object to the
set instead of the original String object. Now the program needs
virtually no memory anymore.
There is only one explanation: before I used "intern()", ALL the
different String objects, even the ones that are equal, were kept in
memory by the HashSet! No matter how strange it sounds. I was
wondering, does anybody have an explanation as to why this is the case?
I'm unable to reproduce your problem (see test program below).
Perhaps you've overlooked another possible explanation: Before you
switched to using intern(), maybe you were retaining your own
references to all those Strings accidentally.
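A minimal sketch of that kind of accidental retention, assuming a hypothetical side list named `seen` (the class, method, and variable names here are made up for illustration and are not taken from your application). The set ends up with one element, while the list keeps every equal-but-distinct instance reachable:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RetentionSketch {

    // Adds n distinct-but-equal Strings to both a set and a side list,
    // then returns { set size, list size }.
    static int[] sizes(int n) {
        Set<String> set = new HashSet<String>();
        List<String> seen = new ArrayList<String>(); // hypothetical stray collection

        for (int i = 0; i < n; ++i) {
            String s = new String("duplicate"); // a fresh object on every pass
            seen.add(s); // the list holds on to every instance
            set.add(s);  // the set keeps only the first equal instance
        }
        return new int[] { set.size(), seen.size() };
    }

    public static void main(String[] args) {
        int[] result = sizes(1000);
        System.out.println("set size  = " + result[0]); // prints 1
        System.out.println("list size = " + result[1]); // prints 1000
    }
}
```

In a case like this the memory is held by the list, not by the HashSet, so switching the set to intern()ed Strings would only help if the other references were switched too.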
Here's my test program: It inserts twenty thousand distinct but
equal Strings (separate objects with the same contents) into a
HashSet, pausing every now and then to report how much memory is
used (with some heavy-handed attempts to force garbage collection):
package esosman.misc;

import java.util.HashSet;

public class HashSpace {

    public static void main(String[] unused) {
        HashSet<String> set = new HashSet<String>();
        String value = "x";
        for (int n = 0; n < 20; ++n) {
            report(n * 1000);
            for (int i = 0; i < 1000; ++i) {
                // Build a brand-new String object that equals "x".
                value = (value + "x").substring(1);
                set.add(value);
            }
        }
        report(20 * 1000);
    }

    // Nudge the collector until usage stops shrinking (five tries
    // at most), then print the amount of heap in use.
    private static void report(int insertions) {
        long memUsed = runtime.totalMemory() - runtime.freeMemory();
        long memPrev = Long.MAX_VALUE;
        for (int gc = 0; (memUsed < memPrev) && gc < 5; ++gc) {
            runtime.runFinalization();
            runtime.gc();
            Thread.yield();
            memPrev = memUsed;
            memUsed = runtime.totalMemory() - runtime.freeMemory();
        }
        System.out.printf("After %d insertions, memory used = %d\n",
                insertions, memUsed);
    }

    private static final Runtime runtime = Runtime.getRuntime();
}
... and here's what I get for output:
After 0 insertions, memory used = 125656
After 1000 insertions, memory used = 133272
After 2000 insertions, memory used = 133664
After 3000 insertions, memory used = 133272
After 4000 insertions, memory used = 133312
After 5000 insertions, memory used = 133272
After 6000 insertions, memory used = 133312
After 7000 insertions, memory used = 133272
After 8000 insertions, memory used = 133312
After 9000 insertions, memory used = 133272
After 10000 insertions, memory used = 133312
After 11000 insertions, memory used = 133272
After 12000 insertions, memory used = 133312
After 13000 insertions, memory used = 133448
After 14000 insertions, memory used = 133840
After 15000 insertions, memory used = 133448
After 16000 insertions, memory used = 133488
After 17000 insertions, memory used = 133272
After 18000 insertions, memory used = 133312
After 19000 insertions, memory used = 133272
After 20000 insertions, memory used = 133312
I see no evidence that all those String instances are being
retained anywhere: They need ~24 bytes apiece, so twenty thousand
of them would come to about half a megabyte, yet the usage stays
flat at roughly 133K.
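The same point can be checked directly rather than inferred from heap figures. The sketch below (class and method names are hypothetical) relies on the documented behaviour of Set.add: when an equal element is already present, the call leaves the set unchanged and returns false, so the second instance is never stored:

```java
import java.util.HashSet;
import java.util.Set;

class FirstInstanceKept {

    // Returns { first add succeeded, second add succeeded,
    //           the set still holds the first object }.
    static boolean[] addTwice() {
        Set<String> set = new HashSet<String>();
        String first = new String("x");
        String second = new String("x"); // equal to first, but a different object

        boolean addedFirst = set.add(first);
        boolean addedSecond = set.add(second); // rejected: an equal element exists

        // Identity comparison (==) on purpose: which object did the set keep?
        boolean keptFirst = set.iterator().next() == first;
        return new boolean[] { addedFirst, addedSecond, keptFirst };
    }

    public static void main(String[] args) {
        boolean[] r = addTwice();
        System.out.println("first add returned  " + r[0]); // true
        System.out.println("second add returned " + r[1]); // false
        System.out.println("set holds first obj " + r[2]); // true
    }
}
```

Once the rejected duplicate has no other references, it is eligible for garbage collection, which is consistent with the flat memory figures above.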
--
Eric Sosman
esosman@ieee-dot-org.invalid