On 9/7/2010 10:27 AM, Spud wrote:
On strings: when you're doing (very) large scale data processing,
strings are a no-no. The only answer is to have some kind of reusable
buffer for storing text.
Define very large scale? Also, "strings" are not the same as "Strings",
so let's make sure you are specifically talking about "using String
instances is a no-no".
Here's a discussion of the topic:
http://lingpipe-blog.com/2010/06/22/the-unbearable-heaviness-jav-strings/
All this says is that there is a 60-byte overhead per String instance.
Your original objection was that it had something to do with bogging
down the GC.
While a 60-byte overhead seems like a lot, keep in mind that any object
holding string-like data will have the same overhead. Reusable objects
(char buffers) may interfere with the GC more than short lived objects,
depending on the GC implementation. I've heard this is the case for most
modern JVM GCs, which is why Object pools are not common in modern Java
programs.
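
To make the two approaches concrete, here is a rough sketch of what I
understand the alternatives to be. The class and method names
(ReusableBufferSketch, countWithStrings, countWithBuffer, startsWith) are
made up for illustration, and this is not a benchmark: it only shows one
version that allocates a short-lived String per line and one that reuses
a single StringBuilder. Which of them is kinder to the GC would have to
be measured on the actual JVM and workload.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ReusableBufferSketch {

    // Version 1: one new String per line. Each String is garbage as soon
    // as the loop moves on to the next line.
    static int countWithStrings(Reader in, String prefix) {
        try {
            BufferedReader reader = new BufferedReader(in);
            int hits = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(prefix)) {
                    hits++;
                }
            }
            return hits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Version 2: one long-lived StringBuilder reused for every line.
    // setLength(0) empties the builder but keeps its backing array.
    static int countWithBuffer(Reader in, String prefix) {
        try {
            BufferedReader reader = new BufferedReader(in);
            StringBuilder buf = new StringBuilder(256);
            int hits = 0;
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    if (startsWith(buf, prefix)) {
                        hits++;
                    }
                    buf.setLength(0);
                } else {
                    buf.append((char) c);
                }
            }
            // Last line when the input does not end with a newline.
            if (buf.length() > 0 && startsWith(buf, prefix)) {
                hits++;
            }
            return hits;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Prefix check on a CharSequence, so no String has to be created.
    static boolean startsWith(CharSequence s, String prefix) {
        if (s.length() < prefix.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (s.charAt(i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

Both methods should return the same count for the same input. The
difference is only in allocation pattern: many small, short-lived
objects in the first, one long-lived buffer in the second.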
It sounds to me like you read in a few places about potential
inefficiencies with Java Strings, and have decided to avoid them before
finding actual, practical problems.
No. It's based on actual experience, the same experience the blogger had.