Re: counting instances of a string.
Patricia Shanahan wrote:
Roedy Green wrote:
On Tue, 08 Jul 2008 08:48:00 -0400, Eric Sosman
<esosman@ieee-dot-org.invalid> wrote, quoted or indirectly quoted
someone who said :
int countInstances(String page, String lookFor) {
int count = 0;
for (int start = 0;
(start = page.indexOf(lookFor, start)) >= 0;
start += lookFor.length())
++count;
return count;
}
That is certainly tidier. I will make that the official StringTools
version.
It is quite nice. You can also do it with a single test in a while loop:
int count1(String page, String lookFor) {
int start = 0;
int count = 0;
while ((start = page.indexOf(lookFor, start)) >= 0) {
count++;
start += lookFor.length();
}
return count;
}
This is more readable than my original, largely because it
avoids cramming all three pieces into a single over-cluttered
`for' statement. (The only reason I did so was to restrict the
scope of the `start' variable very slightly and emphasize that
it was not an output but a piece of the loop's machinery.)
I prefer to avoid the side effect not at the top level of its
expression.
De gustibus non disputandum est, I guess. Do you also
avoid `array[i++] = 42'? It's certainly possible to over-use
side effects, but I'm unwilling to give up quite so much.
Also, the start testing is answering two questions: "Should
I count another copy?" and "Have I finished?", and that is better
expressed by doing it twice:
int count1(String page, String lookFor) {
int start = 0;
int count = 0;
while(start >= 0){
start = page.indexOf(lookFor,start);
if(start >= 0){
count++;
start += lookFor.length();
}
}
return count;
}
To my mind, the two questions are the same question. When
I'm counting the black jellybeans in a multicolored pile, I remove
them one at a time saying "One black jellybean, two black jellybeans"
and so on until the pile holds no more black jellybeans. The "stop"
action and the "count and remove" action are just the two outcomes of
the single question "Any more black jellybeans?"
It's interesting (and might be fruitful) to explore how different
people visualize the same algorithm. Long ago I read an interesting
book titled "The Psychology of Computer Programming" by Weinberg, but
I no longer recall whether he considered this matter. (I *do* recall
his report of an experiment that supported the contention "Comments
are bad" ...)
I also rather like the regular expression idea, provided lookFor gets
quoted.
It's not the same problem Roedy stated, but the same outline
solves it: just use a matcher instead of indexOf() and let the matcher
keep track of its own starting point.
Of course, if one wants to search for the same lookFor string
repeatedly in a large block of text (or in many blocks), it may be
worth while to invest the preprocessing time needed for a fancier
string-search algorithm like Boyer-Moore or one of its variants.
The preprocessing is sort of analogous to compiling a regex: You
spend a chunk of time before the first iteration in hopes that it
will be repaid by the faster search.
(And I'm *definitely* not going to try to type a Boyer-Moore
implementation free-hand into a mail window!)
--
Eric.Sosman@sun.com