Hemal Pandya wrote:
Ed Kirwan wrote:
Patricia Shanahan wrote:
[...]
Perhaps using a List would obviate the need for the nested loop?
It would, but it would be a lot more expensive.
[....]
Thanks for that tip, Hemal. I had no idea that Set implementations were
so much more efficient (in this case) than List implementations. The
output from the code below gives:
522393 duplicated words. Using java.util.HashSet, time = 678ms.
522393 duplicated words. Using java.util.TreeSet, time = 1812ms.
522393 duplicated words. Using java.util.ArrayList, time = 157724ms.
522393 duplicated words. Using java.util.LinkedList, time = 251739ms.
import java.util.*;
import java.io.*;

class Test {

    private static final String TEXT_BOOK_NAME = "war-and-peace.txt";

    public static void main(String[] args) {
        try {
            String text = readText(); // Read the whole text into RAM
            countDuplicateWords(text, new HashSet<String>());
            countDuplicateWords(text, new TreeSet<String>());
            countDuplicateWords(text, new ArrayList<String>());
            countDuplicateWords(text, new LinkedList<String>());
        } catch (Exception e) {
            System.out.println(e.toString());
        }
    }

    private static String readText() throws IOException {
        BufferedReader reader =
            new BufferedReader(new FileReader(TEXT_BOOK_NAME));
        try {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append(' ');
            }
            return text.toString();
        } finally {
            reader.close(); // Don't leak the file handle
        }
    }

    private static void countDuplicateWords(String text,
                                            Collection<String> words) {
        int numDuplicatedWords = 0;
        long startTime = System.currentTimeMillis();
        for (StringTokenizer i = new StringTokenizer(text);
                i.hasMoreTokens();) {
            String word = i.nextToken();
            // contains() is roughly O(1) for a HashSet, O(log n) for a
            // TreeSet, but O(n) for either List -- hence the timings above.
            if (words.contains(word)) {
                numDuplicatedWords++;
            } else {
                words.add(word);
            }
        }
        long endTime = System.currentTimeMillis();
        System.out.println(numDuplicatedWords + " duplicated words. " +
            "Using " + words.getClass().getName() +
            ", time = " + (endTime - startTime) + "ms.");
    }
}
You could use a HashMap if you wanted to know how many times each word occurred:
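Something along these lines, say (a minimal sketch; the class and method names are just made up for illustration). Each word maps to its running count, so a lookup that returns null means the word hasn't been seen yet:

```java
import java.util.*;

class WordFrequency {

    // Count how many times each word occurs, using a HashMap
    // from word to its running count.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (StringTokenizer i = new StringTokenizer(text);
                i.hasMoreTokens();) {
            String word = i.nextToken();
            Integer count = counts.get(word);
            counts.put(word, count == null ? 1 : count + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            WordFrequency.countWords("the cat and the hat");
        System.out.println(counts); // e.g. "the" maps to 2
    }
}
```

The duplicate total from the earlier program falls out for free: it's just the sum of (count - 1) over all entries.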