Re: Sorting numeric strings
On 4/30/12 6:27 PM, Ben wrote:
Given the following data:
Col1, Col2, Col3
438.23, 991897664, ccc
22.12, 991897631, bbb
100.99, 881897631, aaa
50.12, 991884803, ddd
The class below will sort the data based on the column specified, except
Col1, which contains float values. If you set the SortCol variable below
to 0, sorting does not work. If you set it to 1 or 2, sorting does work.
How can I sort Col1 which is a column of numeric strings?
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortColumn {
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader("file.csv"));
        //BufferedReader reader = new BufferedReader(new FileReader("jtp-input2-test.csv"));
        Map<String, List<String>> map = new TreeMap<String, List<String>>();
        String line = reader.readLine(); //read header
        while ((line = reader.readLine()) != null) {
            String key = getField(line).toString();
            List<String> l = map.get(key);
            if (l == null) {
                l = new LinkedList<String>();
                map.put(key, l);
                System.out.println(key);
            }
            l.add(line);
        }
        reader.close();
        FileWriter writer = new FileWriter("sorted_numbers.txt");
        writer.write("Col1, Col2, Col3\n");
        // writer.write("billnumber, Copay, Discount, NonAllow, unknown\n");
        for (List<String> list : map.values()) {
            for (String val : list) {
                writer.write(val);
                writer.write("\n");
            }
        }
        writer.close();
    }

    private static String getField(String line) {
        // Column you want to sort on (Zero based)
        int SortCol = 0;
        return line.split(",")[SortCol];
    }
}
To compare two strings as numbers, you need to pad them with zeros at
both ends, working outward from the ".": on the left of the integer part
and on the right of the fractional part.
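A minimal sketch of that padding, assuming non-negative plain decimals (no sign, no exponent) and widths chosen by the caller. The class and method names are made up for illustration:

```java
final class PaddedDecimals {
    /**
     * Pads a non-negative decimal string with zeros so that plain
     * String.compareTo orders the results numerically.
     * intWidth and fracWidth are the digit counts wanted on each side of the dot.
     */
    static String normalize(String s, int intWidth, int fracWidth) {
        int dot = s.indexOf('.');
        String intPart = dot < 0 ? s : s.substring(0, dot);
        String fracPart = dot < 0 ? "" : s.substring(dot + 1);
        StringBuilder sb = new StringBuilder();
        // Zeros go on the left of the integer part...
        for (int i = intPart.length(); i < intWidth; i++) {
            sb.append('0');
        }
        sb.append(intPart).append('.').append(fracPart);
        // ...and on the right of the fractional part.
        for (int i = fracPart.length(); i < fracWidth; i++) {
            sb.append('0');
        }
        return sb.toString();
    }
}
```

With the Col1 values above, normalize("50.12", 3, 2) gives "050.12" and normalize("100.99", 3, 2) stays "100.99", so ordinary String comparison puts 50.12 first.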
In other words, in order to compare "123" with "3.141", you'd need to
"normalize" them to "123.000" and "003.141".
I recently wrote something that does this and handles any number of "."
separators. It was designed for revision numbers, which can contain
several dots.
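import org.apache.commons.lang.StringUtils;
import java.util.Comparator;

/**
 * Compares dot-separated strings segment by segment, treating all-digit
 * segments as integers. Built for revision numbers: "1.5" sorts before
 * "1.25", so decimal fractions only sort correctly when they all have
 * the same number of digits after the dot.
 */
public class StringAsNumberComparator implements Comparator<String> {
    @Override
    public int compare(String left, String right) {
        final String[] a = StringUtils.split(left, '.');
        final String[] b = StringUtils.split(right, '.');
        for (int i = 0; i < a.length; ++i) {
            if (i >= b.length) {
                return 1; // left has extra segments, so it sorts after right
            }
            // Compare the i-th segments, not the whole strings.
            final int compare = compareMaybeNumeric(a[i], b[i]);
            if (compare != 0) {
                return compare;
            }
        }
        return a.length - b.length;
    }

    private static int compareMaybeNumeric(String a, String b) {
        if (StringUtils.isNumeric(a) && StringUtils.isNumeric(b)) {
            // Left-pad the shorter segment with zeros so string order matches numeric order.
            final int length = Math.max(a.length(), b.length());
            return StringUtils.leftPad(a, length, '0')
                    .compareTo(StringUtils.leftPad(b, length, '0'));
        } else {
            return a.compareTo(b);
        }
    }
}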
Although, now that I'm looking at this, I see a few optimizations I can
make that don't involve padding. If two digit strings have no leading
zeros and differ in length, the longer one has the larger magnitude.
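The length shortcut could look something like this. It is only a sketch and assumes both arguments contain nothing but ASCII digits; leading zeros are stripped first, since "007" is longer than "8" but not larger. The names are hypothetical:

```java
final class DigitRuns {
    /** Compares two all-digit strings numerically without padding either one. */
    static int compareDigits(String a, String b) {
        String x = stripLeadingZeros(a);
        String y = stripLeadingZeros(b);
        // Different lengths: the longer digit run is the larger number.
        if (x.length() != y.length()) {
            return x.length() < y.length() ? -1 : 1;
        }
        // Same length: lexicographic order equals numeric order.
        return Integer.signum(x.compareTo(y));
    }

    /** Drops leading zeros but keeps the last character, so "000" becomes "0". */
    private static String stripLeadingZeros(String s) {
        int i = 0;
        while (i < s.length() - 1 && s.charAt(i) == '0') {
            i++;
        }
        return s.substring(i);
    }
}
```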
Of course, this code doesn't consider negative values, but can be
adjusted to do so.
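One way to get negative values handled, at the cost of parsing every value, is to skip the padding and compare via java.math.BigDecimal. This is a different technique from the comparator above, not an adjustment of it, and it assumes each string is a plain decimal that the BigDecimal constructor accepts (anything else throws NumberFormatException). The class name is made up:

```java
import java.math.BigDecimal;
import java.util.Comparator;

final class DecimalStringComparator implements Comparator<String> {
    @Override
    public int compare(String left, String right) {
        // trim() tolerates the space after each comma in the sample data.
        // BigDecimal.compareTo ignores scale, so "2.0" and "2.00" compare equal.
        return new BigDecimal(left.trim()).compareTo(new BigDecimal(right.trim()));
    }
}
```

In the SortColumn class this could be passed to the TreeMap constructor, e.g. new TreeMap<String, List<String>>(new DecimalStringComparator()), when sorting on Col1.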