Re: StreamTokenizer, data records, indexing/ newline trouble
Jeff Higgins wrote:
Hi,
Solution: modified found code:
Correction: that is not a solution yet.
One case is left to solve:
"1","Title 1","CAN",
"2","Title 2","USA","Title 2 description contains no newlines"
"3","Title 3","MEX","Title 3 description contains no newlines"
namely the case where the 4th field is missing (null), as in record 1 above.
That input produces this output:
1 Title 1 CAN null
Title 2 USA Title 2 description contains no newlines null
Title 3 MEX Title 3 description contains no newlines null
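The missing "2" in the second output line can be reproduced without Scanner or PushbackReader. What follows is only a small sketch: the class name PatternDemo and the method firstMatches are made up for illustration, and the pattern string is copied from CSV_PATTERN in the program below. It lists the first few matches of the pattern against input shaped like the sample records, with line breaks made visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {
    // Same pattern string as CSV_PATTERN in RecordScanner.
    static final Pattern CSV = Pattern.compile("\"([^\"]+?)\",?|([^,]+),?|,");

    /** Returns the first n successive matches of CSV in text, with CR and LF shown as \r and \n. */
    static List<String> firstMatches(String text, int n) {
        List<String> out = new ArrayList<String>();
        Matcher m = CSV.matcher(text);
        while (out.size() < n && m.find()) {
            out.add(m.group().replace("\r", "\\r").replace("\n", "\\n"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Record 1 ends with a trailing comma, i.e. its 4th field is missing.
        String text = "\"1\",\"Title 1\",\"CAN\",\r\n"
                    + "\"2\",\"Title 2\",\"USA\",\"Title 2 description\"\r\n";
        for (String s : firstMatches(text, 5)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The fourth line printed is [\r\n"2",]. The unquoted alternative ([^,]+) excludes only commas, so it runs across the line break and takes the code of the next record with it. That match is what reaches the i == 3 branch. As far as I can tell, pushing it back into the PushbackReader has no effect because the Scanner has presumably already buffered that input. The full program that produced the output above follows.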
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class RecordScanner {

    public static final String CSV_PATTERN = "\"([^\"]+?)\",?|([^,]+),?|,";
    private static Pattern csvRE = Pattern.compile(CSV_PATTERN);
    private static ArrayList<Record> list = new ArrayList<Record>();

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("missing input filename");
            System.exit(1);
        }
        try {
            PushbackReader pr = new PushbackReader(new FileReader(args[0]), 200);
            Scanner sc = new Scanner(pr);
            sc.useDelimiter(csvRE);
            while (sc.hasNext()) {
                Record dummy = new Record();
                for (int i = 0; i < 4; i++) {
                    String match = sc.findWithinHorizon(csvRE, 0);
                    if (match.endsWith(",")) {
                        // This statement doesn't work.
                        if (match.startsWith("\r\n") && i == 3) {
                            pr.unread(match.toCharArray());
                            match = "null";
                        } else {
                            match = match.substring(0, match.length() - 1);
                        }
                    }
                    if (match.startsWith("\"")) { // assume also ends with
                        match = match.substring(1, match.length() - 1);
                    }
                    if (match.length() == 0) {
                        match = null;
                    }
                    if (i == 0) {
                        dummy.code = match;
                    } else if (i == 1) {
                        dummy.title = match;
                    } else if (i == 2) {
                        dummy.country = match;
                    } else {
                        dummy.description = match;
                    }
                }
                list.add(dummy);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        for (Record r : list) {
            System.out.println(r.code + " " + r.title +
                    " " + r.country + " " + r.description);
        }
    }

    static class Record {
        String code;
        String title;
        String country;
        String description;
    }
}
/*
* Copyright (c) Ian F. Darwin, http://www.darwinsys.com/, 1996-2002.
* All rights reserved. Software written by Ian F. Darwin and others.
* $Id: LICENSE,v 1.8 2004/02/09 03:33:38 ian Exp $
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
* PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
* THE POSSIBILITY OF SUCH DAMAGE.
*
* Java, the Duke mascot, and all variants of Sun's Java "steaming coffee
* cup" logo are trademarks of Sun Microsystems. Sun's, and James Gosling's,
* pioneering role in inventing and promulgating (and standardizing) the
* Java language and environment is gratefully acknowledged.
*
* The pioneering role of Dennis Ritchie and Bjarne Stroustrup, of AT&T, for
* inventing predecessor languages C and C++ is also gratefully
* acknowledged.
*/
/*
* MODIFIED 1 April 2007 Jeff Higgins, oohiggins@yahoo.com
*/
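One possible way around the missing 4th field, offered only as a sketch and not as a tested replacement for RecordScanner, is to drop the regex and walk the characters, treating a line feed outside quotes as the end of a record. The names QuotedCsvSketch, parse and endField are hypothetical. The sketch assumes that the whole input fits in a String, that fields are either plain or wrapped in double quotes, and that quoted fields contain no escaped quotes. A trailing comma then yields a null last field, and line breaks inside quotes stay part of the field.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSketch {

    /** Parses text into records; an empty or missing field becomes null. */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<List<String>>();
        List<String> record = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                // Inside quotes, commas and line breaks are ordinary data.
                if (c == '"') {
                    inQuotes = false;
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                endField(record, field);
            } else if (c == '\n') {
                // An unquoted line feed ends the record; blank lines are skipped.
                if (!record.isEmpty() || field.length() > 0) {
                    endField(record, field);
                    records.add(record);
                    record = new ArrayList<String>();
                }
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Flush a final record that has no trailing line break.
        if (!record.isEmpty() || field.length() > 0) {
            endField(record, field);
            records.add(record);
        }
        return records;
    }

    /** Appends the current field (null if empty) to the record and resets the buffer. */
    private static void endField(List<String> record, StringBuilder field) {
        record.add(field.length() == 0 ? null : field.toString());
        field.setLength(0);
    }

    public static void main(String[] args) {
        String text = "\"1\",\"Title 1\",\"CAN\",\r\n"
                    + "\"2\",\"Title 2\",\"USA\",\"Title 2 description contains no newlines\"\r\n";
        for (List<String> r : parse(text)) {
            System.out.println(r);
        }
    }
}
```

For the two sample records in main, the first list printed is [1, Title 1, CAN, null] and the second keeps its code "2". A line that stops after the 3rd field without a trailing comma would give only three entries, so a caller needing exactly four fields would still have to pad the list.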