Special chars in regular expressions - Problems
I am trying to match (for removal) certain characters in a file,
using regular expressions (java.util.regex.*). However, the Java 1.6
compiler does not like any of the characters that are preceded by a
backslash ("\"), although the Java SE 6 API documentation for the
class java.util.regex.Pattern says they are valid. The code may be
found below.
For the line of code
String patternStr = "[\d\p{Space}\p{Punct}\p{Blank}]+";
the compiler says that each of the characters after the backslashes
(d and p) is an "illegal escape character." I understood the brackets
[...] to mean "match any one of the characters enclosed by them."
Is it obvious to anyone what I am doing wrong? Thanks, Alan
import java.net.*;
import java.io.*;
import java.util.regex.*;

public class CleanTokens
{
    public static void main ( String[] args ) throws IOException
    {
        try
        {
            BufferedReader infile = new BufferedReader(new FileReader("input.txt"));
            PrintWriter outfile = new PrintWriter(new FileOutputStream("output.txt"));
            String inputStr = "";
            // Define regular expression pattern
            // (this is the line the compiler rejects)
            String patternStr = "[\d\p{Space}\p{Punct}\p{Blank}]+";
            // Compile regular expression
            Pattern pattern = Pattern.compile(patternStr);
            // Define matcher object (reset for each line below)
            Matcher matcher = pattern.matcher(inputStr);
            // Define replacement string (empty, so matches are deleted)
            String replacementStr = "";
            while ((inputStr = infile.readLine()) != null)
            {
                // Take the current string
                matcher.reset(inputStr);
                // Remove the characters that match the pattern
                inputStr = matcher.replaceAll(replacementStr);
                // Write out the modified string
                outfile.println(inputStr);
            }
            infile.close();
            outfile.close();
        }
        catch (IOException e) {e.printStackTrace();}
    }
}
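
A likely cause of the "illegal escape character" message: inside a Java string literal the backslash is itself an escape character, and \d and \p are not escapes the compiler recognizes. The Pattern documentation describes what the regex engine sees, so each backslash has to be doubled in the source for it to survive compilation. Below is a minimal sketch of that fix, not a rewrite of the program above; the class name EscapeDemo and the helper clean are made up for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // "\\d" in Java source becomes the two characters \d at run time,
    // which is what Pattern.compile expects for "any digit".
    static final Pattern JUNK =
        Pattern.compile("[\\d\\p{Space}\\p{Punct}\\p{Blank}]+");

    // Hypothetical helper: delete every run of digits, whitespace,
    // and punctuation from a single line.
    static String clean(String line) {
        return JUNK.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("abc 123, def!")); // prints "abcdef"
    }
}
```

With the doubled backslashes the brackets behave as described in the question: any one character from the listed classes matches, and the trailing + extends the match over a whole run of them.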