Re: Keeping the split token in a Java regular expression
On 03/27/2012 01:26 AM, Lew wrote:
Stefan Ram wrote:
laredotornado writes:
What I would like to do is split the expression wherever I have an
public class Main
{
public static void split
( final java.lang.String text )
{ java.util.regex.Pattern pattern =
java.util.regex.Pattern.compile
( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
java.util.regex.Matcher matcher = pattern.matcher( text );
while( matcher.find() )
java.lang.System.out.println( matcher.group( 0 )); }
public static void main( final java.lang.String[] args )
{ split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
This excellent (except for layout) example deserves to be archived.
What do you find excellent about this? I find it has some deficiencies:
- the separator is included in the match (which goes against the
requirements despite the thread subject)
- spaces after a separator comma are included in the next token as
leading text
- the method really does more than splitting (namely printing), so the
name does not reflect what's going on
- the Pattern is compiled on _every_ invocation of the method
- the method is unnecessarily restricted; argument type CharSequence is
sufficient
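The first two points are easy to overlook in plain console output, so here is a minimal sketch that runs the quoted pattern unchanged and wraps each match in brackets. The class name OriginalPatternDemo and the method name matches are made up for this illustration and appear nowhere in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original posting.
public class OriginalPatternDemo {
    // The pattern from the quoted example, unchanged.
    private static final Pattern ORIGINAL = Pattern.compile(
            ".*?(?:am|pm),?", Pattern.CASE_INSENSITIVE);

    // Collects group 0 of every match, i.e. what the quoted code prints.
    public static List<String> matches(final CharSequence text) {
        final List<String> result = new ArrayList<String>();
        final Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            result.add(m.group(0));
        }
        return result;
    }

    public static void main(final String[] args) {
        for (final String token : matches("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            // Brackets make leading blanks and trailing commas visible.
            System.out.println("[" + token + "]");
        }
        // Prints:
        // [Fri 7:30 PM,]
        // [ Sat 2 PM,]
        // [ Sun 2:30 PM]
    }
}
```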
Test output for
"Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM"
"Fri 8 PM, Sat 1, 3, and 5 PM"
Fri 7:30 PM,
Sat 2 PM,
Sun 2:30 PM
---
Fri 8 PM,
Sat 1, 3, and 5 PM
---
I would change that to
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
            "(\\S.*?(?:am|pm))(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    public static void splitPrint(final CharSequence text) {
        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            System.out.println(m.group(1));
        }
    }

    public static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();
        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(final java.lang.String[] args) {
        splitPrint("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM");
        System.out.println("---");
        splitPrint("Fri 8 PM, Sat 1, 3, and 5 PM");
        System.out.println("---");
    }
}
I might even sneak a "\\s*" in between "pm)" and "(?:," to also catch
cases where there are spaces before the separator.
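A minimal sketch of that variant, assuming the "\\s*" goes directly after the capturing group. The class name LenientSplit and the test input with blanks before the commas are invented for illustration. Without the extra "\\s*", a blank before a comma would leave that comma unconsumed, and it would then start the next token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; only the pattern differs from the code above.
public class LenientSplit {
    // "\\s*" after the capturing group swallows blanks before the comma.
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
            "(\\S.*?(?:am|pm))\\s*(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    // Returns the tokens without separators or surrounding blanks.
    public static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();
        for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(final String[] args) {
        // Invented input: blanks before the first and second comma.
        System.out.println(split("Fri 7:30 PM , Sat 2 PM ,Sun 2:30 PM"));
        // Prints: [Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM]
    }
}
```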
Kind regards
robert