Re: Regex challenge
On 2008-06-04 18:09 +0100, Roedy Green allegedly wrote:
> On Wed, 04 Jun 2008 18:03:08 +0200, Daniele Futtorovic
> <da.futt.news@laposte.invalid> wrote, quoted or indirectly quoted
> someone who said :
>> Have you tried the java.util.regex.Pattern.DOTALL and similar flags?
>
> In my example, I use (?m) which is supposed to turn on the multiline
> feature to treat $ as end of line rather than end of string.
> I suppose I could experiment with turning it on explicitly.

There's one problem you haven't addressed, AFAICS. You have your
"Bunkie" and "Hessmer", each followed by a row of numbers/percentages.
The trouble is the 1->N relationship. It's not clear to me whether you
want all of these percentages or just one of them. If you want only
one, you can get it to work. If you want them all AND their number is
always the same, you can get it to work too. But unless I'm mistaken,
if you want them all AND their number is NOT always the same, you won't
get it to work with one regex only. For, again unless I'm mistaken,
capturing an undetermined number of values within a quantified
expression doesn't work: a capturing group that is repeated by a
quantifier retains only its last match. That was a terrible formulation
of the problem, but I hope you see what I mean.
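
To make the point about quantified capturing concrete, here is a minimal sketch. The class name LastCaptureDemo, the method lastCapture and the comma-separated input are made up for illustration; they are not taken from your data. It shows that a capturing group repeated by a quantifier holds only the value from its final iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original example.
class LastCaptureDemo {

    // Matches one or more "NN%" items separated by optional commas.
    // The pattern contains exactly one capturing group, no matter how
    // many times the surrounding non-capturing group repeats.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("(?:(\\d+)%,?)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Three percentages go in, but group 1 keeps only the last one.
        System.out.println(lastCapture("9%,4%,5%")); // prints 5
    }
}
```

The earlier values (9 and 4) are matched but overwritten, which is why a single pattern cannot hand back a variable-length list of captures.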
Here's an example that works with a fixed number of percentages (three,
as in your input). I've modified the input a bit, giving it a proper HTML
structure.
<sscce>
package scratch;

import java.util.regex.*;

public class Scratch {
    public static void main(String[] ss) {
        String term = System.getProperty("line.separator");

        // Two table rows: a name cell followed by three percentage cells.
        String input =
            "<tr><td><p class=MsoNormal align=center style='text-align:center'>Bunkie</p>" + term +
            "</td>" + term +
            "<td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            "<p class=MsoNormal align=center style='text-align:center'>9 %</p>" + term +
            " </td>" + term +
            " <td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            " <p class=MsoNormal align=center style='text-align:center'>4%</p>" + term +
            " </td>" + term +
            " <td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            " <p class=MsoNormal align=center style='text-align:center'>5%</p>" + term +
            " </td>" + term +
            " </tr>" + term +
            " <tr style='mso-yfti-irow:2'>" + term +
            " <td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            " <p class=MsoNormal align=center" + term +
            "style='text-align:center'>Hessmer</p>" + term +
            " </td>" + term +
            " <td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            " <p class=MsoNormal align=center style='text-align:center'>8%</p>" + term +
            " </td>" + term +
            " <td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            " <p class=MsoNormal align=center style='text-align:center'>4%</p>" + term +
            " </td>" + term +
            " <td style='padding:.75pt .75pt .75pt .75pt'>" + term +
            " <p class=MsoNormal align=center style='text-align:center'>4%</p>" + term +
            " </td></tr>";

        // (?s) is the embedded DOTALL flag: '.' also matches line terminators.
        // One capturing group per cell: group 1 is the name, groups 2-4 the
        // three percentages.
        Pattern p = Pattern.compile("(?s)<tr(?:" +
            ".*?<td.*?<p .*?>(.*?)</p>.*?</td>" +
            ".*?<td.*?<p .*?>(.*?)</p>.*?</td>" +
            ".*?<td.*?<p .*?>(.*?)</p>.*?</td>" +
            ".*?<td.*?<p .*?>(.*?)</p>.*?</td>" +
            ").*?</tr>");

        // One match per row; print "name: pct, pct, pct".
        for (Matcher m = p.matcher(input); m.find(); ) {
            System.out.print(m.group(1) + ": ");
            for (int ii = 2; ii <= m.groupCount(); ii++) {
                System.out.print(m.group(ii));
                System.out.print((ii < m.groupCount()) ? ", " : term);
            }
        }
    }
}
</sscce>
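
For the remaining case, where the number of percentages per row is not fixed, one possible workaround is to use two patterns instead of one: an outer pattern that finds each row and an inner pattern that is applied repeatedly to the row's contents. The following is only a sketch under that assumption. The class name TwoPassRows, the method parse and the two pattern constants are hypothetical names of mine, and the patterns are no more robust against arbitrary HTML than the one above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original example.
class TwoPassRows {

    // Outer pattern: one match per <tr>...</tr>, capturing the row body.
    private static final Pattern ROW = Pattern.compile("(?s)<tr.*?>(.*?)</tr>");

    // Inner pattern: one match per <p ...>...</p> inside a row body.
    private static final Pattern CELL = Pattern.compile("(?s)<p .*?>(.*?)</p>");

    // Returns one list per row; each list holds the trimmed text of every
    // paragraph found in that row, however many there are.
    static List<List<String>> parse(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        String html =
            "<tr><td><p class=a>Bunkie</p></td><td><p class=a>9%</p></td></tr>\n" +
            "<tr style='x'><td><p class=a>Hessmer</p></td>" +
            "<td><p class=a>8%</p></td><td><p class=a>4%</p></td></tr>";
        // The first element of each row is the name, the rest are percentages.
        System.out.println(parse(html));
    }
}
```

The loop over cell.find() does the job that a repeated capturing group cannot: each iteration yields a fresh match, so nothing is overwritten.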
--
DF.
to reply privately, change the top-level domain
in the FROM address from "invalid" to "net"