Re: Help with regular expression
On 2006-08-21 12:25:58 +1000, "hiwa" <HGA03630@nifty.ne.jp> said:
Grost wrote:
Hi all,
I'm writing an application to perform some HTML text manipulation from
templates and I have a regex formulation problem. For example, in the
template I have a line:
<tr><td class="caption"><!--@caption--><!--<br />(@caption)--></td></tr>
where the parts I want to replace are HTML comments: <!-- ??? -->
There are two styles of comment I want to search/replace:
1) <!--@caption-->
2) <!--XXX(@caption)YYY-->, where XXX and YYY can represent other HTML
Case 1 is easy, and I just use: <!--\s*?@caption\s*?-->
Case 2 is the problem. I'm trying to use this for conditional insertion
of additional HTML, depending on whether @caption exists in the
application. If I have a value for @caption, then the following is
produced from the above example:
<tr><td class="caption">foo<br />foo</td></tr>
This seems easy enough in principle, but every regex pattern I've tried
unsurprisingly matches the <!-- from the first comment. My initial try,
which of course failed, was: <!--(.*?)\(@caption\)(.*?)-->
What I need is a way of saying:
Match "(@caption)" within an HTML comment, and capture the text on
either side of the tag and within the comment, but make sure there are no
other comment-like tags within that text. I'm guessing I need something
along the lines of the lookaround operators, but I have little
experience with them. Any help anyone...?
(For clarity I removed the extra escaping required for Java inline
strings.)
Stan
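
For anyone following along, here is a minimal sketch of why the initial pattern misbehaves on the template line above. The class and method names (NaiveCaptionMatch, groups) are made up for illustration; only the pattern and the template come from the post. The lazy (.*?) starts at the first <!-- and simply keeps growing, straight across the first --> and the second <!--, until it reaches "(@caption)".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class NaiveCaptionMatch {
    // The template line from the post, with Java string escaping restored.
    static final String TEMPLATE =
        "<tr><td class=\"caption\"><!--@caption--><!--<br />(@caption)--></td></tr>";

    // Returns the two captured groups of the naive pattern, or null if no match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("<!--(.*?)\\(@caption\\)(.*?)-->").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups(TEMPLATE);
        System.out.println("group 1: " + g[0]);
        System.out.println("group 2: " + g[1]);
    }
}
```

Group 1 comes back as "@caption--><!--<br />" and group 2 is empty, so the match has swallowed the whole first comment along with the start of the second one.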
I think your description does not formalize the requirement well
enough.
Here's a rough stab in the dark. HTH.
------------------------------------------------------------------
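public class Grost {
    public static void main(String[] args) {
        // Template line containing both comment styles
        String text = "<tr><td class=\"caption\"><!--@caption--><!--<br />(@caption)--></td></tr>";
        // Expected output when @caption is "foo"
        String result = "<tr><td class=\"caption\">foo<br />foo</td></tr>";
        // A comment that begins with an HTML tag; the tag is captured as $1
        String regex1 = "<!--(<[^>]+>).*-->";
        // Any comment that is still left
        String regex2 = "<!--.*-->";
        // Replace the tagged comment with foo + tag + foo
        text = text.replaceAll(regex1, "foo$1foo");
        // Remove the remaining comment
        text = text.replaceAll(regex2, "");
        if (result.equals(text)) {
            System.out.println("success");
        }
    }
}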
I figured that formalisation may be a problem, and that's quite likely
to be the aspect for which I need the most help. Essentially I want to
allow arbitrary text (incl. HTML) either side of a caption tag:
<!--XXX(@caption)YYY-->
with the only restriction being that the text CANNOT be an HTML comment.
XXX cannot contain <!--.*-->
YYY cannot contain <!--.*-->
In regex terms, if I use my non-working version:
<!--(.*?)\(@caption\)(.*?)-->
then neither the $1 nor the $2 capturing group in this match should
contain any HTML comments.
Any clearer?
Stan
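
One possible way to express "XXX and YYY cannot contain a comment" is a negative lookahead applied before every character the groups consume, so that neither group can step over a <!-- or a -->. The sketch below is only that, a sketch: the names CaptionTemplate, NO_COMMENT and expand are hypothetical, and the choice to drop the whole conditional comment when there is no caption value is an assumption about the intended behaviour, not something stated in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative only.
class CaptionTemplate {
    // Case 1: <!--@caption--> (pattern taken from the original post)
    static final Pattern SIMPLE = Pattern.compile("<!--\\s*?@caption\\s*?-->");

    // A lazily repeated character that is only consumed when neither
    // "<!--" nor "-->" begins at the current position.
    static final String NO_COMMENT = "((?:(?!<!--|-->).)*?)";

    // Case 2: <!--XXX(@caption)YYY--> with comment-free XXX ($1) and YYY ($2)
    static final Pattern CONDITIONAL =
        Pattern.compile("<!--" + NO_COMMENT + "\\(@caption\\)" + NO_COMMENT + "-->");

    // Expands both comment styles. A null caption is assumed to mean
    // "no value", in which case both comments are removed.
    static String expand(String template, String caption) {
        // quoteReplacement keeps '$' and '\' in the value from being
        // interpreted as group references in the replacement string.
        String value = (caption == null) ? "" : Matcher.quoteReplacement(caption);
        String wrapped = (caption == null) ? "" : "$1" + value + "$2";
        String out = CONDITIONAL.matcher(template).replaceAll(wrapped);
        return SIMPLE.matcher(out).replaceAll(value);
    }

    public static void main(String[] args) {
        String template =
            "<tr><td class=\"caption\"><!--@caption--><!--<br />(@caption)--></td></tr>";
        System.out.println(expand(template, "foo"));
        System.out.println(expand(template, null));
    }
}
```

With the value "foo" this prints <tr><td class="caption">foo<br />foo</td></tr>, and with null it prints <tr><td class="caption"></td></tr>. Note that "." does not match line terminators by default, so comments spanning several lines would need Pattern.DOTALL.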