Re: JUnit - "Credible" HTML checker?
On Thu, 6 Aug 2009, bugbear wrote:
> I have some routines that generate HTML;
> it would be useful if (in my unit testing)
> I had a quick and dirty "is this valid HTML" test.
> I don't need an html renderer - something
> cruddy based on "likely" looking regexps would
> suit me very well.
> I'm simply trying to avoid doing full deploy + interactive
> testing of stuff (html) which isn't even "likely".
> Does anyone know of anything?
The Rolls-Royce here is HtmlUnit, which is a complete headless browser -
it reads HTML, parses CSS, runs JavaScript (courtesy of Rhino), etc. It
has interfaces which make it easy to ask questions like "get me all the
div elements", "get me all the paragraph elements with class errorReport",
"get me the text content of this element", etc., which is what you need for
testing.
It's built on top of NekoHTML, which is a pretty decent HTML parser. Other
popular parsers are JTidy and TagSoup, but I think those are more lenient
in their parsing (Neko can be lenient, but tends more towards strictness),
and for what you want to do, you don't want leniency.
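
If even a parser is more than you want, the "cruddy, regexp-based" end of
the scale could look something like the sketch below. It is not HtmlUnit or
NekoHTML, just the JDK: it pulls out anything that looks like a tag and
checks that opening and closing tags nest properly. The class name
TagBalanceCheck and the method isBalanced are made up for illustration, and
the void-element list is my own assumption about which tags never close.
It is deliberately strict: it will reject HTML that omits optional closing
tags (a bare <p> or <li>, say), and it can be fooled by a ">" inside an
attribute value or a "<" inside a script.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a rough "does this look like balanced HTML" check.
public class TagBalanceCheck {

    // Assumed list of void elements; these never take a closing tag,
    // so they are never pushed on the stack.
    private static final Set<String> VOID_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"));

    // Group 1: "/" if this is a closing tag.
    // Group 2: the tag name.
    // Group 3: "/" if the tag is written self-closing, e.g. <br/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>");

    public static boolean isBalanced(String html) {
        // Drop comments first so tags inside them are ignored.
        String stripped = html.replaceAll("(?s)<!--.*?-->", "");

        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(stripped);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();

            if (closing) {
                // A closing tag must match the most recently opened element.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else if (!selfClosing && !VOID_ELEMENTS.contains(name)) {
                open.push(name);
            }
        }
        // Anything still open at the end was never closed.
        return open.isEmpty();
    }
}
```

In a JUnit test you would wrap it in something like
assertTrue(TagBalanceCheck.isBalanced(generatedHtml)). It only tells you the
tags nest; it says nothing about whether the elements or attributes are
legal HTML, which is where a real parser earns its keep.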
Apologies for the lack of URLs, but you strike me as the kind of chap who
is quite capable of using Google!
tom
--
The sunlights differ, but there is only one darkness. -- Ursula K. Le Guin,
'The Dispossessed'