Re: Loading a simple XHTML transitional document into a org.w3c.dom.Document

Tom Anderson
Sat, 11 Jul 2009 23:17:58 +0100
On Sat, 11 Jul 2009, Arne Vajhøj wrote:

Tom Anderson wrote:

On Fri, 10 Jul 2009, Arne Vajhøj wrote:

Tom Anderson wrote:

It's worth noting that HTML 5 will not be SGML:


HTML 5 parsers will be from scratch then.

No, since no current browser parses HTML using an SGML parser. They're all
handwritten anyway. AIUI, the only SGML-based HTML parsers in production
are the online validators!

... and XHTML is XML, and despite what some have
claimed, XML is not a subset of SGML.

some ?

You mean like in the first few lines of the XML specification ?


The Extensible Markup Language (XML) is a subset of SGML that is
completely described in this document. Its goal is to

A very good example. Despite being in the spec, this is a lie.

The XML specification lying about what XML is ????


Unless <foo/> can be a legal way of writing an empty foo element
(including when foo is declared with a content model other than EMPTY) in
SGML, which i don't believe it can.
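
To make the XML side of that concrete, here is a minimal sketch using the stock JAXP parser. The class and method names are made up for illustration, and it only shows what an XML parser does; it says nothing about how an SGML parser would treat the same input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class EmptyElementDemo {
    // Parse a string into a DOM tree with the default (non-validating) parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // True if the root element has no child nodes at all.
    static boolean rootIsEmpty(String xml) throws Exception {
        return !parse(xml).getDocumentElement().hasChildNodes();
    }

    public static void main(String[] args) throws Exception {
        // foo is declared with a content model other than EMPTY.
        String dtd = "<!DOCTYPE foo [<!ELEMENT foo (#PCDATA)>]>";
        System.out.println(rootIsEmpty(dtd + "<foo/>"));      // prints "true"
        System.out.println(rootIsEmpty(dtd + "<foo></foo>")); // prints "true"
    }
}
```

Both spellings give the same empty element in the DOM, even though foo is declared as (#PCDATA).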

I think SGML also doesn't allow colons in names, which XML does. BICBW.
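
The XML half of the colon claim is easy to check with JAXP. This is a sketch only: the class name, method names and the urn:example namespace are invented for the example. With namespace processing off (the JAXP default), a colon is just another name character; with it on, the same name is split into prefix and local part.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
class ColonNameDemo {
    static Document parse(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Root element name as seen by a parser that ignores namespaces.
    static String rootName(String xml) throws Exception {
        return parse(xml, false).getDocumentElement().getNodeName();
    }

    // Local part of the root element name as seen by a namespace-aware parser.
    static String rootLocalName(String xml) throws Exception {
        return parse(xml, true).getDocumentElement().getLocalName();
    }

    public static void main(String[] args) throws Exception {
        // Plain XML 1.0: the colon is an ordinary name character.
        System.out.println(rootName("<a:b/>")); // prints "a:b"
        // Namespace-aware: the prefix must be declared, and the name is split.
        System.out.println(rootLocalName("<a:b xmlns:a='urn:example'/>")); // prints "b"
    }
}
```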

There is a thing called Web SGML, which is a slightly modified version of
SGML which i think *is* a superset of XML. But basically, that was
invented so that XML could be retrofitted into the SGML framework; it's
not 'proper' SGML.

I find this stuff hard to get my head round because SGML is far
more customisable than XML - as well as the DTD, there's an 'SGML
declaration', which can do things like define what character is used to
mark the start of tags (hardwired to < in XML) and so on. This is very
powerful, but ludicrously complex. It can in fact be used to alter SGML to
the point that it gets very close to XML - and Web SGML enables it to go
the remainder of the distance.


For me, that's just logic. OTOH, Spock went bananas several times using
logic. -- Pete, mfw

