The W3C announced today the end of the XHTML 2 Working Group's charter. It has long been apparent, much to my chagrin, that HTML 5 would become a reality, but deep down I had hoped that XHTML 2 would not die. You see, HTML 5 is a great tactical move: it advances HTML and makes multimedia (among other things) easier for content authors. Strategically, however, I feel it sets back the Web, particularly the semantic Web. I am disappointed, not because XHTML 2 has better features than HTML 5, but because promoting HTML 5 over XHTML 2 moves the Web in the wrong direction.
What’s so great about XHTML 2
What’s so great about XHTML 2? There are some nice cleanups, like replacing <h1> through <h6> with a simple <h> element nested inside <section> elements. <hr> is replaced with the more semantically named <separator>. The generalization of the href and src attributes, which XHTML 2 allows on any element, was also pretty exciting. However, the biggest thing is decentralized extensibility. In fact, the W3C mentioned this in its FAQ regarding the end of XHTML 2. The very existence of HTML 5 shows why this is a big deal. Decentralized extensibility means we can create our own vocabularies independent of browser implementations.
The current state of things
As a developer, you’re stuck with the elements a user agent implements. With HTML, you are limited to what a consortium managed to agree upon and what browser authors implemented (somewhat) consistently. The needs of content authors and the capabilities of browsers have outgrown the centralized vocabularies of HTML 4 and XHTML 1. HTML 5 addresses this problem by brute force: keep improving the centralized vocabulary and try to keep up with the developer community.
Where we could be headed
If we currently live in the era of the Web browser, we ought to be headed toward the era of the XML browser. With XML, we have more liberty. We can create XML documents with our own vocabularies. We can even define those vocabularies through DTDs or other schemata. We can mix and match our vocabularies with others. We can define the layout and presentation of new languages with CSS and XSL-FO. Better still, we can define what our new elements and attributes mean through RDF and its related technologies. This is the key.
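To make "mix and match our vocabularies" a little more concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It parses one document that combines XHTML with a second, made-up vocabulary and shows that a namespace-aware parser keeps the two apart. The namespace URI `http://example.org/mynamespace` and the elements `foo`, `recipe`, and `ingredient` are hypothetical; the sketch only illustrates namespace mixing, not how any real XML browser would render such a document.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MY_NS = "http://example.org/mynamespace"  # hypothetical vocabulary URI

# One document, two vocabularies: XHTML by default, plus the "my:" prefix.
DOC = f"""<html xmlns="{XHTML_NS}" xmlns:my="{MY_NS}">
  <body>
    <p>Read the <my:foo my:fetcherator="http://example.org/spec">spec</my:foo>.</p>
    <my:recipe>
      <my:ingredient>flour</my:ingredient>
    </my:recipe>
  </body>
</html>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return None, tag


def elements_by_vocabulary(root):
    """Group local element names, in document order, by their namespace."""
    groups = {}
    for el in root.iter():
        ns, local = split_tag(el.tag)
        groups.setdefault(ns, []).append(local)
    return groups


root = ET.fromstring(DOC)
for ns, names in elements_by_vocabulary(root).items():
    print(ns, names)
```

Because each element carries its namespace, software can tell `my:foo` apart from any XHTML element of the same local name, which is what lets independently defined vocabularies coexist in one document.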
Every time I mention this, I get the same question: “How would a browser know that my attribute mynamespace:foo@fetcherator is the same as html:a@href?” This is where RDF comes in. With semantic technologies, we can define our vocabularies in such a way that a browser will know.
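Here is a rough sketch of how that might work, under several stated assumptions. RDF statements are modeled as plain (subject, predicate, object) tuples instead of a real RDF library. The OWL term `owl:equivalentProperty` says that `fetcherator` means the same as `href`. The property URIs `...xhtml#href` and `...mynamespace#fetcherator`, along with the rule that maps an attribute name to a property URI, are hypothetical conventions invented for this example; XHTML does not define such URIs. The hypothetical `find_links` function then treats any equivalent attribute as a link.

```python
import xml.etree.ElementTree as ET

OWL_EQUIV = "http://www.w3.org/2002/07/owl#equivalentProperty"
XHTML_NS = "http://www.w3.org/1999/xhtml"
MY_NS = "http://example.org/mynamespace"  # hypothetical vocabulary URI

# Hypothetical property URIs standing in for html:a@href and my:fetcherator.
HTML_HREF = XHTML_NS + "#href"
MY_FETCHERATOR = MY_NS + "#fetcherator"

# RDF statements as (subject, predicate, object) triples, such as a vocabulary
# author might publish alongside the vocabulary itself.
TRIPLES = {
    (MY_FETCHERATOR, OWL_EQUIV, HTML_HREF),
}


def equivalents(prop, triples):
    """Return prop plus every property linked to it by owl:equivalentProperty,
    following the links in both directions and transitively."""
    found, todo = {prop}, [prop]
    while todo:
        p = todo.pop()
        for s, pred, o in triples:
            if pred != OWL_EQUIV:
                continue
            for a, b in ((s, o), (o, s)):
                if a == p and b not in found:
                    found.add(b)
                    todo.append(b)
    return found


def attr_to_property(attr_name):
    """Map an ElementTree attribute name to a property URI (hypothetical rule:
    namespace + '#' + local name; unprefixed attributes are treated as XHTML)."""
    if attr_name.startswith("{"):
        ns, local = attr_name[1:].split("}", 1)
        return ns + "#" + local
    return XHTML_NS + "#" + attr_name


def find_links(root, triples):
    """Collect the value of every attribute that means the same thing as href."""
    link_props = equivalents(HTML_HREF, triples)
    links = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if attr_to_property(name) in link_props:
                links.append(value)
    return links


DOC = f"""<html xmlns="{XHTML_NS}" xmlns:my="{MY_NS}">
  <body>
    <p><a href="http://example.org/a">A</a> and
       <my:foo my:fetcherator="http://example.org/b">B</my:foo></p>
  </body>
</html>"""

print(find_links(ET.fromstring(DOC), TRIPLES))
```

The point of the design is that the browser-side code never mentions `fetcherator`: remove the single triple and `my:foo` stops being treated as a link, while `<a href>` keeps working. Treating unprefixed attributes as XHTML is a simplification, since in XML they technically belong to no namespace.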
Why can’t we head there with HTML 5?
We can. In fact, I believe we will (though perhaps more slowly now). My concern is the motivation behind the enhancements in HTML 5 versus XHTML 2. XHTML 2 continued the W3C’s efforts to bring the benefits of XML into HTML, while HTML 5 seems to bring the benefits of new user-agent capabilities into HTML. We need to decouple user-agent capabilities from our document vocabularies. This is true independence.
We can still do this. The future may still be bright. I hope that we do not lose the vision as we postpone liberty for the comforts afforded by the latest iteration of a centralized HTML.