The Web’s Dirty Little Secret: Why websites should use HTML 4.01 and not XHTML.

Aractus 21, June, 2010

Ask any Joe-Doe what a website is, and he’ll say “why it’s a resource filled with H.T.M.L. thingies those Hyper-Text Markup Language text-documents, and H.T.M.L. is what makes it look like more than just plain text, and they’re sitting on a server and the server sends those H.T.M.L. thingies to Internet Explorer what then displays the website on your computer”. Full marks to average Joe.

Ask any Joe-Webmaster what XHTML is and he’ll say “why it’s a more strict type of H.T.M.L. that is better”. But that isn’t right. XHTML isn’t HTML at all; it’s XML, and the W3C says so. Specifically, the W3C refers to it as a reformulation of HTML 4 in XML 1.0. So why does every Tom, Dick and Harry Webmaster think it’s a true HTML format?

Well firstly, as the Webmaster for scummgames.net, and a true-XHTML coder, I want to fully disprove the myth that XHTML is in fact HTML. By the way, it’s only partially true that I code in XHTML. I prefer HTML, and specifically I write PHP code because it simplifies work and works well with my pure CSS designing, which I adopted well before it became fashionable, not to mention the other benefits like gzip. I still use the same program I’ve always used to code all web-based documents: Notepad. So when I say I write XHTML, what I mean is I write PHP code, and inside PHP I write HTML or XHTML. Still, when I write XHTML I write true XHTML, not just “valid” XHTML; I write valid XML! That is why I have at the page footer a link to an XML validation service before the XHTML validator (I don’t touch the wannabe-validator “Validome”, but if you do then I’m sure it’ll also work flawlessly).

Most of us serious webmasters are familiar with what a MIME Type is: it’s the value of the Content-Type header that the server sends to identify what type of file it is sending. This is important because the web doesn’t use file extensions to convey this information, as is common on your home computer. The extension of an HTML file in a web address can be anything from .htm, .html, .shtml, .php or .asp to any other conceivable file extension. Your web browser understands that the document is HTML when it is delivered using the text/html MIME Type.

XHTML 1.0 files can be sent using the text/html MIME Type, but this means they are delivered, read and parsed as HTML documents instead of as XML, which needs an XML MIME Type such as application/xhtml+xml. So clearly there’s no benefit to sending an XHTML document as text/html when you can instead send a correctly formatted HTML document.
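To make the point about MIME Types concrete, here is a rough Python sketch of the decision a browser makes. It is an illustration only: the function name parser_for, the XML_TYPES set and the example URLs are all made up for this post, and a real browser does far more than this. What it shows is that the Content-Type header, not the file extension, picks the parser.

```python
from email import message_from_string

# Assumption for illustration: the MIME types a browser would hand to an XML parser.
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_for(url: str, raw_headers: str) -> str:
    """Return 'html', 'xml' or 'other' based only on the Content-Type header.

    `url` is deliberately ignored: the extension in the address carries no
    meaning on the web, only the header sent by the server does.
    """
    msg = message_from_string(raw_headers + "\n\n")
    content_type = msg.get_content_type()  # lower-cased, parameters stripped
    if content_type == "text/html":
        return "html"
    if content_type in XML_TYPES:
        return "xml"
    return "other"


# The same XHTML markup ends up in different parsers depending on the header.
print(parser_for("/index.php", "Content-Type: text/html; charset=utf-8"))
print(parser_for("/index.php", "Content-Type: application/xhtml+xml; charset=utf-8"))
```

The first call reports the HTML parser and the second the XML parser, even though the address is identical.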

This blog is fully accessible using an XML parser. I’ve modified the original “theme” to my liking so that it sends as XML by default, and as HTML if the client doesn’t support XML (that’s XML using the XHTML doctype, or HTML with the XHTML doctype).
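For the curious, the switching logic can be sketched roughly like this. This is Python rather than the PHP the blog runs on, and it is not the blog’s actual code: the function names are hypothetical, and the rule of ignoring the */* wildcard is my own assumption (a browser that only sends */* has not actually promised it can handle XML).

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def accepts_xhtml(accept_header: str) -> bool:
    """True if the Accept header explicitly lists application/xhtml+xml with q > 0."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != XHTML:
            continue  # wildcards such as */* are deliberately not counted
        q = 1.0  # the quality value defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q value as a refusal
        return q > 0
    return False


def choose_content_type(accept_header: str) -> str:
    """Pick the MIME Type to send for an XHTML page."""
    return XHTML if accepts_xhtml(accept_header) else HTML


# A browser that lists the XML type gets XML; one that only sends */* gets HTML.
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

Treating the wildcard as “HTML only” is the cautious choice: sending XML to a browser that cannot parse it usually ends in a download prompt rather than a page.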

As can be plainly seen on my blog, it is XHTML. That’s because the theme I installed was quasi-XHTML. I hate quasi-XHTML, with a passion. Currently, the vast majority of webmasters who think they’re sending XHTML 1.0 files are sending HTML files with an XHTML doctype in them. Every XHTML website in the world sent using the text/html MIME Type is already read as HTML, by XHTML-compliant and non-compliant devices alike.

Let me state this bluntly: XHTML is XML. It is not HTML, it is not supposed to be HTML, and it is not “100% backwards compatible” with HTML. If you believe it is, then open a self-closing script tag in your document and see what happens. Put this in your XHTML document:

<script src="whatever.js" type="text/javascript" />

And see what happens. Sent as text/html, the parser ignores the trailing slash, treats the tag as an opening tag, and swallows everything up to the next </script> as script. Yet it’s valid XHTML 1.0, and it’s in mine as long as this blog is actually being sent as XML (otherwise it’s modified to please HTML parsers that don’t understand XHTML). This is the problem with XHTML and why it’s such a failure: webmasters understand HTML, not XML. They don’t understand how to create valid XHTML; they think that just because the W3C Validator ticks it, it’s XHTML, when more than likely no XML parser in the world is able to receive it as XHTML.

If you actually think of HTML as you code XHTML, you have clearly missed the point, because you should be thinking in XML terms. All tags must close in XML: you can self-close any tag, or you can use a separate closing tag, but it is more correct for “empty tags” to self-close, hence the above example.
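Here is a small Python sketch of that rule, using the standard library’s XML parser on bare fragments (no doctype or namespace, to keep it short). The helper names is_well_formed and describe are made up for illustration. It shows that an XML parser sees the self-closed and the explicitly closed script tag as the same empty element, and that a tag left open is rejected outright.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """True if an XML parser accepts the fragment, False if it raises an error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def describe(element):
    """Summarise an element: tag name, attributes, text content, child count."""
    return (element.tag, dict(element.attrib), element.text, len(element))


# To an XML parser these two spellings are the same empty element.
self_closed = ET.fromstring('<script src="whatever.js" type="text/javascript" />')
explicit = ET.fromstring('<script src="whatever.js" type="text/javascript"></script>')
print(describe(self_closed) == describe(explicit))

# An HTML-style <br> that is never closed is not XML at all.
print(is_well_formed("<p>line one<br />line two</p>"))
print(is_well_formed("<p>line one<br>line two</p>"))
```

The comparison prints True, the self-closed <br /> fragment is accepted, and the HTML-style <br> fragment is refused.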

By the way, the argument that writing XHTML is akin to writing well-formed HTML is irrelevant, not to mention incorrect, since the short-tag isn’t supported in XHTML.

If this sounds a little dramatic it’s because it’s true. Why do you think no one uses XHTML 1.1 or XHTML 2.0? 1.0 was released in 2000, 1.1 was released in 2001! 10 years of webmasters misusing XHTML.

Anyway, all XML documents should contain the XML declaration line at the very top of the page:

<?xml version='1.0' encoding='utf-8'?>

Without it, if the encoding is not UTF-8 or UTF-16, XML parsers will not even try to render it. It is as sinful as leaving out the XHTML declaration line. But as soon as the line is in your document it can no longer be read cleanly as HTML (Internet Explorer 6, for one, drops into quirks mode when anything comes before the doctype), which is why my blog removes it when it sends the document to an HTML-only browser.
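The encoding rule is easy to try out. The following is a minimal Python sketch using the standard library’s XML parser; the helper try_parse and the sample paragraph are invented for the demonstration. It feeds the parser the same ISO-8859-1 bytes with and without the declaration.

```python
import xml.etree.ElementTree as ET

# A fragment containing a non-ASCII character, stored as ISO-8859-1 bytes.
body = "<p>café</p>".encode("iso-8859-1")


def try_parse(data: bytes):
    """Return the element's text if the bytes parse as XML, otherwise None."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# No declaration: the parser assumes UTF-8 and chokes on the ISO-8859-1 byte.
without_declaration = try_parse(body)

# With the declaration the parser knows how to decode the bytes.
with_declaration = try_parse(b"<?xml version='1.0' encoding='iso-8859-1'?>" + body)

# UTF-8 is a default encoding, so it parses even without a declaration.
utf8_no_declaration = try_parse("<p>café</p>".encode("utf-8"))

print(without_declaration, with_declaration, utf8_no_declaration)
```

Only the first attempt fails, which is exactly the case where the declaration is not optional.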

Again, unless you have the brain of a pea, unless you’re actually creating XML documents, stick to HTML – the format that was actually designed for web-browsers.

The great thing about HTML is that it is a purpose-built piece of code. XHTML, on the other hand, is designed as XML-conforming code. That means that with XHTML the days of errors in your code are over, because if it isn’t 100% well-formed code then an XML parser will stop with an error instead of guessing what you meant. This is of course an unrealistic requirement for 99% of hobby-webmasters with websites on the likes of Yahoo and GeoCities.
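The difference in attitude towards errors can be sketched in a few lines of Python. Treat it as a rough illustration: Python’s html.parser is only a forgiving tokenizer, not a browser’s full error-recovering engine, and the class TagLogger and the helpers html_events and xml_accepts are names I made up. Still, it shows the same sloppy markup being shrugged off by an HTML parser and refused by an XML parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Records every start and end tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(markup: str):
    """Run the lenient HTML parser and return the tags it saw."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


def xml_accepts(markup: str) -> bool:
    """True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Mis-nested tags and a paragraph that is never closed.
sloppy = "<p><b>bold<i>both</b>italic</i>"

print(html_events(sloppy))  # the HTML parser carries on without complaint
print(xml_accepts(sloppy))  # the XML parser refuses the whole document
```

With HTML the browser cleans up after you; with XML the clean-up is entirely your job.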

The use of XHTML is futile if it is not used correctly for this purpose. If your website is only going to be accessible from HTML-compliant devices then using XHTML is entirely useless as those devices are primarily designed for HTML.

In closing, I want to say that it took W3C over 10 years to update the HTML specification to HTML 5 and to finally add the much needed functionality missing from 4.01. And it’s still in draft stage. XHTML is far more restrictive than HTML, and forces all errors to be resolved by the coder, not the browser. When looking at the use of XHTML in websites, I find that the overwhelming majority are using it exclusively as HTML. This is basically equal to using the HTML doctype and having errors throughout their documents, since that is how it is seen by browsers! I hope that as HTML 5 gains popularity, as newer browsers implement its new features, webmasters begin to drop XHTML unless they’re actually using it correctly.

4 comments on “The Web’s Dirty Little Secret: Why websites should use HTML 4.01 and not XHTML.”

  • Hingie says:

    Aaaah excellent Daniel i knew you’d eventually see it my way! Html 5 is indeed the way of the future, it kicks ass. Why? because your old nemesis Apple says so. They’re not supporting flash anymore, they hate Adobe now. http://www.apple.com/hotnews/thoughts-on-flash/ . Steve Jobs is obviously a hypocrite, saying that open source is good enough for web but proprietary is what they’re sticking with for Apple software.

    Html 5 is going to go apeshit because of Apple’s blessing, and no other reason. I’m glad you’ve seen sense and like Apple now better than the other inferior product, it’s good for you to finally grow up.

    Hehehehe! love the blog it’s lots of fun i’ll be back often to see what new little tirades you’re going on about!! :)

    • Aractus says:

      HTML 5 is going to implement long overdue features. One that I’ve been on about for years is the fact you can have a font-list for fonts but you can’t have an image-list for images. Imagine being able to specify several different formats (that is, different URLs) for an image. You wouldn’t have to worry about compatibility because last on your list would be GIF or JPEG (or possibly first if it’s backwards-compatible with HTML 3). Finally new formats could actually compete.

      Apple has nothing to do with HTML. After all, their mobile devices cope better with XML (that is, correctly formatted XHTML) as it is, so why would they support HTML over, say, XHTML-MP? Not to mention that Internet Explorer and Opera far surpass Firefox, Safari and Chrome as far as web-browsers are concerned. Why anyone would use Chrome is beyond me. Just look who makes it: Google, and the one thing Google doesn’t ever want to do is block internet advertising, so you can be sure of surfing the web while swimming through AdSense when using Chrome. Good luck with that.

      BTW: Apple complaining that another product is proprietary = huge hypocrisy, you are quite right. At least Flash does something useful: it implements features, via plugin, not available using HTML 4.01.

  • Hingie says:

    Wrong.
    1. CRAP. Apple is on the standards committee for html5, everyone knows that.
    2. BULL. Apple and google mobile devices ship with low power implementations of html5, and cocoa is apple’s coding environment which has nothing to do with xml.
    3. GOOSE. Microsoftcock’s internet blunderer is far inferior to Safari. everybody knows that safari is the fastest browser. it shits all over everything in terms of speed, user experience and aesthetic.
    5. INCORRECT.

    youre living in a bygone era Mr Spaniel, and you’ll

  • Hingie says:

    Your blog sucks! it didn’t even give me a chance to finish my post!
    Time to get a Mac man.
