xhtml 1.0: marking up a new dawn

October, 2000.
By Molly E. Holzschlag. (Link to original article.)

Getting familiar—and getting started—with the new standard

Still writing your documents in HTML? If you are, you're not complying with current standards. On January 26, 2000, XHTML 1.0 became a recommendation by the World Wide Web Consortium (W3C). HTML, according to the W3C, is no longer the Web markup standard. Instead, XHTML 1.0 has replaced our old favorite, marking up the dawn of a new and exciting time in communications technology.

So what exactly is XHTML 1.0 and what does it mean to the Web developer? I'll start with the W3C's description: XHTML 1.0 is a reformulation of HTML as an XML application. This means that if you're authoring a document in XHTML 1.0, you are applying the rules and concepts inherent to XML to your Web markup. The dangling question naturally is: Can XHTML 1.0 be used to mark up my Web documents today? The answer is a resounding "yes!" All you need to do is learn how to structure documents properly, choose the correct document type definition (DTD) for your needs, and learn a few new ways of managing your code development.

Just how does XHTML 1.0 manage to be so ready to go? Well, as you write your documents, you'll see that it uses familiar HTML as its vocabulary. With some minor shifts in approach, but major shifts in thinking, XHTML 1.0 enables Web authors to code to the standards and begin shifting their perspectives in terms of future growth and change.

Why do we need another markup language?

HTML works pretty well. Granted, we've been challenged to come up with cross-browser, cross-platform solutions that really work. And bringing the Web from its nascent form in the early 90s to the vibrant, active Web we know today has meant straining, breaking, or even making up new HTML rules as we went along.

Developers who have studied HTML 4.0 principles know that a definitive goal of improving HTML practices had been set forth by the time the HTML 4.0 standard came into being. Some of the primary concerns of HTML 4.0 involved:

These principles all exist in XHTML 1.0, but they have been combined with concepts from XML that help advance our markup beyond just strengthening its basic syntax. The goals of XHTML 1.0 are many, but include the following:

Perhaps the most compelling argument for adopting XHTML 1.0 is that developers—especially those who are self-taught in HTML or rely on visual design tools to achieve their goals—can easily move into other XML applications by studying the standard. They can then begin to see the power of XML and extensibility. XHTML 1.0 makes the territory of XML and its applications less daunting because the path is familiar: HTML vocabulary with some new structural and syntactical methods.

Familiar language combined with a few new concepts makes it easier to move into less familiar territory. For example, knowledge of XHTML 1.0 can simplify the transition to upcoming XHTML versions and to related XML technologies for wireless and other applications, such as WML (Wireless Markup Language), SMIL (Synchronized Multimedia Integration Language), and SVG (Scalable Vector Graphics).

You got to have roots

Looking at the roots of XHTML is helpful in understanding the rationale for XHTML and the rules that guide it. Both XML and HTML have common roots in SGML, the Standard Generalized Markup Language. It is important to know that SGML is not a language per se. It is what is known as a metalanguage: a set of rules from which other languages are developed.

XML, like its parent SGML, is also a metalanguage. As such, its rules are used to create XML applications. XHTML, then, is an XML application that uses another SGML language, HTML, as its vocabulary.

If the relationship seems complex, that's because in a way, it is. SGML begat HTML first, then XML. When the concerns and limitations of HTML were examined, it became apparent that XML's rules could help HTML mature into a markup language that would help transition developers out of those limitations.

First, the requirements

In order for an XHTML 1.0 document to be true to its metalanguage (XML), there are several requirements and rules that you must consider. They are as follows:

The first step in achieving these goals is to structure XHTML 1.0 documents properly. You'll begin by adding the proper declarations and document information.

Document declarations, types, and namespaces

To be considered correct, an XHTML 1.0 document contains several structural components: an optional XML declaration, a DOCTYPE declaration, and a namespace declaration. The XML declaration allows authors to declare their documents as XML and to state the encoding the document employs:

<?xml version="1.0" encoding="UTF-8"?>

Using this declaration is recommended but not required, as mentioned earlier. Part of the reason it is not required is that some browsers, including IE 4.5 for Mac and Netscape 4.0 for Windows, will display XHTML pages inappropriately if it is used. So, most XHTML 1.0 authors interested in the best interoperability leave it out. However, the encoding information is important in many instances, particularly when working with international documents, so if you don't use the XML declaration, you are encouraged to add the encoding in a meta tag (shown later in Listing 2).

Beneath the XML declaration, or directly at the top of the document should you choose not to use it, you must place the DOCTYPE declaration. DOCTYPE allows an author to declare the type of document in use: both the language, XHTML 1.0, and the specific XHTML 1.0 DTD to which the document is to conform, such as Strict.

There are only three DTDs available in XHTML 1.0. They carry over from HTML 4.0, and are as follows:

Strict:
Strict follows the most stringent rules of XHTML. Only current elements, attributes, and character entities are allowed in documents written in this type. Elements such as font or center, that were deprecated in HTML 4.0, are not allowed. Obsolete elements are also not allowed. The Strict declaration appears as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

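To make the Strict rules concrete, here is a minimal sketch of a complete Strict document. The centering and font choices that font or center would once have handled are moved into a style sheet instead. The class name note, the title, and the style rules are illustrative assumptions of mine, not part of the standard.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Strict Document</title>
<!-- Presentation is handled here rather than with deprecated elements -->
<style type="text/css">
h1 { text-align: center; }
p.note { font-family: Arial, sans-serif; color: #333333; }
</style>
</head>
<body>
<!-- Strict expects text to sit inside block elements such as h1 and p -->
<h1>Welcome</h1>
<p class="note">No font or center elements are needed.</p>
</body>
</html>
```
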
Transitional:
A transitional XHTML 1.0 document is more lenient, allowing the author to use deprecated as well as current methods. You can use font or center, or any other deprecated markup in a transitional document—so long as the document itself is properly marked as such. No obsolete elements should be used. If you want to write a transitional document in XHTML 1.0, you'll include the following declaration:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

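By way of contrast, the following sketch shows the kind of deprecated markup a Transitional document tolerates. The title, colors, and text are placeholders I have made up for illustration. The same body would not be acceptable under the Strict DTD.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional Document with Deprecated Markup</title>
</head>
<!-- bgcolor, center, and font are deprecated but allowed here -->
<body bgcolor="#ffffff">
<center>
<font face="Arial" size="4" color="#333333">Welcome</font>
</center>
</body>
</html>
```
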
Frameset:
The frameset DTD is reserved only for frameset documents. A frameset document conforming to this DTD can use either strict or transitional markup. To create a frameset document in XHTML 1.0, include this DOCTYPE at the top of your document:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

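A frameset document built on this DOCTYPE might look like the sketch below. The file names nav.html and main.html and the column widths are hypothetical. Note that each frame is an empty element closed with a slash, a rule covered later in this article.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Frameset Document</title>
</head>
<!-- frameset takes the place of body in this document type -->
<frameset cols="25%,75%">
<frame src="nav.html" />
<frame src="main.html" />
<!-- Fallback content for browsers that do not display frames -->
<noframes>
<body>
<p>This site uses frames. Start at the <a href="main.html">main page</a>.</p>
</body>
</noframes>
</frameset>
</html>
```
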
Once you've decided whether to use an XML declaration, and you've added a DOCTYPE declaration defining the markup rules to which you're going to conform, you'll need to add an HTML root to the document and place the XHTML namespace accordingly:

<html xmlns="http://www.w3.org/1999/xhtml">

At this point, you'll want to add necessary structural elements such as head, title, and body. Listing 1 shows an XHTML 1.0 transitional document shell with the XML declaration included. In Listing 2, you'll see a transitional document shell without the XML declaration, but with a meta tag declaring the character set in use.

Listing 1: A Transitional DTD conforming XHTML 1.0 document with the XML declaration

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional Document with XML Declaration</title>

</head>
<body>

</body>
</html>

Listing 2: A Transitional DTD conforming XHTML 1.0 document without the XML declaration

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional Document without XML Declaration</title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>

</head>
<body>

</body>
</html>

Syntax concerns

Once an XHTML document contains the necessary declarations and structural information, you can examine the syntax changes resulting from XML's influences on Web markup. These syntax changes include case awareness, well-formed tag elements, empty and non-empty elements, and the use of quotation marks.

Case

As you know, HTML is not case sensitive. This means that HTML element and attribute names can be in upper, lower, or mixed case. So, you can have:

<body background="my.gif">

or

<BODY BACKGROUND="my.gif">

or even

<BoDy background="my.gif">

All of these examples mean the same thing. On the other hand, XML is case sensitive. Thus, XHTML is case-specific. In XHTML 1.0, all elements and attribute names must be written in lower case:

<body background="my.gif">

Element and attribute names written in any other case do not conform to XHTML 1.0. Note that attribute values, such as "my.gif", can be in mixed case. This matters especially where the files are on servers with case-sensitive file systems, or where you're using mixed-case code in applications such as those written in Microsoft's Active Server Pages (ASP), ASP+, or ColdFusion.
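
As a quick sketch of the distinction between names and values, compare the two lines below. The paths and file names are invented for illustration. Only the element and attribute names change case; the values stay exactly as the server expects them. The img element is also closed with a slash, a rule covered later in this article.

```html
<!-- HTML habit: uppercase element and attribute names -->
<A HREF="Products/Index.asp"><IMG SRC="Images/Logo.GIF" ALT="Company Logo"></A>

<!-- XHTML 1.0: names in lower case, mixed-case values left untouched -->
<a href="Products/Index.asp"><img src="Images/Logo.GIF" alt="Company Logo" /></a>
```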

Well-formedness

Many HTML browsers are quite forgiving, and many HTML tools don't conform to standards. As a result, some authors have learned bad habits such as improper nesting of tags. The following example may work in many browsers:

<b><i>Welcome to MySite.Com</b></i>

It will display as both bold and italic in a forgiving browser. But, if you take a pencil and draw an arc from the opening bold tag to its closing companion, and then from the opening italic tag to its closing companion, you'll see that the lines of the arcs intersect. This demonstrates improper nesting of tags, and is considered poorly formed.

In XHTML 1.0, such poorly formed markup is unacceptable. The concept of well-formedness must be adhered to in that every element must nest appropriately. The XHTML 1.0 equivalent of the prior sample is:

<b><i>Welcome to MySite.Com</i></b>

Draw the arcs now, and you'll see that they do not intersect. These tags are placed in the proper sequence, and are considered to be well-formed.
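
The same pencil test applies to any combination of elements, not just bold and italic. Here is a sketch using a link inside a paragraph; the file name news.html is a placeholder.

```html
<!-- Poorly formed: the a element closes before the em element inside it -->
<p>Read the <a href="news.html"><em>latest news</a></em> today.</p>

<!-- Well-formed: the innermost element closes first -->
<p>Read the <a href="news.html"><em>latest news</em></a> today.</p>
```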

Non-empty and empty elements

A non-empty element is one that has an opening tag, a closing tag, and some content between them:

<p>This is the content within a non-empty element.</p>

An empty element, by contrast, is one that has no content, just the element name and its attributes, such as <hr>, <br>, and <img>.

XML rules require that both empty and non-empty elements be properly closed. In HTML, non-empty elements often have optional closing tags. I could write the paragraph above as follows:

<p>This is the content within a non-empty element.

In HTML, this would be considered correct. XHTML 1.0 demands that non-empty elements are properly closed. Another example of this would be the <li> (list item) element. In HTML, you could have:

<li>The first item in my list.
<li>The second item in my list.

or

<li>The first item in my list.</li>
<li>The second item in my list.</li>

In XHTML 1.0, only the latter method is allowed.
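
Paragraphs and list items are not the only elements affected. HTML 4 also makes the closing tags of table rows, table cells, and definition list terms optional, so the same habit tends to show up there. The sketch below, with made-up content, shows how such markup would be closed in XHTML 1.0.

```html
<!-- HTML: closing tags for tr, td, dt, and dd left out -->
<table>
<tr><td>Name<td>Molly
</table>
<dl>
<dt>XHTML
<dd>A reformulation of HTML as an XML application.
</dl>

<!-- XHTML 1.0: every non-empty element is closed -->
<table>
<tr><td>Name</td><td>Molly</td></tr>
</table>
<dl>
<dt>XHTML</dt>
<dd>A reformulation of HTML as an XML application.</dd>
</dl>
```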

Empty elements are terminated in XML with a slash. So <br> becomes <br/>. Due to problems some browsers accustomed to interpreting HTML have with this method, a workaround has been introduced, adding a space before the slash: <br />.

Here's an XHTML example of the image element, which is an empty element:

<img src="my.gif" height="55" width="25" border="0" alt="picture of me" />

Other empty elements of note are meta and link.
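
Here is a short sketch of how those empty elements might look together in a page, using the space-before-slash workaround described above. The description text and the style sheet name site.css are placeholders.

```html
<head>
<title>Empty Elements</title>
<!-- meta and link have no content, so each is closed with a slash -->
<meta name="description" content="An example page" />
<link rel="stylesheet" type="text/css" href="site.css" />
</head>
<body>
<p>First line<br />
second line</p>
<hr />
</body>
```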

Quotes

Quotation marks in HTML are arbitrary in that you can use or not use them around attribute values without running into too much trouble. There's no rule that says that leaving values unquoted is illegal. The following is perfectly acceptable in HTML:

<table border=0 width="90%" cellpadding=10 cellspacing="10">

Despite the fact that some attribute values are quoted, and others are not, browsers will render this markup just fine. However, if you want to conform to XHTML 1.0, you'll have to quote all of your attribute values:

<table border="0" width="90%" cellpadding="10" cellspacing="10">
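
One detail worth knowing: XML accepts either double or single quotation marks around a value, as long as the pair matches. That can be handy when the value itself contains double quotes. A small sketch, with an invented file name and alt text:

```html
<!-- Every value is quoted, including purely numeric ones -->
<td width="50" align="center">Centered cell</td>

<!-- Single quotes are also valid and let the value contain double quotes -->
<img src="dawn.gif" alt='The "new dawn" banner' />
```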

As you can see, none of these changes are monumental. A bit pesky, yes, but if you begin to employ this approach, you'll find your markup is a whole lot more consistent. That consistency is part of what makes XHTML 1.0 so attractive—it provides a strong foundation upon which to build future constructs.
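
To see the four syntax changes working together, here is a before-and-after sketch of a small navigation list. The file names are hypothetical. The first version is the sort of HTML a forgiving browser will render; the second applies lower-case names, proper nesting, closed elements, and quoted values.

```html
<!-- Before: mixed case, crossed tags, unclosed elements, unquoted values -->
<UL>
<LI><A HREF=index.html><B><I>Home</B></I></A>
<LI><IMG SRC=logo.gif ALT=Logo WIDTH=25 HEIGHT=55>
</UL>

<!-- After: the same content as XHTML 1.0 -->
<ul>
<li><a href="index.html"><b><i>Home</i></b></a></li>
<li><img src="logo.gif" alt="Logo" width="25" height="55" /></li>
</ul>
```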

Future of XHTML

If XHTML is so easy to use, then why is it taking so long to be adopted? This is a question that many standards-oriented people are asking. Part of the problem may be poor press—not too many people know about XHTML 1.0. And even if they've heard about it, they may not realize how easily it can be put to use today.

Add to this the fact that current software tools for HTML development, such as Adobe GoLive, Macromedia Dreamweaver, and Microsoft FrontPage, do not support XHTML, and you have a serious concern for the many Web authors who prefer these tools or must use them in a work environment.

But despite these difficulties, XHTML 1.0 is marching on. In fact, the next version, XHTML 1.1, has already been fairly well fleshed out and contains some new and different concepts for the Web markup author. Modularization—the act of breaking the language down into discrete modules—is a primary part of XHTML 1.1. Also, more XML-like advantages are coming into play. For example, the ability to write your own DTD for an XHTML document or use a schema will truly bring extensibility to the game.

XHTML 1.0 is the current Web markup standard. Those who are not using it should at the very least give it a good try. The growth that is occurring in other areas of XML-related technologies—particularly in the wireless realm—is strong and convincing proof that the more flexible you can become as a markup author, the more prospects you will have. XHTML 1.0 is the perfect way to start expanding your horizons. It is familiar enough to make sense, and powerful enough to help you create stable, interoperable Web sites that work today and are prepared for the exciting opportunities of tomorrow.


Copyright Dunstan Orchard