Web Publishing
2.0 Upgrade
This semi-tome documents the upgrade of my published Web to W3C-recommended standards.
When I began the upgrade, my initial objective seemed innocent enough: Upgrade my HTML documents to XHTML. After only cursory research, XHTML seemed an imperative.
Little did I realize what that objective entailed. 8 months later (ie, July 2005 to March 2006), the upgrade is finally near completion. The sole remaining task is updating this document.
Why 8 months for the upgrade? These reasons:
1) The scope of the upgrade kept expanding.
And, necessarily so. A Web page can be W3C-validated as XHTML, yet not conform to all W3C-published standards. There are standards related to but separate from XHTML that impact other fundamentals ranging from the high-level (eg, Cascading Style Sheets & JavaScript) to the low-level (eg, the characters used for single & double quotation marks). Discovering what these standards were & complying with them was a decidedly non-trivial task that consumed months.
For more info, please see this entire document.
2) Documenting the upgrade became my central focus.
I spent more time writing & revising the Web Publishing series of documents (ie, this document plus the preceding Web Publishing Intro & the following Directory Structure, Word-to-Web Styles, & Various & Sundry documents) than I spent doing the technical. The way I do it, writing vets the technical; ie, if I can't express the technical clearly & simply, my grasp of the technical is deficient.
For more info, please see the Initial Goals & Methodologies For 2.0 section, Documentation methodology sub-section.
3) I went beyond W3C standards.
For the upgrade, I revisited everything relating to my Web site. After only cursory inspection, I concluded that version 1.0 of my Web site sucked. Since I'm an archetypal obsessive, I fixed 1.0's flaws.
For more info, please see the Web Publishing series of documents for the general, & the following Standards section for detail.
4) I decided to test-upgrade this document to XML.
While XHTML can be justified on its own merits, the W3C intended XHTML to be a transition technology between HTML & XML. XML is the real goal, not XHTML. Since so, I created a pure XML version of this document. Doing so mandated implementing standards that go beyond XHTML.
For more info, please see the following Final Design For 2.0 section, Utopian Marxists vs Supply Sergeants & XML: The real skinny sub-sections.
5) I realized my Web Publishing scheme could be an alternative to WYSIWYG software like Dreamweaver or Frontpage.
Since so, I spent extra time gussying it up. For more info, please see the My Web Publishing Scheme section of the Editing the Word HTML document.
Ergo, 8 months later, 2.0 is golden, & a wrap.
To be able to understand this document, there is a prerequisite: You should understand basic HTML. If you don't, I offer for your edification the following ruthlessly simple but integral & complete HTML document. If you understand it, then you pretty much satisfy the prerequisites.
The URLs in the following document link to an HTML code tutorial written by, well, HTML Code Tutorial. Their explanations are nice & concise. For the overall format of an HTML document, start here.
<html>
<head>
<title>My friends call me Janey. You can call me an XHTML'er.</title>
</head>
<body>
<p>I do the Web right, OK? Any problems with that?</p>
</body>
</html>
Peppered throughout this document are links to articles that taught me what I needed to know to do the upgrade. These links total more than a hundred. The articles cited were culled from the many hundreds of articles I researched.
My research was extensive. At first, the ‘good’ articles I found raised more questions than answers. Finding just the ‘right’ article with the right answers was often maddeningly difficult. Many times, an article I perused could have been written in Sanskrit for what I understood of it.
Functionally, the research became a self-education process. Since I was unknowledgeable when I began the upgrade, if the truth be told, I probably perused thousands of articles.
Swamped with what seemed like terabytes of information both good & bad, I became manic about adding ‘good’ articles to my browser's (ie, Microsoft Internet Explorer) Favorites list. As the list grew & became unmanageable, I reorganized it many times. After some churning, I ultimately created 2 separate lists. Both lists contain the same articles, but one list is sorted by site, & the other by topic. The by-topic list became particularly useful since I could find specific articles in it more easily than in the by-site list.
In any case, there is a point to this discussion. These 2 golden lists of golden articles comprise a kind of bibliography, but not the usual type of bibliography; rather, it is Web-centric. How so? Well, you can download & import these 2 lists into your own browser's Favorites or Bookmarks. If you do, you will have instant access to the same golden articles I found. The Bibliography section at the end of this document provides further details here.
When used as nouns, ‘HTML’ and ‘XHTML’ have discrete meanings. When used as adjectives, they may or may not be synonyms:
Referencing a document or Web page. A Web page is either an HTML document or an XHTML document, but not both.
Referencing a Markup Language. HTML and XHTML are distinct markup languages.
Referencing everything else. In all other cases, ‘HTML’ & ‘XHTML’ can be considered synonyms, but ‘HTML’ is my adjective of choice: HTML tags, HTML elements, HTML tables, HTML named colors, HTML format, yadda.
Purists may object, but I use ‘source’ and ‘code’ as synonyms for ‘markup’; eg, “Following is example XHTML source”. I also say, “When you code HTML…”, even though only programming languages can be coded & neither HTML nor XHTML is a programming language.
In all of reality, the only thing that's pure is a baby's first smile.
A font-formatting note. HTML or XHTML markup is displayed using the Courier New font. Windows file names, Word menu item names, Word or Web style names, Cascading Style Sheets property names or values, & other items of technical argot are displayed using the Georgia font.
5 years ago when I decided to create a personal Web site, my goals were modest:
To learn what I needed to learn.
To learn it.
Initially, I decided I needed to learn HTML & Java: HTML since the Web is HTML based, & Java for its robust programming capabilities & for its actualization of a GUI-based Web.
Along the way, I used WYSIWYG editors (eg, Microsoft FrontPage) to create Web pages, but concluded that no self-respecting software professional would dare to claim authorship for the verbose, even morphidite, HTML that FrontPage generated.
Eventually, I decided that Java was overly robust & complex, & that other then-extant software technologies such as Cascading Style Sheets (CSS) & JavaScript would do just as well.
I used Microsoft Word & Excel to create the documents, tables, & graphs that later would be uploaded to the Web. Word can save documents & Excel can save spreadsheets in HTML format. These saved documents & spreadsheets can be uploaded to the Web, & will run as is.
However, both Word & Excel suffer the same fatal flaw that FrontPage has: The HTML markup they generate is unacceptably verbose & overly hieroglyphic. Hence, I realized I would need to edit the generated HTML, remove the unnecessary verbiage, & replace it with something cleaner.
So, my initial goals for 1.0 became:
To abide by the principles of good, sound Web design;
To learn HTML, CSS, & JavaScript; &
To mesh Word & Excel with these 3 Web-based software technologies.
The last point is noteworthy. This ‘meshing’ involved the creation of a specific set of Word styles, & a corresponding set of Web styles.
For example, every Word document has a Normal style that defines the formatting characteristics of a typical or default Word paragraph. After customizing this Word style, I created a corresponding CSS-defined Web style named p.Normal that duplicated all formatting characteristics of the Word style.
I did the same for other Word styles, 25 in all: Headings, indented text, bulleted & numbered lists, hyperlinks, etc. Thus, I duplicated the look & feel of Word documents on the Web. Pretty cool at the time.
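To make the Word-to-Web pairing concrete, here is a minimal sketch of what a p.Normal Web style might look like. Only the p.Normal name comes from the text above; the property values (Georgia, 12pt, the spacing) are hypothetical placeholders, not the values actually used on this site.

```html
<html>
<head>
<title>p.Normal sketch</title>
<style type="text/css">
/* Hypothetical values: stand-ins for whatever the customized
   Word Normal style actually specifies */
p.Normal {
  font-family: Georgia, serif;  /* Word: Font */
  font-size: 12pt;              /* Word: Size */
  text-align: left;             /* Word: Alignment */
  margin-top: 0;                /* Word: Spacing Before */
  margin-bottom: 12pt;          /* Word: Spacing After */
}
</style>
</head>
<body>
<p class="Normal">A paragraph formatted like a Word Normal paragraph.</p>
</body>
</html>
```

Each of the other Word styles would get a similar one-to-one rule, which is why the approach is a direct translation rather than a design.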
What resulted was version 1.0 of my Web site published in 2001.
Over the span of 4 years as my Web site expanded, I added applications (eg, Economics, 2004 Election), but continued to use the original core technical elements.
Eventually, however, my urge to craft a respectable technical software product overrode my urge to write. After all, I've been a software professional for 38 years. Resultantly, I decided to upgrade my Web site to use current software technologies.
During my research, I found 3 articles whose recommendations helped frame the scope of the upgrade:
1) The wide-ranging Web Page Development: Best Practices article by the Apple Developer Connection;
2) The pithy but dense Web Site Development Information article by Kode Host; &
3) The Benefits of Web Standards article/slide show by Webboy.
Their key points: Engage in “Best Practices”, develop & use standards, & employ current software technology. I accepted their sage advice, & used it as a blueprint & template for the upgrade.
Now, detail.
For years, I had a nagging suspicion that my implementation of Cascading Style Sheets was both inelegant & primitive. After only brief research, I confirmed this suspicion. Redesigning style sheets became a given.
Another fatal attractor was XHTML. Again, after only brief research, upgrading to XHTML became a given.
I also succumbed to my OCD mania for uniformity & standards, at all levels. Examples are:
Single character level
The following 2 lines of markup are equivalent; ie, browsers parse them exactly the same:
<script type='text/javascript'>
<script type="text/javascript">
I used the single-quote syntax in some Web pages, & the double-quote syntax in others. Not good. 2.0 would standardize on double quotes.
Intermediate level
This site abounds with HTML tables. In the beginning, I created simple tables with simple syntax, using HTML tags only & none of the facilities of CSS. Eventually, I introduced CSS elements to define the formatting characteristics of a table cell: Alignment of text; background colors; border colors, line types, & widths; & font faces, colors, sizes, & weights. Thus, as tables evolved, their syntax diverged & differed. Not good. 2.0 would standardize the syntax used for tables.
Web page level
In HTML, the order of the tags in the <head> section is unimportant:
<head>
<title>
<meta>
<link>
<style>
</head>
ie, HTML has no standards here. Not good. 2.0 would use a standard order for these tags. More importantly, every Web page would be structured using a strict template; ie, the same tags or sections of markup would always be in the same location relative to everything else.
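As an illustration only, a strict page template with a fixed <head> order might be sketched as follows. The order shown is simply the one listed above; the text does not say which order 2.0 adopted. The XHTML 1.0 Strict doctype, the description meta tag, & the Web.css file name are assumptions made for the sake of a complete example.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- 1: title -->
<title>...Title...</title>
<!-- 2: meta -->
<meta name="description" content="...Description..." />
<!-- 3: link (external style sheets; file name is an assumption) -->
<link rel="stylesheet" type="text/css" href="Web.css" />
<!-- 4: style (page-specific rules, if any) -->
<style type="text/css">
/* page-specific rules go here */
</style>
</head>
<body>
<p>...Content...</p>
</body>
</html>
```

With every page cut from one such template, a given tag is always found in the same place relative to everything else.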
Finally, since the above 3 tasks involved drastic revisions, it seemed wise to also revisit the overall design of the Web site: General page layouts, screen widths, menu navigation system, background, colors, use of images & graphs, etc.
So, my initial goals for 2.0 became:
1) To redesign Cascading Style Sheets;
2) To upgrade to XHTML;
3) To standardize; &
4) To tweak the overall design of my Web site.
The methodology I used during the upgrade was my usual: Top-down design, coupled with bottom-up implementation. However, I adopted a novel methodology, which is this: This document would not only be a kind of Design Spec, but also be a kind of serial-in-time Log of the upgrade as it proceeded, plus a kind of Tutorial.
Design Spec, Log, & Tutorial. Why do this? Because it's a damned good idea, that's why. Explaining what the next 3 major sections of this document are will affirm the merit of this document's tripartite structure. These sections are:
1) A Research section: Trees to forest
If you attempt what I did, you would research, too. Soon, you would become lost as I did in the hyperlink trees. Only after you began understanding the relationship of everything to everything else would you begin to see the forest.
For example, HTML & CSS are intertwined. By its lonesome, CSS has no value. Only when CSS is considered in relationship to HTML does its value become evident.
Dilemmas of this type always seem to rear their ugly heads with complex, multi-faceted systems; ie, you can't understand topic A unless you understand topic B, but you can't understand topic B unless you understand topic A.
Is there a solution here? Yes. Explain both the details of a topic & its relationships to other topics concurrently; ie, discuss it as both a tree & a member of the forest.
The Research section hastens the forest's appearance. Topics are presented in small chunks. Each new topic builds upon the foregoing. Topics proceed smoothly one to the next. Despite yourself, you'll find yourself learning in an entirely natural way.
The Research section is the serial-by-time Log. It documents my journey from ignorance to understanding. Not included, however, are the false steps, dead ends, & long journeys to nowhere. What remains is golden.
2) A Final Design section: Forest to trees
By the end of the Research section, you will have a good, general understanding of Web Publishing's multiple facets, & a good, general understanding of their relationships. Since so, the Final Design section can address individual topics without the need to explain where those topics fit within a larger context. Additionally, the order of the topics in the Final Design section can proceed from low-level to high-level. Thus, by the end of the Final Design section, you will have a good, detailed understanding of Web Publishing's multiple facets, & a good, detailed understanding of their relationships.
The Final Design section is the Design Spec.
Considered as a whole, the 2 different sections, & the 2 different approaches to the exact same subject matter used in each, yield this:
Log + Design Spec = Tutorial.
3) A Standards section: The primal, irreducible ethos of the ecosystem
Let me requote the 1st sentence of this document:
This semi-tome documents the upgrade of my published Web to W3C-recommended standards.
Sure, Web technologies like HTML, XHTML, XML, CSS, & JavaScript are vital, as are topics like Web site fonts, colors, screen sizes, general page layouts, backgrounds, images, HTML tables, browsers, yadda. However, none of these are as vital as standards.
As far as I'm concerned, everything is subservient to standards. Standards are it. Without taut, clean, & simple standards, a published Web is shit.
Standards are discussed in both the Research & Final Design sections, but codified in the Standards section. When you finish reading the Standards section, the final tumbler should go ka-chunk, & the vault door will open on the World Wide Web.
To recap, my goals for 2.0 were:
1) To redesign Cascading Style Sheets;
2) To upgrade to XHTML;
3) To standardize; &
4) To tweak the overall design of my Web site.
This document addresses the first 3 goals. The last-in-this-Web-Publishing-series Various & Sundry document discusses the last goal.
In 1.0, I created synchronized Word & Web styles that governed document formatting characteristics: Body text indents, hanging body text indents, headings & titles, hyperlinks, bulleted & numbered lists, & paragraphs. However, the Web styles were nothing more than a direct translation of their corresponding Word styles. I paid no heed to the internal efficiency of these Web styles, or to their conformance to good Web design practices.
2.0 would fix this brute force, inelegant implementation.
Early on, I established 3 goals for the redesign:
1) Implement CSS Inheritance;
2) Use CSS to completely separate content from presentation; &
3) Do whatever else my research indicated I should do.
The first 2 goals involved near-routine technical tasks. Goal #3, however, grew in scope as my research proceeded. Such is life.
In 1.0, I associated styles only with paragraphs (ie, <p>) & not with any higher-level (eg, <body>, <div>) or lower-level (eg, inline elements such as <span>) HTML tags. Thus, 1.0's styles did not inherit formatting characteristics from other styles, while 2.0's styles do.
Below is an example of a skeletal 2.0 Web page:
<html>
<head>
<title>...Title...</title>
</head>
<body class="Body">
<div class="Text">
<p class="Normal">...<span class="Serif">Serif</span>...</p>
</div>
<div class="Images">
...
</div>
</body>
</html>
The <body>, <div> and <p> elements have associated CSS-defined styles identified by the class attribute. There is an additional style associated with the <span> tag.
These 4 elements exist in a nested hierarchy: body to div to p(aragraph) to span. In CSS, lower level or child elements (eg, the 2 div's) can inherit the CSS properties assigned to higher level or parent elements (eg, body). In fact, the great-grandchild element associated with the span tag can inherit the CSS properties of all its progenitors: p, div, and body.
That's inheritance. For more info, please see the W3C's Assigning Property Values, Cascading, & Inheritance document.
I created a Web page based upon the above skeleton. Styles are defined in the <style> section. The actual XHTML markup follows the document's text & image. To view this page, please see the CSS Inheritance Example document.
Being an example, the above document is an exception. For 2.0, styles are defined in separate, external style sheets, & not within the document. More later.
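For readers who prefer to see inheritance in place, the following is a minimal sketch built on the skeleton's class names. It is not the linked example itself, & the specific fonts & margins are hypothetical. Like that example, it defines its styles in a <style> section purely for convenience.

```html
<html>
<head>
<title>Inheritance sketch</title>
<style type="text/css">
/* Set once on the top-level element. font-family, font-size, and color
   are inherited properties, so descendants pick them up automatically. */
body.Body  { font-family: Arial, sans-serif; font-size: 100%; color: black; }
/* margin is NOT an inherited property; it applies to this div only */
div.Text   { margin-left: 1in; margin-right: 1in; }
/* No font declarations needed: p.Normal inherits them from body.Body */
p.Normal   { margin-top: 0; margin-bottom: 1em; }
/* Overrides only the family; size and color still come from body.Body */
span.Serif { font-family: Georgia, serif; }
</style>
</head>
<body class="Body">
<div class="Text">
<p class="Normal">Sans-serif text with a <span class="Serif">Serif</span> word.</p>
</div>
</body>
</html>
```

The paragraph & span rules stay short because they state only what differs from their ancestors.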
To view & compare the master style sheet named Web.css for 1.0 and 2.0 & gain a sense of what inheritance has wrought, please click the preceding URLs.
Note: 2.0's Web.css master style sheet is discussed at length in Web Publishing's Word-to-Web Styles document.
OK. Inheritance incorporated in Cascading Style Sheets. Now, to vet it.
For 2.0, I decided to segregate different types of XHTML markup in their own div's. So, I created 6 div's: Text, Tables, Images, Forms, Navigation, and Footer. These 6 div's seemed sufficient to encapsulate all major types of XHTML markup. However, while implementing, I found these 6 div's insufficient. Scattered here & there were short blocks of XHTML markup that seemed to require their own div's. Some blocks used the HTML <pre> tag to effect true vertical alignment of table-like text using the fixed-width Courier New font, while other blocks used other techniques to achieve the same objective. Since exceptions sully a good design, they had to go.
I realized these exceptions shared a common trait: Each set its own unique left margin. In my Web scheme, left & right margins serve the same purpose as a Word document's left & right margins; ie, they create white space, white space as in books, letters, reports, etc. Writing 101.
Ergo, I decided to revisit margins. In so doing, I thought, perhaps I could eliminate the exception div's.
Backwards step #1.
Since Word sets left & right margins in inches, I also set CSS-defined left & right margins in inches.
Top & bottom margins are a different matter; ie, there is no Web-based analog to Word's top & bottom of page margins. The closest analog is the white space/spacing before & after headings, paragraphs, images, etc. Since so, I decided to use a unit of measure relative to the font size: em's.
What's an em? From Wikipedia's Em (typography) article:
In current use, em usually means the typeface's body size, meaning the length from the lowest descender to the highest ascender, sometimes including height added by any diacritical marks. So, 1 em in a 16 pt typeface is 16 points.
Since 1 em is the height of the font being used, em's are great for top & bottom margins. Specifically, the height of blank lines used as top or bottom margins varies directly with the font size being used.
In 1.0, I established no standards for the units of measure used for margins & font sizes. Not good. 2.0 needed standards. After due consideration (ie, 1 or 2 seconds worth), I established this standard for 2.0:
Use inches for left & right margins;
Use em's for top & bottom margins; &
Use points for font sizes.
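Expressed as CSS, this first-cut standard might look like the sketch below. The selectors reuse the skeleton's class names; the actual numbers (1in, 12pt, 18pt) are hypothetical.

```html
<html>
<head>
<title>Units of measure sketch: inches, em's, points</title>
<style type="text/css">
/* Points for font sizes */
body     { font-size: 12pt; }
h1       { font-size: 18pt; }
/* Inches for left & right margins */
div.Text { margin-left: 1in; margin-right: 1in; }
/* em's for top & bottom margins; an em in a margin is measured
   against the element's own font size, so 1em is 12pt for the
   paragraph and 18pt for the heading */
p.Normal { margin-top: 0; margin-bottom: 1em; }
h1       { margin-top: 1em; margin-bottom: 0.5em; }
</style>
</head>
<body>
<div class="Text">
<h1>Heading</h1>
<p class="Normal">Body text.</p>
</div>
</body>
</html>
```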
Almost immediately, I realized that using points as a unit of measure might not be suitable for the Web. After all, points are typographic units of measure, while the Web is a display & not a print medium.
Yes, Word uses points as its font size unit of measure, as did WordPerfect back in the early 80's when it was my word processor of choice. Using word processors for over 2 decades habituated me to points.
Usually, new standards last longer than 2 seconds. Not this time. I decided to revisit the use of points.
Backwards step #2.
The pixel is the unit of measure for display devices. Pixels have existed since time immemorial. Pixels will never die. Shouldn't font sizes be defined using pixels? Aren't pixels the only real choice?
The short answers to these 2 questions are “Perhaps not” & “Not really”. Some Web folks swear by em's, while others swear by points. Google “em pt px” to explore the extent of this controversy.
Even the above-cited Wikipedia article on Em (typography) weighs in:
(T)he W3C best practises recommendations within HTML & online markup now call for web pages to be based on scalable designs, using a relative unit of measurement (such as the em measurement), rather than a fixed one such as point size.
Hence, points suck, but so do pixels; ie, both pixels & points are fixed units of measure.
So, em's for font sizes it is, then. If the W3C recommends it, it must be so.
em is a relative unit of measure, but relative to what? Again citing Wikipedia:
In current use, em usually means the typeface's body size, meaning the length from the lowest descender to the highest ascender, sometimes including height added by any diacritical marks. So, 1 em in a 16 pt typeface is 16 points.
Thus, em's are relative to points. Do you see the possible conundrum here? The W3C states, “Never use a fixed unit of measure like points”, but then states, “Use a relative unit of measure like em's, a unit of measure relative to points”.
Actually, there is no conundrum as we shall see, but this controversy does seem to suggest that the ‘best’ unit of measure to use for font size is decidedly not a trivial issue. For example, here's Todd Fahrner:
The solution is for the designer to specify sizes not in any length system - relative or absolute - but in literals (keywords) whose actual values adapt intelligently to the baseline or "medium" value chosen by the user. Such "absolute size" schemes have the unique benefit of being both based on a user-chosen size & "node agnostic;" i.e., the document structure does not complicate the meaning of absolute sizes the way it complicates the meaning of, say, "75%". Furthermore, the actual values of literals can vary in more sophisticated ways than simple relative length units, taking font quality & available resolution into account.
After about 7 re-reads, Fahrner's seemingly tortured solution starts making sense. No matter. The only point worthy of note is that font size unit of measure does appear to be a highly contentious issue.
The W3C is the organization self-tasked to bring rigor & order to the Web.
OK, so the W3C recommends using a relative unit of measurement like em's. But, what should em's be relative to? The guru's answer is simple: The baseline font size set by the user.
In this scheme, the physical font size the user chooses is irrelevant since, by definition, this font size is 1em or, alternately, 100%. Larger & smaller font sizes are then defined relative to it. Example larger font sizes are 1.5em (ie, 150%) or 1.25em (125%), & smaller sizes .83em (83%) or .75em (75%).
Functionally, em's are simple & straightforward. The dictionary definition might be vague & abstract, but real world em's are not.
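A minimal sketch of em-based font sizes follows. The class name Small & the chosen multipliers are hypothetical. For font-size, an em is measured against the parent element's font size, which here is the body's, ie, the user's baseline.

```html
<html>
<head>
<title>em font sizes sketch</title>
<style type="text/css">
/* 100% of whatever baseline the user has chosen in the browser */
body    { font-size: 100%; }
/* 1.5 times the baseline */
h1      { font-size: 1.5em; }
/* .83 times the baseline */
p.Small { font-size: .83em; }
</style>
</head>
<body>
<h1>Heading at 1.5em</h1>
<p>Body text at the user's baseline size.</p>
<p class="Small">Fine print at .83em.</p>
</body>
</html>
```

If the user's baseline happens to be 16 pixels, the heading works out to 24 pixels & the fine print to 13.28 pixels; change the baseline & both follow proportionally.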
Time for backwards step #3. I decided to dig even deeper.
A matter of nomenclature. To eliminate any possible confusion, the word ‘dimensionless’ as used below applies to font sizes not defined in fixed units of measure like points or pixels.
Dimensionless fonts are supported in 3 areas: HTML, CSS, & Web browsers.
1) Within HTML
HTML 3.2 assigns the numbers 1 thru 7 to font sizes:
<font size = "1">This will display small.</font>
<font size = "3">This will display medium.</font>
It also allows this variant:
<font size = "+1">This font is 1 size bigger.</font>
<font size = "-2">This font is 2 sizes smaller.</font>
2) Within Cascading Style Sheets
CSS has 7 keywords associated with font sizes: xx-small, x-small, small, medium, large, x-large, and xx-large. CSS also allows decrementing & incrementing font size via use of smaller or larger.
Interestingly, there are 7 font sizes in the HTML number-from-1-to-7 scheme, & 7 font sizes in the CSS xx-small-to-xx-large scheme. However, as Todd Fahrner explains, these schemes do not map to one another. Since so, it's no wonder font size is a contentious issue.
3) Browsers
Internet Explorer 6.0 allows the user to select font sizes from Largest to Smallest.
Mozilla Firefox 1.0.4 allows the user to increase or decrease font sizes, even to the point of illegibility.
In all these cases, font sizes are purely dimensionless; ie, it is impossible to determine the fixed font sizes browsers will actually assign.
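The CSS keyword form from item 2 can be sketched as follows. The selectors & the particular keywords assigned to them are hypothetical; the keywords themselves are the ones named above.

```html
<html>
<head>
<title>Keyword font sizes sketch</title>
<style type="text/css">
/* medium: the browser's default size, as adjusted by the user */
body       { font-size: medium; }
/* Absolute-size keywords: steps on the browser's own size scale */
h1         { font-size: x-large; }
p.Footnote { font-size: small; }
/* Relative-size keyword: a step up from the parent element's size */
em         { font-size: larger; }
</style>
</head>
<body>
<h1>Heading</h1>
<p>Body text with an <em>emphasized</em> phrase.</p>
<p class="Footnote">A footnote.</p>
</body>
</html>
```

Nothing in this style sheet names a point or pixel value; the browser & the user decide what each keyword means physically.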
I decided to scope 10 popular Web sites to determine how they set font faces & sizes; most specifically, whether or not font sizes were purely dimensionless. I selected the sites from Alexa's Top 100 Sites - English list. Selection criteria were informal: Sites should be general interest & represent a good cross-section; no ad-provider sites, etc.
I perused the Home Page & any external, linked Cascading Style Sheets to collect the information.
Following is the list in order, ranked by number of hits. Note that the current ranking may be different than the mid-July 2005 ranking below.
1) Yahoo - #1
Font face: Arial, Verdana
Font size: 84%/1.2em; 77%, 100%, 1%; -1, -2
Dimensionless? Yes.
2) Google - #3
Font face: Arial
Font size: 10pt, 12pt; -1, -2, +1
Dimensionless? No.
3) eBay - #5
Font face: Arial Narrow, Helvetica, Verdana; Courier New
Font size: 11px; -2, 1; x-small, medium, small
Dimensionless? Almost. 11px used only for text inside one button.
4) Microsoft - #6
Font face: Verdana, Arial, Tahoma
Font size: 70%, 95%, 100%; 1em
Dimensionless? Yes.
5) Amazon - #7
Font Face: Verdana, Arial, Helvetica; Times
Font Size: small, x-small, xx-small; 10px, 12px, 13px; .5em; -1
Dimensionless? No.
6) BBC - #11
Font Face: Verdana, Arial, Helvetica
Font Size: 1, 4, 2
Dimensionless? Yes.
7) CNN - #13
Font Face: Georgia; Arial, Helvetica, Verdana
Font Size: 12px, 20px, 11px; 1em
Dimensionless? No. In fact, font sizes are nearly pure non-dimensionless; ie, 1em is used only in scrolling news section.
8) MapQuest - #30
Font Face: Verdana, Arial, Helvetica
Font Size: 62.5%, 100%; 1.2em, 1em, 1.8em
Dimensionless? Yes.
9) Wikipedia - #37
Font Face: Times, Times New Roman
Font Size: 92%, 90%, 125%; smaller, larger; 11pt, 8pt; x-small, small; 1em, 0.8em
Dimensionless? Almost. The <body> style sets an 11pt font size, but the CSS markup allows the user to override this setting if the user has defined a separate style sheet. For more info, see below in the Final Design For 2.0 section, The Accessibility Issue sub-section.
10) New York Times - #59
Font Face: Arial, Helvetica; Times New Roman, Times
Font Size: 1, 3, +1; 11px, 10px, 12px; 82%, 125%, 78%
Dimensionless? No.
Crunchy. Let's explain the list:
1) Font Face
There are 5 generic font families in CSS: Sans-Serif, Serif, Monospace, Cursive, & Fantasy. For further info, please see the W3C Font Families article.
The above list specifies up to 3 fonts per font family. Semicolons separate different families. Fonts are specified in the order they occurred within the HTML or CSS.
Arial seems to be the most popular Sans-Serif font, with Verdana a close second. Times or Times New Roman are the most popular Serif fonts, with Georgia occurring once.
2) Font Size
The above list specifies up to 3 examples each of fixed & dimensionless font sizes.
Fixed font sizes include:
Points
Pixels
Dimensionless font sizes include:
em's
Percentages
The HTML number-from-1-to-7 form
The HTML plus-or-minus-a-number form
The CSS xx-small to xx-large form
3) Whether or not the site uses pure dimensionless fonts:
Yahoo, Microsoft, the BBC, & MapQuest do.
For all practical purposes, eBay & Wikipedia do.
Google, Amazon, CNN, & the New York Times do not.
In the real world, dimensionless fonts -- or those defined using em's or percentages -- are not universally implemented. If Google, Amazon, CNN, & the New York Times use fixed units of measure to define font sizes, then perhaps using dimensionless fonts is not a foregone decision.
Backwards step #4?
Remember the standard I started with?
Use inches for left & right margins;
Use em's for top & bottom margins; &
Use points for font sizes.
Well, according to the widely cited The Amazing Em Unit & Other CSS Best Practices article I ran across during my research, the standard for length units should be:
Use em's for left & right margins;
Use em's for top & bottom margins; &
Use em's for font sizes.
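An all-em version of the standard might be sketched like this. The selectors reuse the skeleton's class names, & the multipliers (4em, 1.5em, etc.) are hypothetical, not taken from the cited article.

```html
<html>
<head>
<title>Units of measure sketch: em's throughout</title>
<style type="text/css">
/* No body font size is set, so text starts at the user's baseline */
/* em's for left & right margins: white space scales with the text */
div.Text { margin-left: 4em; margin-right: 4em; }
/* em's for top & bottom margins */
p.Normal { margin-top: 0; margin-bottom: 1em; }
/* em's for font sizes; the heading's margins are then measured
   against its own enlarged font size */
h1       { font-size: 1.5em; margin-top: 1em; margin-bottom: .5em; }
</style>
</head>
<body>
<div class="Text">
<h1>Heading</h1>
<p class="Normal">Body text.</p>
</div>
</body>
</html>
```

Nothing here is pinned to a physical size: if the user enlarges the baseline font, the margins & heading grow with it.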
The key cite from the above article is this:
This is the time to pull out the obsolete absolute length units: Only use absolute length units when the physical characteristics of the output medium are known.
Wow. Absolute length units are obsolete? Has postmodernism infested Cascading Style Sheets, too?
Not really, but the authors cogently argue 2 major points:
1) The Web should accommodate the visually impaired; ie, make the Web accessible to the handicapped.
Accessibility is the raison d'etre for dimensionless fonts. The visually impaired may set 20 points as their baseline font size. If they do, the article recommends respecting it.
A good single-source here is (Warning: PDF file) Best Practices for Web Accessibility & Design by Dr. Alan Foley of North Carolina State University & Bob Regan of Macromedia®. In it, they discuss “designing for people with disabilities”.
Remember this excerpt from the Wikipedia Em (typography) article that led to a long discussion on dimensionless fonts?
(T)he W3C best practises recommendations within HTML & online markup now call for web pages to be based on scalable designs, using a relative unit of measurement (such as the em measurement), rather than a fixed one such as point size.
This recommendation is strictly based upon the accessibility issue. For proof, please peruse the September 2000 W3C CSS Techniques for Web Content Accessibility Guidelines 1.0 document.
Accessibility is a big deal. Articles on the topic abound. For our purposes, accessibility & dimensionless fonts seem to be 2 sides of the same Moebius strip.
2) The Web should accommodate alternate display devices.
eg, PDAs, cell phones, & the eventual, future heads-up display (as with F-22 & F-35 fighters) projected on the inside of sunglasses. These alternate display devices are smaller than the typical 15” or so computer display. These smaller display devices almost mandate the use of relative units of measure.
Well, zounds. A lot to consider.
Implementing CSS inheritance itself ended up being a straightforward task. However, it became obvious during my research on inheritance that the choice of units of measure (ie, for margins & font sizes) was a complex issue that needed to be specifically addressed, & that my Word-centric perspective needed to become Web-centric.
Time & again, my research compelled me to reexamine elementals like units of measure. Said reexamination always proved to be necessary & worthwhile. Attending to basics became habitual.
My final choice for units of measure will be discussed in the Final Design For 2.0 section, Units of Measure sub-section.
When HTML was just HTML & Cascading Style Sheets were the future, Web pages mixed content & presentation; ie, almost every HTML tag affected the way content was presented.
The W3C developed their Cascading Style Sheets, Level 1 recommendation in 1996 to allow presentation elements to be separated from content. CSS made sense since a single style sheet could conceivably control the presentation elements for an entire Web site.
Perhaps the main impetus behind CSS was the venerable <font> tag -- more properly, its misuse. Since the Web was initially mainly a text-based medium, text ruled. Changing the text's font size, face or family, color, etc. involved adding another <font> tag. <font> tags had a tendency to proliferate like kudzu vine at an abandoned fertilizer factory. If the Web author decided to change, say, the basic font face for an entire Web site, many <font> tags had to be changed.
CSS solved that problem. If implemented correctly, changing the font face in a single style sheet would change it for the entire Web site.
“Separate content from presentation” is the raison d'etre of CSS.
The original CSS Level 1 recommendation harbinged changes to HTML itself; ie, CSS was designed not only to supplement HTML but also to supplant many HTML elements. As one good example, the HTML font tag is deprecated; ie, discontinue its use since it's going away. Other HTML tags & attributes of tags are deprecated, too. Peruse the preceding link for a list of these deprecated tags & attributes.
The December 1999 HTML 4.01 Specification formally deprecated these tags & attributes.
Using deprecated elements is akin to eating sour cream past its expiration date. Not a good idea.
Cascading Style Sheets do not automatically fix the font-tag-as-kudzu-vine proliferation mess. In fact, the problem can be just as egregious.
For instance, the following 2 examples of XHTML markup are functionally equivalent:
<font face="Arial" color="red">Wherefore art thou, Romeo?</font>
<span style="font-family: Arial; color: red">Wherefore art thou, Romeo?</span>
Example #1 uses the <font> tag, while example #2 uses CSS inline styles. However, this simple replacement does nothing to fix the proliferation mess; ie, <span style="font-family: Arial"> still defines a specific font just as <font face> does.
When CSS is used properly, the kudzu vine withers; eg:
<html>
<head>
<style type="text/css">
.SansSerif {
font-family: Arial, Helvetica, Verdana, sans-serif;
color: red;}
</style>
</head>
<body>
...
<span class="SansSerif">Wherefore art thou, Romeo?</span>
...
<span class="SansSerif">Hey, Juliet! Could you get me a beer?</span>
...
<span class="SansSerif">Thouest suck, Romeo.</span>
</body>
</html>
The <style> section defines the CSS style of SansSerif. This style defines presentation elements; ie, CSS properties. These properties are assigned to 3 separate strings of text by the HTML class attribute. If any of the properties of this style are changed, the changes immediately apply to all text associated with this class.
Problem fixed: No presentation elements exist within the <body> of the HTML document.
As another example, this HTML markup:
<span style="margin-left: 0.6in"> whatever </span>
should be replaced with:
<span class="LM06"> whatever </span>
where the LM06 style is defined as:
.LM06 {margin-left: 0.6in;}
However, embedding CSS styles in an HTML document's <style> section (eg, the above example) itself defeats the purpose of style sheets. If, for example, the SansSerif style exists in multiple documents, changing this style's properties would involve changing all documents in which this style is defined & used.
There's an easy fix here: Use external, linked style sheets. If all HTML documents shared a common style sheet, then changing the properties for any style would involve changing only this one, singular, common style sheet.
For an explanation on the differences between external & embedded style sheets, please see Mark Allen Zehner's HTML Style Sheets article.
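A minimal sketch of the external, linked approach follows. It assumes a shared style sheet named Web.css (the name this document uses later for its master style sheet) sitting in the same directory as the page. The page title is a placeholder, & the SansSerif rule described in the comment is simply the earlier example moved out of the page.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Linked style sheet sketch</title>
  <!-- Assumption: Web.css sits beside this page & contains the rule
       .SansSerif {font-family: Arial, Helvetica, Verdana, sans-serif; color: red;} -->
  <link rel="stylesheet" type="text/css" href="Web.css" />
</head>
<body>
  <!-- No presentation here: the class name is the only hook into the style sheet -->
  <p><span class="SansSerif">Wherefore art thou, Romeo?</span></p>
</body>
</html>
```

Every page that carries the same <link> line picks up a change to Web.css without itself being edited.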
Let's check the 10 biggies again:
Do they use the deprecated <font> tag?
Where do they define CSS styles?
The results are noteworthy.
1) Yahoo - #1
Uses deprecated <font> tag on Home Page.
Does not use external, linked style sheets. As a result, defines font sizes on Home Page in <style> section.
2) Google - #3
Uses deprecated <font> tag on Home Page.
Does not use external, linked style sheets. As a result, defines font sizes on Home Page in <style> section.
3) eBay - #5
Uses deprecated <font> tag on Home Page.
Defines font sizes on Home Page in <style> section. Also defines font sizes in external, linked style sheets.
4) Microsoft - #6
5) Amazon - #7
Uses deprecated <font> tag on Home Page.
Does not use external, linked style sheets. As a result, defines font sizes on Home Page in <style> section. There is an external, linked style sheet, but this style sheet appears to belong to an ad server.
6) BBC - #11
Uses deprecated <font> tag on Home Page.
Defines font sizes on Home Page in <style> section. Also uses an external, linked style sheet, but defines font faces only & not font sizes in it.
7) CNN - #13
Defines font sizes on Home Page in <style> section. Also defines font faces & sizes in external, linked style sheets.
8) MapQuest - #30
9) Wikipedia - #37
Defines font sizes on Home Page in <style> section. Also defines font sizes in external, linked style sheets.
10) New York Times - #59
Uses deprecated <font> tags on Home Page.
Defines font sizes on Home Page, but using the inline <span style="font ..."> tag. Also defines font sizes in external, linked style sheet.
So, what's the summary?
1) Whether or not the site uses the deprecated <font> tag:
Microsoft, CNN, MapQuest, & Wikipedia do not.
The rest do.
2) Whether or not the site defines styles only in external style sheets:
Microsoft & MapQuest do.
The rest do not.
Amazingly, dimensionless fonts are more widely implemented than external, linked style sheets. Why is this amazing? Well, the rationale for defining styles only in external, linked style sheets is purely technical, & is a more rudimentary imperative than dimensionless fonts; ie, external, linked style sheets inhabit the underwater portion of the iceberg, dimensionless fonts the visible part. However, dimensionless fonts also have a legal dimension; ie, a visually challenged individual could sue the Web site owner under terms of the Americans with Disabilities Act because the font size is too small. Fucking lawyers.
Fully implementing external, linked style sheets ended up being an onerous task. Why?
In 1.0, I mixed presentation with content.
Similar to the ‘bad’ example above, 1.0 defined other fonts this way:
<span style="font-family: Georgia; font-size: 12pt">In the US, or in the USSR?</span>
2.0 uses this syntax:
<span class="Serif">In the US, or in the USSR?</span>
Since my Web site has around 125 Web pages, & since each page had up to 50 occurrences of presentation elements being defined in the <body> of those pages, I had to make literally thousands of changes. Your applause is underwhelming.
2.0's master style sheet changed the methodology for creating HTML, in some cases radically.
As an example, 2.0 eschews bolding &, instead, highlights text with a yellow background, as a yellow magic marker would do. Thus, this HTML markup:
<b>Reality does not vet liberalism.</b>
becomes:
<span class="BoldYellow">Reality does not vet liberalism.</span>
This change is mostly trivial. Not so with HTML tables. The impact of CSS was rupturous. More below.
No matter. The benefits of completely separating content from presentation more than offset the effort. Already, new Web pages I've developed using 2.0's Cascading Style Sheets are quicker & easier to do. Plus, they look & display better, & are more uniform in appearance. I like it.
A standard can be formulated:
Define all presentation elements (ie, all styles) in external, linked style sheets. Never use the keyword style in an HTML document, either in the <head> or <body> sections.
The above rule should apply to all styles, & not just fonts. Defining styles in the <style> section of a document, or via use of the <span style= ...> tag defeats the purpose of Cascading Style Sheets, & seems just as ill considered as peppering HTML documents with the <font> tag.
I realized beforehand that the extensive research I planned to do for the upgrade would cause its scope to expand. However, I had no idea that some research forays would take me to the galaxy lying 2 galaxies the other side of Andromeda. No matter. The trip was great, but it feels good to be back home.
Everybody recommends using “Best Practices” for Web design, but exactly what constitutes “Best Practices” seems to depend upon the author doing the recommending; eg, utopian technical gurus vs real-world pragmatists.
However, a good single source here is the Web Style Guide, 2nd Edition by Patrick J. Lynch & Susan Horton. In it, the authors cover the gamut.
Alas, this near-book-long treatise is dated, having been published in March 2002. However, most of their recommendations are still golden. After all, the basics of the Web are still the basics of the Web.
I found the Site Elements subsection of Chapter 3: Site Design & Chapter 6: Editorial Style to be particularly worthwhile. Other sections, not; eg, this excerpt from Site Design: “Chunking” Information:
Most information on the World Wide Web is gathered in short reference documents that are intended to be read nonsequentially. This is particularly true of sites whose contents are mostly technical or administrative documents. Long before the Web was invented, technical writers discovered that readers appreciate short "chunks" of information that can be located & scanned quickly.
The same theme is echoed in this excerpt from the introduction to Page Design:
Graphic design creates visual logic & seeks an optimal balance between visual sensation & graphic information. Without the visual impact of shape, color, & contrast, pages are graphically uninteresting & will not motivate the viewer. Dense text documents without contrast & visual relief are also harder to read, particularly on the relatively low-resolution screens of personal computers. But without the depth & complexity of text, highly graphical pages risk disappointing the user by offering a poor balance of visual sensation, text information, & interactive hypermedia links.
This document violates both these rules, but intentionally so:
It attempts to synthesize all the “chunks” I found into a seamless whole, using a style eerily reminiscent of Web Style Guide, 2nd Edition itself. Thus, the chunk mode would be exactly unsuitable for this document.
The raison d'etre of my Web site is to prove via scholarly commentary & authoritative data that reality does not vet contemporary liberalism. As such, much of the site is pure text. Images consist mainly of charts & graphs. In various specific sections, tabular data abounds.
A notable exception is this Web Publishing section. The subsequent 3 documents in this series abound with images other than charts & graphs.
But, my site does violate the principle that pretty pictures matter. No problem. A visitor to my site must have a longer attention span than the average Web user, & must be able to ratiocinate beyond the “Bush is a Nazi” or “Bush lied” mentality & reflexive ideology.
Still, I did at least consider all their “Best Practices” recommendations before rejecting some. Besides, if everybody 100% embraced the same set of Best Practices, the Web would lose its idiosyncratic, eclectic charm.
As I researched, as I revised (and revised) this document, & as I vetted the revised Web.css master style sheet by upgrading & testing CSS-feature-rich documents (eg, the next 2 in this Web Publishing series, Directory Structure & Word-to-Web Styles), I continued refining Web.css. These revisions can be grouped into the following categories:
1) Margins
Amazingly, setting clean, simple, & consistently-defined left margins was the most difficult task in 2.0, not only for those styles affecting paragraphs (ie, paragraphs themselves, indents, hanging indents, bulleted & numbered lists, alternate headings, etc), but also for styles whose primary intent was to simulate Word tabs on the Web. (Note: These styles are further discussed in the Styles that simulate Word tabs section of the Word-to-Web Styles document).
Before the upgrade, I never gave units of measure a second thought. During the upgrade, pondering units of measure triggered at least trillions of neuronal firings.
Fortunately, the scheme I finally devised for left margins proved to be simple. More importantly, it integrates so seamlessly in my overall Web scheme that I haven't had to revisit the subject again.
2) CSS applied to HTML tags
I unreservedly embraced the radical simplifications resulting from redefining properties of HTML tags using CSS; eg:
hr {
color: Maroon;}
img {
border: none;}
table {
margin-left: auto;
margin-right: auto;
table-layout: auto;
text-align: left;
font-size: 12px;}
td {
padding-left: 3px;
padding-right: 3px;
white-space: nowrap;}
Doing so in style sheets does it in one swell foop; ie, these properties do not need to be set for every occurrence of these tags. What a concept.
Note: The details of Cascading Style Sheets are not explained in this document; eg, the above 4 definitions. Doing so is the province of the Word-to-Web Styles document. In it, examples like the above are always followed by an explanation of each CSS property (eg, color, white-space).
3) Deprecated HTML tags & attributes
2.0's style sheets functionally replace all deprecated HTML tags or attributes. For example, the venerable HTML <u> tag is deprecated, with the CSS U style being its replacement.
A few other deprecated elements bit me:
Tags: <center>
Attributes: background in <body>, border in <img>, & width in <td>
Of course, the <font> tag munched ravenously. The width attribute for table cells did, too; please see immediately below.
4) Tables for data: Yes!
Ah! Tables using CSS.
Since I hand-code all HTML tables, creating them is hard work. Since my Web site abounds with tables, anything that would make their creation less onerous would be a grief-counselor-send.
When I began creating tables in 1.0, I used only HTML markup & not any CSS facilities. Since plain vanilla HTML seems to demand assigning fixed widths to all table columns, I assumed I needed to assign these fixed widths via the width attribute of the <td> (ie, table data) tag. Unfortunately, this attribute is deprecated, but there is a straightforward replacement: The undeprecated <colgroup> & <col> tags; ie, they can do what the width attribute does.
Realizing I needed to purge the width attribute, I successfully converted a few tables using colgroup and col.
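As a rough illustration of that interim conversion, the fragment below sets column widths with <colgroup> & <col> rather than with a width attribute on each <td>. The 3-column table, its percentages, & its cell contents are invented for the example; they are not taken from any table on this site.

```html
<!-- Hypothetical 3-column table: widths live on <col>, not on each <td> -->
<table>
  <colgroup>
    <col width="50%" />
    <col width="25%" />
    <col width="25%" />
  </colgroup>
  <tr>
    <td>Row label</td>
    <td>1.0</td>
    <td>2.0</td>
  </tr>
  <tr>
    <td>Another row label</td>
    <td>3.0</td>
    <td>4.0</td>
  </tr>
</table>
```

The widths are stated once per column instead of once per cell, but they are still chosen by hand.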
However, a happy event happened during my research: An arch-grief-counselor-angel paid me a visit & made me aware of a far better alternative.
The CSS 2 & 2.1 specs incorporate the table-layout property. This property can assume 2 values: fixed and auto. In the automatic layout mode, the browser determines the optimal -- and minimum -- width for all table columns. Here's how it works.
The browser calculates the width of each column & the total table width via an algorithm:
Parse all rows of the table to determine the maximum width of each column.
Calculate the table width as the sum of these maximum column widths.
Display the table.
The browser makes 2 passes: The 1st calculating the widths, & the 2nd displaying the table.
The upsides to this automatic table layout mode are:
Widths never need to be specified since the browser determines them.
The calculated table width is the absolute minimum table width.
The downside to this layout mode is rendering speed. Since the browser makes 2 passes for each table, tables display more slowly. There may be other downsides, but, in my opinion, the upsides prevail.
With a ruthlessness spawned by memories of tedium, I decided to implement this automatic layout mode:
The process of manually determining a good minimum width for all table columns is onerous, & decidedly trial-&-error. Since the sum of all column widths must be 100%, changing one column's width necessitates changing the widths of another column or other columns. When so changed, other columns that displayed OK before the change might become too narrow or too wide. It sometimes takes tens of such iterations to get the individual column widths right.
Despite concerted efforts, the white space on the left & right of the data in each table cell never seems to be uniform across table columns.
The table is never the minimum width it could be.
The automatic table layout feature fixes these 3 problems: The algorithm works; white space on the left & the right of the data in each table cell is always exactly the same; & tables are always as narrow as they can be.
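To make the contrast concrete, here is a small sketch of a table that relies on the automatic layout mode. No widths appear anywhere; the rules are the table & td definitions quoted earlier, trimmed to the properties that matter here. The <style> section is used only to keep the sketch self-contained (the standard formulated above would put these rules in the external style sheet), & the cell contents are invented.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Automatic table layout sketch</title>
  <style type="text/css">
    /* Embedded only to keep this sketch self-contained */
    table {
      table-layout: auto;     /* browser sizes each column from its content */
      margin-left: auto;      /* auto left & right margins center the table */
      margin-right: auto;}
    td {
      padding-left: 3px;      /* same white space inside every cell */
      padding-right: 3px;
      white-space: nowrap;}   /* keep each cell's content on one line */
  </style>
</head>
<body>
  <table>
    <tr><td>Short</td><td>A noticeably longer cell</td></tr>
    <tr><td>Longer than short</td><td>5.0</td></tr>
  </table>
</body>
</html>
```

Lengthening or shortening any cell's content should change that column's width, & hence the table's width, with no markup edits.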
There is another upside to this layout mode: Version 2.0 tables use a larger font size.
In 1.0, I adopted this standard: The maximum width of all tables & images would be 800 pixels. Since the minimum display resolution of almost all displays in use today is 800 X 600 (See W3Schools' Browser Stats page for details), this standard would obviate the need for horizontal scroll bars.
The widest table on this site is Table 1A, Western European (EU15) Nations - Government & Politics - General, in the Europe Sucks - Dystopia essay. Sadly, using a 10-point font size -- the standard in 1.0 for all other text -- table width is 864 pixels. Using a 9-point font size, table width decreases to 765 pixels.
Thus, in 1.0, all tables used a 9-point font size to eliminate the possibility of table widths exceeding 800 pixels.
Not so in 2.0. Using the functional equivalent of a 10-point font size (See the Final Design For 2.0 section, Font size sub-section for further detail), table width for Table 1A is 722 pixels.
This significant reduction in table width is due to another upside of the automatic table layout feature: CSS styles can add the requisite white space (ie, padding) in table cells.
In 1.0, I padded cell contents using the non-breaking space entity. For example, the contents of the following table cell are right-aligned, & need padding on the right to prevent the cell's contents butting up against the cell's right border:
<td class="ar_bnsnn">5.0&nbsp;</td>
I could have used the HTML cellpadding attribute, but this padding is applied to all 4 edges of the cell (ie, top, bottom, left, & right). Not good. I wanted padding only on the right. Thus, I had to insert an &nbsp; into every left-aligned & right-aligned table cell. Needless to say, doing so was onerous.
Along comes 2.0, & this simple redefinition of the HTML <td> tag:
td {
padding-left: 3px;
padding-right: 3px;
white-space: nowrap;}
Voila. The HTML markup becomes:
<td class="ar_bnsnn">5.0</td>
A much-simpler syntax results in narrower tables, an orgasmic twofer.
I've created 3 separate documents that example the different table methodologies. In each document, the HTML or XHTML markup displays after the table itself displays. These documents are:
Table Layout Property Comparisons
The 3 alternative methods of setting column widths are given: Using the fixed-table-layout width attribute, using the fixed-table-layout <col> & <colgroup> tags, & using an automatic table layout.
1.0's table is 447 pixels wide, 2.0's, 406 pixels. Plus, 2.0's table looks better.
Gurus speaking forcefully & in unison is not unusual.
From the 27 Jul 04 Stopdesign Look Ma, No Tables! article:
Those who were at Digital Design World in Seattle this year saw me present a session titled, “No More Tables, CSS Layout Techniques”. Yadda.
From Jim Ramsey's 27 October 2004 Digital Web Magazine Making News with Web Standards article:
The San Francisco Examiner recently became one of the first newspapers in the country to fully adopt Web standards & publish a site that utilizes valid XHTML & uses no tables for layout. Yadda.
‘Layout’ refers to partitioning a Web page into various areas; eg, horizontal header, body, & footer areas, with the body sub-partitioned into multiple vertical columns.
Basically, using tables for layout is analogous to using the very, very bad deprecated <font> tag. After all, the layout of a page is not content but presentation. Thus, using tables for layout violates the inviolable rule: Use CSS for presentation, & HTML/XHTML only for content.
Echoing the above theme -- & going beyond “Best Practices” to what Max Design & others call “Web Standards” -- is Russ Weakley's 13 Aug 04 A web standards checklist article:
The term web standards can mean different things to different people. For some, it is ‘table-free sites’, for others it is ‘using valid code’. However, web standards are much broader than that. A site built to web standards should adhere to standards (HTML, XHTML, XML, CSS, XSLT, DOM, MathML, SVG etc) and pursue best practices (valid code, accessible code, semantically correct code, user-friendly URLs etc).
In other words, a site built to web standards should ideally be lean, clean, CSS-based, accessible, usable and search engine friendly.
Weakley's checklist is useful in & of itself, & worth perusing & considering. Not embracing in toto, mind you; just worth considering.
CSS methods for Web page layout involve using either the CSS absolute positioning or float properties. 2 articles discuss these methods:
glish.com's CSS Layout Techniques: for Fun and Profit Web page discusses the float method in depth, & furnishes creme de la creme code snippets.
MIS Web Design's CSS Positioning Properties Web page discusses both methods, & provides links to other sites where these methods are further explained.
In version 1.0, only 2 documents used tables for layouts: Letters to the Editor and Poems. The reason so few documents used tables for layout requires a discussion of general Web page layout schemes.
Most Web pages are laid out in columns: 2, 3, or 4, with 3 columns seeming to be the most used. glish.com gives a good example of a 3-column layout on its Layout Techniques: 3 columns, the holy grail Web page.
“Holy grail”? Well, as we shall see, I do not quaff from this chalice.
In the 3-column scheme, the body is partitioned into multiple vertical columns. The leftmost column is used for navigation, the center column for text, & the rightmost column for ads, navigation, etc. The columns are aligned to the left, with the leftmost column positioned at pixel 1. Each column is given a fixed width in pixels. The 1st 2 columns are normally less than 800 pixels wide, with the 3rd column sometimes extending beyond 800 pixels.
Real good scheme for heavy-duty Web sites; eg, the Wall Street Journal's OpinionJournal.
In contrast to this 3-column left-aligned scheme, I use a 1-column centered scheme. I eschew navigation on the left &, instead, use navigation at the top. Hence, the entire width of the screen is the body.
In this scheme, everything is basically centered, with even margins or equal white space on the left & right:
The <body> functionally has 0.3” left & right margins.
Left-aligned text has an additional 0.3” left margin to allow negative indentation of headers (ie, <h1>, <h2>, <h3>). Thus, headers stand out. Still, if headers are considered text, then text has even margins on the left & right, or equal white space on the left & right.
Large images are usually centered (smaller images may be indented), with equal white space on the left & right.
Tables are centered, with equal white space on the left & right.
Forms are centered, with equal white space on the left & right.
Using Web terminology, this layout is ‘liquid’; ie, if the user resizes the screen, the browser readjusts everything nicely: The length of text lines readjusts to fit the window size, & horizontal scroll bars appear only if the width of tables or images is too large to fit in this window width.
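One way the 1-column centered scheme just described could be expressed is sketched below. The selectors, & the choice to embed the rules in a <style> section, are assumptions made for a self-contained illustration; they are not the actual contents of Web.css. Only the 0.3in figures come from the list above.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>1-column centered layout sketch</title>
  <style type="text/css">
    /* Hypothetical rules; embedded only to keep the sketch self-contained */
    body {
      margin-left: 0.3in;     /* even white space on both sides */
      margin-right: 0.3in;}
    p {
      margin-left: 0.3in;}    /* additional indent for left-aligned text */
    h1, h2, h3 {
      margin-left: 0;}        /* headers start to the left of the text */
    table {
      margin-left: auto;      /* auto margins center the table */
      margin-right: auto;}
  </style>
</head>
<body>
  <h1>Header</h1>
  <p>Text wraps to whatever width the window allows.</p>
  <table>
    <tr><td>Centered</td><td>table</td></tr>
  </table>
</body>
</html>
```

Because nothing in the sketch has a fixed width, resizing the window reflows the text, which is the ‘liquid’ behavior.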
But what if I need 2 or more columns? Enter float. For my purposes, float works better than absolute positioning for CSS layouts.
Below are the float-related styles I added to the master Web.css style sheet:
div.FloatContainer {
width: 700px;
margin-left: auto;
margin-right: auto;}
div.FloatLeft {
float: left;
width: 300px;
padding-left: 25px;
padding-right: 25px;}
div.FloatRight {
float: right;
width: 300px;
padding-left: 25px;
padding-right: 25px;}
div.Separator {
clear: both;}
The auto setting for FloatContainer's margin-left and margin-right properties centers this 700-pixel-width <div> (ie, float container) on the screen. These pixels are apportioned to 2 350-pixel-width sub-containers, one floated (or aligned) left & the other right, effectively creating 2 columns.
An example is in order. As with the already-referenced CSS Inheritance Example, Sample Table - Version 1.0, Sample Table - Version 2.0, and Table Layout Property Comparisons documents, I've created a separate document that examples the use of floats. As in these 4 documents, the CSS 2-Column Centered Float Layout document displays the XHTML markup afterwards.
For my purposes, the utility of these styles is maximal: They enable a 2-column centered layout, & do everything using tables for layout did.
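The float styles above might be applied as in the following sketch. It assumes the 4 rules (FloatContainer, FloatLeft, FloatRight, & Separator) live in a Web.css file beside the page; the column contents are placeholders. This is an illustration only, not the markup of the CSS 2-Column Centered Float Layout document.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>2-column float sketch</title>
  <!-- Assumption: Web.css holds the float-related rules listed above -->
  <link rel="stylesheet" type="text/css" href="Web.css" />
</head>
<body>
  <div class="FloatContainer">
    <!-- 300px of content plus 25px of padding on each side = 350px per column -->
    <div class="FloatLeft">
      <p>Left column content.</p>
    </div>
    <div class="FloatRight">
      <p>Right column content.</p>
    </div>
    <!-- clear: both ends the floats so later content starts below both columns -->
    <div class="Separator"></div>
  </div>
  <p>Text here resumes below both columns.</p>
</body>
</html>
```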
In my opinion, the best “Best Practice” is this: Adopt & use standards. If practiced, other “Best Practices” ensue.
CSS, yes; HTML, no.
Like the qwerty keyboard, HTML was originally conceived in 1990 by basically one guy whose every incidental thought became an icon. Needless to say, we have no choice but HTML.
Fast forward to the browser wars of the 1990s between Netscape & Microsoft. Web developers were forced to create different markup for each browser since each browser vendor used proprietary HTML tags, or rendered the exact same HTML differently.
Thus, HTML was created in isolation, & evolved via proprietary accretion. Not good.
In October 1994, the W3C was founded to impose order on this chaos. From its About page:
W3C's mission is:
To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.
W3C primarily pursues its mission through the creation of Web standards and guidelines.
The W3C succeeded. Its December 1999 HTML 4.01 Specification formalized HTML. The Web community willingly embraced this spec since it promised an end to browser-proprietary dialects of HTML. Grizzled, crippled veterans of the Netscape/Microsoft wars jumped for joy at this happenstance -- that is, those who were still ambulatory.
However, the HTML 4.01 spec was HTML's epitaph. In the spec, the W3C states:
For information about the next generation of HTML, “The Extensible HyperText Markup Language” (or XHTML) …
In January 2000, the W3C published its XHTML™ 1.0 Recommendation, intending XHTML as HTML's replacement. Since then, XHTML has evolved (ie, the May 2001 XHTML™ 1.1 Recommendation, & the July 2004 XHTML 2.0 Working Draft), but HTML hasn't (ie, 4.01 is still the latest spec published for HTML).
Considering this history, XHTML seems the only choice.
HTMLSource elucidates the benefits of using XHTML in its XHTML Explained article:
The benefits of adopting XHTML now or migrating your existing site to the new standards are many. First of all, they ensure excellent forward-compatibility for your creations. XHTML is the new set of standards that the web will be built on in the years to come, so future-proofing your work early will save you much trouble later on. Future browser versions might stop supporting deprecated elements from old HTML drafts, & so many old basic-HTML sites may start displaying incorrectly & unpredictably.
Once you have used XHTML for a short time, it is no more difficult to use than HTML ever was, & in ways is easier since it is built on a more simplified set of standards. Writing code is a more streamlined experience, as gone are the days of browser hacks & display tricks. Editing your existing code is also a nicer experience as it is infinitely cleaner & more self-explanatory. Browsers can also interpret & display a clean XHTML page quicker than one with errors that the browser may have to handle.
A well-written XHTML page is more accessible than an old style HTML page, & is guaranteed to work in any standards-compliant browser (which the latest round have finally become) due to the insistence on rules & sticking to accepted W3C specifications. As mentioned above, XHTML allows greater access to configurations other than a computer & browser. This interoperability is another aspect of XHTML's greater accessibility.
Convinced me. How about you?
For a short history of HTML & W3C's role in its evolution, please see HTMLSource's The History of HTML article.
If you'd like to read the W3C's short pitch for XHTML, please see the What is XHTML? section of the W3C's XHTML™ 1.0 Recommendation document.
Let's determine whether or not the 10 biggies have embraced XHTML. Let's see if they consider XHTML an imperative. Doing so requires introducing technical topics that are more fully detailed in the Final Design For 2.0 section.
How do browsers know which markup language (ie, HTML or XHTML) & which version of those markup languages Web pages use? The DOCTYPE declaration, which names an HTML or XHTML Document Type Definition (DTD), so informs them; eg:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Let's determine which DOCTYPE's the 10 biggies use.
1) Yahoo - #1
HTML 4.01 Transitional
2) Google - #3
None
3) eBay - #5
None
4) Microsoft - #6
HTML 4.0 Transitional
5) Amazon - #7
None
6) BBC - #11
HTML 4.01 Transitional
7) CNN - #13
HTML 4.01 Transitional
8) MapQuest - #30
XHTML 1.0 Transitional
9) Wikipedia - #37
XHTML 1.0 Transitional
10) New York Times - #59
HTML 4.0 Transitional
The summary of DOCTYPE's used is:
None: Google, eBay, Amazon
HTML 4.0 Transitional: Microsoft, New York Times
HTML 4.01 Transitional: Yahoo, the BBC, CNN
XHTML 1.0 Transitional: MapQuest, Wikipedia
The technical details of DOCTYPE's are discussed below in the Final Design For 2.0 section, DOCTYPE's sub-section. For now, let's explain the basics.
There are 2 markup languages: HTML & XHTML. For both markup languages, there are 2 layout modes: Strict & Transitional.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The technical details of layout modes are discussed below in the Final Design For 2.0 section, Layout modes sub-section. For now, suffice it to say that “XHTML Transitional” is not pure XHTML, & “HTML Transitional” is not pure HTML.
Here's the important point: Browsers are theoretically impelled to strictly comply with the XHTML 1.0 spec only if the layout mode is defined to be -- well -- Strict. If the layout mode is Transitional, browser vendors are not so constrained. In essence, ‘Transitional’ as in “Transitional layout mode” is akin to the use of ‘transitional’ in this sentence: “Near the end of his life, Ted Bundy was transitioning from being a serial rapist & murderer to becoming a male Mother Teresa”; ie, the word ‘Transitional’ is hollow.
In actuality, markup defined as HTML 4.01 Strict can be more rigorous than markup defined as XHTML 1.0 Transitional; ie, the layout mode rather than the markup language is the guarantee.
The reason ‘theoretically’ is bolded will be explained a little later.
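For reference, the Strict & Transitional declarations discussed above look like this, as published in the W3C's HTML 4.01 & XHTML 1.0 specs. The comments are labels added for this listing. Only one declaration belongs in any given document, as its first line.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```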
Before continuing, let's summarize the results from the 10 biggies:
1) Uses pure dimensionless fonts:
Yahoo, Microsoft, the BBC, & MapQuest do.
For all practical purposes, eBay & Wikipedia do.
Google, Amazon, CNN, & the New York Times do not.
2) Uses the deprecated <font> tag:
Microsoft, CNN, MapQuest, & Wikipedia do not.
The rest do.
3) Defines styles only in external style sheets:
Microsoft & MapQuest do.
The rest do not.
4) DOCTYPE's used:
None: Google, eBay, Amazon
HTML 4.0 Transitional: Microsoft, New York Times
HTML 4.01 Transitional: Yahoo, the BBC, CNN
XHTML 1.0 Transitional: MapQuest, Wikipedia
Amazingly:
The use of dimensionless fonts is more prevalent (ie, 6 sites) than the non-use of the <font> tag (4 sites);
The non-use of the <font> tag (4 sites) is more prevalent than the use of external style sheets (2 sites);
The use of external style sheets (2 sites) is as prevalent as the use of XHTML (2 sites); &
The use of XHTML (2 sites) is more prevalent than the use of the Strict layout mode (0 sites).
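The counts above can be double-checked with a short Python sketch. The data is copied from the summary in this section (eBay & Wikipedia are counted as dimensionless, per “for all practical purposes”); the variable & function names are my own illustrative choices.

```python
# Survey results as summarized above; None means no DOCTYPE was declared.
DOCTYPES = {
    "Google": None,
    "eBay": None,
    "Amazon": None,
    "Microsoft": "HTML 4.0 Transitional",
    "New York Times": "HTML 4.0 Transitional",
    "Yahoo": "HTML 4.01 Transitional",
    "BBC": "HTML 4.01 Transitional",
    "CNN": "HTML 4.01 Transitional",
    "MapQuest": "XHTML 1.0 Transitional",
    "Wikipedia": "XHTML 1.0 Transitional",
}
DIMENSIONLESS_FONTS = {"Yahoo", "Microsoft", "BBC", "MapQuest", "eBay", "Wikipedia"}
NO_FONT_TAG = {"Microsoft", "CNN", "MapQuest", "Wikipedia"}
EXTERNAL_STYLES_ONLY = {"Microsoft", "MapQuest"}


def tally():
    """Count how many of the 10 sites follow each practice."""
    declared = [d for d in DOCTYPES.values() if d is not None]
    return {
        "dimensionless fonts": len(DIMENSIONLESS_FONTS),
        "no <font> tag": len(NO_FONT_TAG),
        "external style sheets only": len(EXTERNAL_STYLES_ONLY),
        "XHTML": sum(d.startswith("XHTML") for d in declared),
        "Strict layout mode": sum("Strict" in d for d in declared),
    }


if __name__ == "__main__":
    for practice, count in tally().items():
        print(f"{practice}: {count} of {len(DOCTYPES)} sites")
```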
Surprisingly, none of the 10 biggies uses the Strict layout mode, & only 2 use XHTML. Have I said this is a surprising result? Anyway, there are 3 possible reasons:
1) All 10 sites may still be using legacy code that would be a bitch to upgrade to XHTML.
Old code never dies. If it fades away, it fades away imperceptibly.
If there is an element of HTML that should die, it's <font>, but 6 sites still use it. From this fact alone we can safely conclude that legacy code litters the sites of the 10 biggies.
Old coding habits may not die, either. W3Schools is perhaps the leading source for tutorials on anything & everything Web-related. On their Introduction to XHTML page, they proudly proclaim:
W3Schools was completely rewritten to XHTML 1.0 in 1999.
However, on their Introduction to XHTML page, W3Schools uses tables & not style sheets for layout. As was discussed above in the Tables for layout: No! section, doing so is old school.
Intriguingly, W3Schools uses the Transitional & not the Strict layout mode. W3Schools has gravitas. Ergo, their non-use of the Strict layout mode is noteworthy. As with ‘theoretically’, more soon.
2) Upgrading to XHTML might cause older browsers to crater.
The below excerpt from D-Zine's 2003 XHTML article is pertinent here:
One downside (to XHTML) is that some older browsers may struggle to read XHTML as it is a new language compared to HTML. This is a risk to accept with any new technology. If your website is attracting users with outdated software, it probably might not be worth switching to XHTML right away. The fact is, you can continue using HTML 4.01 until you have grandchildren, as HTML 4.01 is a W3C Standard that will remain over the years to come.
I digress. Wrong. HTML will not live forever. If MS-DOS finally died in Windows XP, HTML will die its Who cares? death, too. HTML is moribund. The W3C wrote its epitaph more than 6 years ago.
However, the article is correct in its assessment of older browsers. It seems the 10 biggies still accommodate users with very old browsers. These older browsers required HTML tailored to their functionalities; ie, legacy code that violates W3C standards. Thus, all 10 biggies may still incorporate this legacy code to enable access to their sites by the 3 or 7 individuals worldwide who still use older versions of the Microsoft or Netscape browsers.
As will be detailed in the Final Design For 2.0 section, Browser Rendering sub-section, these older browsers are dying. Since they are, we should be able to disregard them & presume that all widely-used browsers support standards-compliant HTML & XHTML.
But, should we so presume?
3) The latest version of the Internet Explorer browser is not standards-compliant.
Unfortunately, our presumption is mistaken. Even though the HTML 4.01 Specification & the XHTML 1.0 Recommendation theoretically impel IE6 to support standards-compliant HTML & XHTML, IE6 does not.
IE6 is the elephant in grandma's china cabinet. From a strict technical standpoint, IE6 is not standards-compliant. In fact, legions of gurus insist that we not use XHTML, & strongly suggest that we not use the Strict layout mode, entirely because of IE6's non-compliance with XHTML. That only 2 of the biggies use XHTML, & that neither W3Schools nor any of the 10 biggies uses the Strict layout mode, seem to affirm this POV.
Why are gurus so insistent? Well, this question is premature. To do this topic justice, we first need the technical foundation laid below in the Final Design For 2.0 section, DOCTYPE's, Layout modes, & Browser rendering sub-sections. Please see the following Final Design For 2.0 section, Utopian Marxists vs Supply Sergeants sub-section for detail.
For now, suffice it to say that, from a pragmatic standpoint, XHTML is salvageable. HTMLSource echoes this common-sense perspective in its Browser Upgrades article:
IE6 for Windows has decent support for many important standards, including the vital HTML 4.01, XHTML 1.0 and CSS-1, as well as good JavaScript support. It delivers a generally good all-round experience, though falls down on most of the advanced stuff that we webmasters would like to be able to use. It's also rather vulnerable to security threats. Firefox is a much better choice.
We non-Webmasters have a choice: We can earn our anti-IE6 Eagle Scout merit badge, or we can use XHTML. As evidence that the latter choice is the better one, link to the W3C Home Page, scroll to the bottom, & click this icon:
[Icon: W3C XHTML validation badge]
Their Home Page & all other W3C pages validate as XHTML 1.0 Strict. If the W3C does XHTML 1.0 Strict, so should we.
For all practical purposes, XHTML deprecates HTML, implying that, sometime in the future, HTML will go the way of the deprecated <font> tag; ie, it will be relocated to a museum of software technology that nobody visits.
In the final analysis, upgrading to XHTML 1.0 Strict seems a no-brainer.
As I type, 2.0 is final. I've upgraded the Home Page, & the Web Publishing, Writing, & Science sections to XHTML. Documents number about 50, total size about 1.4MB. The W3C HTML/XHTML Markup Validation Service validates these documents as XHTML 1.0 Strict, the W3C CSS Validation Service validates the Cascading Style Sheets these documents use, & the W3C Link Checker validates all links. Thus, what follows is vetted.
There are 2 documents on this Web site not declared as XHTML 1.0. Also, one CSS validation error does occur, but this error is of no consequence. Please see the Standards section for more info.
This section discusses these topics:
2) Font size
7) DOCTYPE's
8) Layout modes
10) Utopian Marxists vs Supply Sergeants
11) HTML vs XHTML
12) HTML Tidy
13) W3C validation
14) Macromedia HomeSite editor
My final standard turned out to be:
Use inches for left & right margins;
Use em's for top & bottom margins; &
Use pixels for font sizes.
Sure, inches & pixels are fixed units of measure, but the target audience for my Web site is not the world: Few if any handicapped will create a personal Web site based upon my tutorials; few if any harried IT professionals will peruse my Web site on their PDAs; & few if any teenage party animals will access my Web site on their cell phones. Realistically, anyone who accesses my site will do so on a PC with an 800 X 600 or a 1024 X 768 display. Thus, going dimensionless is not a compelling interest.
But, would going dimensionless be difficult? No. After building the necessary foundation, the Accessibility issue section below links to an example document that uses dimensionless units of measure for margins & font sizes. If you care to access this Dimensionless Units of Measure document now, click the preceding URL.
95% of the work required to go dimensionless involves separating content from presentation; ie, defining all presentation elements in Cascading Style Sheets. Once this onerous & rigorous task is complete, the rest is easy.
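Once presentation lives in style sheets, the remaining 5% is largely arithmetic. As a hedged illustration (not the method used for the example document), this Python sketch converts a pixel size into the dimensionless em unit relative to a base font size. The helper names are hypothetical, & the 13-pixel base is the Web.css default discussed later in this section.

```python
def px_to_em(pixels, base_px):
    """Express a pixel size as a multiple of the inherited (base) font size."""
    return round(pixels / base_px, 3)


def em_declaration(css_property, pixels, base_px):
    """Build a CSS declaration that uses em's instead of pixels."""
    return f"{css_property}: {px_to_em(pixels, base_px)}em;"


if __name__ == "__main__":
    # A 12-pixel table font relative to a 13-pixel body font.
    print(em_declaration("font-size", 12, 13))
```

Because an em is relative to the inherited font size, the result scales with whatever base size the reader's browser applies.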
Let's discuss pixels vs points in more depth.
A display that visually depicts the relationships among the various font size units of measure is Macrides Web Services' Font Size Settings.
Converting pixels to & from points is straightforward:
1 pixel = .75 points
1 point = 1.3333… pixels
However, the unending decimal causes a problem. Let's do a visual of these relationships.
Note that there is no point equivalent to pixel font sizes of 10, 14, & 18, etc. Thus, pixels allow a finer gradation between font sizes than points do.
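The relationships can be sketched in a few lines of Python. Two assumptions are mine rather than the original display's: point sizes run from 6 to 15, & each point size maps to the nearest whole pixel. The function names are illustrative.

```python
from fractions import Fraction

# 1 point = 1.3333... pixels, ie exactly 4/3; equivalently 1 pixel = .75 points.
PX_PER_PT = Fraction(4, 3)


def px_to_pt(pixels):
    """Convert pixels to points exactly."""
    return pixels / PX_PER_PT


def nearest_px(points):
    """Convert points to pixels, rounded to the nearest whole pixel."""
    return int(points * PX_PER_PT + Fraction(1, 2))


def pixels_without_point_equivalent(point_sizes):
    """List the whole-pixel sizes that no whole-point size lands on."""
    covered = {nearest_px(pt) for pt in point_sizes}
    return [px for px in range(min(covered), max(covered) + 1)
            if px not in covered]


if __name__ == "__main__":
    for pt in range(6, 16):
        print(f"{pt:>2} points -> {float(pt * PX_PER_PT):6.2f} pixels "
              f"(nearest: {nearest_px(pt)})")
    print(pixels_without_point_equivalent(range(6, 16)))  # [10, 14, 18]
    print(float(px_to_pt(13)))                            # 9.75
```

Using Fraction sidesteps the unending decimal: 4/3 is held exactly, so the rounding is not skewed by floating-point error.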
Due to CSS inheritance, the default font size 2.0's Web.css sets is 13 pixels, or 10-point (strictly, 13 X .75 = 9.75-point):
.body {
    margin-left: 0.0in;
    margin-right: 0.0in;
    font-family: Verdana, Helvetica, sans-serif;
    font-size: 13px;
    background: url(../gen/Background_Grid.png) repeat-y;
    position: relative;
}
The usual suspects override this default font size: Titles, headings, other fonts, etc.
However, there is a notable exception to this rule. HTML tables use a 12-pixel font size:
table {
    margin-left: auto;
    margin-right: auto;
    table-layout: auto;
    text-align: left;
    font-size: 12px;
}
The reason tables use this smaller font size is that tables display more compactly using a 12-pixel versus a 13-pixel font size. Specifically, tables use less white space, although the sizes of the characters themselves are identical.
An example is in order. Following are 2 displays of a sample 2 X 4 table, each magnified 4 times. The 1st uses a 12-pixel font size, & the 2nd 13 pixels:
The non-magnified sizes of the 2 images are 83 X 58 pixels and 85 X 66 pixels, respectively. Thus, a 12-pixel font size ‘saves’ 2 pixels horizontally & 8 pixels vertically, or 1 by 2 pixels per table cell.
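The per-cell figure follows from simple division. This small Python sketch works it through, assuming that “2 X 4” means 2 columns by 4 rows (the reading consistent with 1 by 2 pixels per cell); the names are illustrative.

```python
COLUMNS, ROWS = 2, 4          # assumed reading of the "2 X 4" sample table
SIZE_12PX = (83, 58)          # width, height of the table at 12 pixels
SIZE_13PX = (85, 66)          # width, height of the table at 13 pixels


def savings():
    """Return (total saved, saved per cell) as (width, height) pairs."""
    saved_w = SIZE_13PX[0] - SIZE_12PX[0]
    saved_h = SIZE_13PX[1] - SIZE_12PX[1]
    return (saved_w, saved_h), (saved_w // COLUMNS, saved_h // ROWS)


if __name__ == "__main__":
    total, per_cell = savings()
    print(f"Saved overall: {total[0]} X {total[1]} pixels")
    print(f"Saved per cell: {per_cell[0]} X {per_cell[1]} pixels")
```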
The Accessibility issue section below further discusses font sizes.
Note: ‘Family’ is the term CSS uses, while ‘face’ is an HTML term. Since the <font> tag is deprecated, so is ‘face’. Thus, the correct phraseology is, “In yo' family”.
A display that shows a visual comparison among the most popular font faces is Maratz.com's Table of most-used web-fonts on Windows.
This display does not include all the fonts the 10 biggies use. Ergo, using Maratz.com's Web page as a template, I've created a display that does. To view it, please see Comparison of Popular Fonts.
Verdana seems the most readable font. As is evident in the display, it is the widest of all the Sans-Serif fonts. Interestingly, Verdana is the exact same font as Microsoft Sans-Serif, right down to the pixel.
Arial may be more popular, but the basis for its popularity seems to be the inherent narrowness of the font; ie, using the same font size, Verdana is 20% wider. Thus, the New York Times can stuff more words on its pages.
Not to beat the accessibility drum here, but Verdana seems a better choice for the visually impaired than Arial.
In any case, I chose Verdana as my font 6 years ago, & Verdana it will remain.
With Cascading Style Sheets, I adopted the purist approach:
1) All presentation elements are defined using styles;
2) No presentation elements are defined using HTML tags;
3) All styles are defined in external, linked style sheets; &
4) No styles are defined within an XHTML document.
There are exceptions to these rules, but these exceptions have legitimacy. Please see the Standards section for detail.
I use 3 Cascading Style Sheets. To view them, click the URL:
1) Web.css is the primary style sheet defining all major presentation elements.
2) Tables.css defines alignment, background, border, & font properties for HTML tables. At last count, Tables.css defined about 170 different styles. These styles would clutter Web.css.
3) Misc.css defines heterogeneous styles that, again, would clutter Web.css.
Time to exercise our rote memory. A few acronyms:
HTML: HyperText Markup Language
XHTML: Extensible HyperText Markup Language
SGML: Standard Generalized Markup Language
XML: Extensible Markup Language
The W3C spec for XHTML 1.0 has this title:
XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition)
A Reformulation of HTML 4 in XML 1.0
ComputerUser.com on Extensible:
Able to be extended or expanded. Extensible programming languages allow the programmer to customize: to add new functions & modify the behavior of existing functions.
Wikipedia on Markup Language:
A markup language combines text & extra information about the text. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text. The best-known markup language in modern use is HTML (HyperText Markup Language).
Wikipedia on SGML:
The Standard Generalized Markup Language (SGML) is a metalanguage in which one can define markup languages for documents… HTML was originally designed based on SGML tagging but without SGML's emphasis on rigorous markup
Wikipedia on XML:
The Extensible Markup Language (XML) is a W3C-recommended general-purpose markup language for creating special-purpose markup languages… It is a simplified subset of SGML. Its primary purpose is to facilitate the sharing of data across different systems, particularly systems connected via the Internet. Languages based on XML (for example, … XHTML …) are themselves described in a formal way, allowing some programs to modify & validate documents in these languages without prior knowledge of their form.
Thus, just as SGML spawned HTML, XML (a general-purpose markup language) spawned XHTML (a special-purpose markup language):
XHTML = XML + HTML
To paraphrase, XHTML is nothing more than XML's rigor superimposed onto HTML.
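What “XML's rigor” means in practice can be seen with Python's standard xml.etree.ElementTree parser. This is only a sketch: it checks XML well-formedness (closed tags, matching case, quoted attributes), not validity against the XHTML DTD, & the function name & sample markup are my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Old-school HTML habits: unclosed <p> & <br>, unquoted attribute value.
sloppy_html = "<div><p align=left>One<br><p>Two</div>"

# The same content with every tag closed & every attribute value quoted.
strict_xhtml = '<div><p class="left">One<br /></p><p>Two</p></div>'

if __name__ == "__main__":
    print("sloppy HTML well-formed? ", is_well_formed(sloppy_html))
    print("strict XHTML well-formed?", is_well_formed(strict_xhtml))
```

A forgiving HTML browser would display both samples; an XML parser rejects the first outright.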
Understanding the intricacies of XML is not a prerequisite for understanding XHTML. However, if you feel the need for more info, please see the W3Schools tutorial on XML.
Some interesting excerpts:
XML was designed to carry data.
XML is not a replacement for HTML. XML & HTML were designed with different goals:
XML was designed to describe data & to focus on what data is.
HTML was designed to display data & to focus on how data looks.
HTML is about displaying information, while XML is about describing information.
XML tags are not predefined. You must “invent” your own tags.
The tags used to mark up HTML documents & the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
XML allows the author to define his own tags & his own document structure.
The above is academic abstraction, nice to know but not overl