Have you ever been working with the Telerik Quickstart examples and wondered why all of the demo pages inherit from XhtmlPage instead of System.Web.UI.Page? Even if you guess that this custom base page somehow ensures that all rendered demo pages are XHTML valid, have you ever wondered how the base page manages to deliver that functionality? In this post, we’ll take a close look at the XhtmlPage class and see what it’s doing to deliver easy XHTML validity. The lessons learned from this class can easily be applied to your own custom page classes and used to make your web sites as accessible as possible.
Before we look at how XhtmlPage delivers XHTML validity, it is important to understand why XHTML validity matters. And frankly, for many websites, it does not. But for sites where accessibility and standards compliance are paramount (such as government sites, large corporate sites, or nitpicky hobby sites), XHTML compliance is not optional; it's a requirement.
XHTML, in the simplest sense, is just well-formed HTML. It sits at the convergence of HTML and XML: it reformulates HTML as an XML application, so markup must follow XML's stricter rules. Most (if not all) modern browsers are very forgiving and silently render improperly tagged HTML (missing closing tags, unquoted attribute values, etc.). While this makes web pages easier to create and reduces the chance that a browser can't render a page at all, it also means developers can unknowingly send malformed HTML to the browser. The result is that a page may render differently from browser to browser, depending on how each browser tries to "fix" your HTML.
In theory, XHTML should also require less processing power and time to render, since it can be parsed predictably (like XML). In practice, though, few browsers deliver XHTML support that outperforms their HTML rendering, even on mobile platforms, so the benefits of XHTML are more theoretical than practical. Still, if you can make your pages XHTML compliant, you can be sure you are rendering HTML that complies with W3C standards.
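Because XHTML must be well-formed XML, any XML parser can tell you whether a fragment passes that first bar. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration (it is not part of XhtmlPage or ASP.NET). Note that it checks well-formedness only; full validity also requires checking the markup against an XHTML DTD, which this sketch does not do.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (well-formed), else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Sloppy HTML that browsers silently accept: unclosed <p> and <br>, unquoted attribute.
sloppy = "<div><p>First<p>Second<br><img src=logo.png></div>"
# The same content written as well-formed XHTML.
strict = '<div><p>First</p><p>Second<br /><img src="logo.png" alt="" /></p></div>'

print(is_well_formed(sloppy))  # False
print(is_well_formed(strict))  # True
```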
At the heart of the XhtmlPage class is a single overridden method called RenderChildren. This method renders the page's child controls (in effect, all of the page's HTML) on the server, which makes it the perfect place to examine the rendered HTML and modify it to our heart's desire. In this case, we want to inspect the rendered HTML, look for possible XHTML violations, and fix them before the page is sent to the client.
The override begins by calling the base System.Web.UI.Page implementation of RenderChildren to obtain the page's HTML output. With that output in hand, it calls six additional methods that look for and fix XHTML problems: FixEmptyTitleTag, FixAutoPostBackElements, RemoveScriptLanguageAttribute, FixFormNameAttribute, FixDoPostback, and FixViewState.
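The overall pattern is "render into a buffer, run a chain of string fixes, then write the result to the real output." XhtmlPage itself is a C# class whose source isn't reproduced in this post, so the following is only a language-neutral Python sketch of that pattern. The names render_with_fixes and base_render are hypothetical, with base_render standing in for the base RenderChildren call.

```python
import io
from typing import Callable, List

# A "fix" takes rendered HTML and returns corrected HTML.
Fix = Callable[[str], str]
# A renderer writes markup to a text writer (stand-in for the base RenderChildren).
Renderer = Callable[[io.StringIO], None]

def render_with_fixes(render: Renderer, fixes: List[Fix], output: io.StringIO) -> None:
    """Render into a private buffer, post-process the markup, then emit it."""
    # 1. Let the base renderer write into a buffer instead of the real output.
    buffer = io.StringIO()
    render(buffer)
    html = buffer.getvalue()
    # 2. Apply each fix in order; each one sees the previous fix's result.
    for fix in fixes:
        html = fix(html)
    # 3. Send the cleaned markup to the real output writer.
    output.write(html)

# Usage with a hypothetical base renderer and a single toy fix.
def base_render(writer: io.StringIO) -> None:
    writer.write("<p>Hello<br></p>")

out = io.StringIO()
render_with_fixes(base_render, [lambda html: html.replace("<br>", "<br />")], out)
print(out.getvalue())  # <p>Hello<br /></p>
```

Because the fixes run in sequence, their order matters: each one operates on the output of the one before it.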
The first step in the XHTML compliance process uses regular expressions to find any <title> tags on the page that contain the word "Untitled" or have no inner content. If the RegEx parser finds a match, it removes the matched portion of the document. The result is a page that renders without a <title> tag if its title is empty or untitled.
This fix is more for convenience than for actual XHTML compliance. XHTML 1.1 does require the <head> element to contain a <title> element, so this fix actually breaks XHTML compliance on any page whose title it removes. It does, however, prevent pages from displaying the unsightly "Untitled Page" title, and compliance is easily restored by giving the page a simple title.
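The post doesn't show the exact regular expression FixEmptyTitleTag uses, so the pattern below is an assumption. This Python sketch removes a <title> element that is empty, whitespace-only, or contains "Untitled" (matched case-insensitively here, which is also an assumption). It also assumes the <title> tag carries no attributes.

```python
import re

# Assumed pattern: <title> whose content is blank or contains "Untitled".
EMPTY_OR_UNTITLED_TITLE = re.compile(
    r"<title>\s*(?:[^<]*Untitled[^<]*)?\s*</title>",
    re.IGNORECASE,
)

def fix_empty_title_tag(html: str) -> str:
    """Remove <title> elements that are empty or contain the word 'Untitled'."""
    return EMPTY_OR_UNTITLED_TITLE.sub("", html)

print(fix_empty_title_tag("<head><title>Untitled Page</title></head>"))  # <head></head>
```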
The second step in the process looks for the "language" attribute on HTML elements that can trigger auto postback events. When the RegEx parser finds an input or select element carrying this attribute, it removes the attribute and writes the cleaned element back into the page's output string. The language attribute is not defined for these elements in XHTML. It is easy to confuse with the lang attribute, but lang declares the human language of an element's content (such as "en"), not a scripting language, so it is not a drop-in replacement. Removing the attribute is therefore the simplest fix.
As with all the RegEx operations here that remove a block of text, we use named groups (referred to by name as named backreferences) to capture the portions of the matched text we want to keep into variables we can access. A RegEx MatchEvaluator can then read those named groups and reassemble the string without the unwanted block. (Figures in the original post: the RegEx match definition with the backreferences highlighted, and a match evaluator that joins the "beforeLanguage" and "afterLanguage" sections.)
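The named-group plus evaluator technique translates directly to Python, where re.sub accepts a function in the role of .NET's MatchEvaluator. In this sketch, the group names beforeLanguage and afterLanguage come from the original post; the pattern itself is an assumption, not XhtmlPage's actual regex. It also assumes attribute values contain no ">" characters.

```python
import re

# Assumed pattern: an <input> or <select> start tag containing language="...".
LANGUAGE_ATTR = re.compile(
    r'(?P<beforeLanguage><(?:input|select)\b[^>]*?)'
    r'\s+language="[^"]*"'
    r'(?P<afterLanguage>[^>]*>)',
    re.IGNORECASE,
)

def _drop_language(match: re.Match) -> str:
    # Evaluator: join the text on either side of the attribute, skipping the attribute.
    return match.group("beforeLanguage") + match.group("afterLanguage")

def fix_auto_postback_elements(html: str) -> str:
    """Remove language="..." from <input> and <select> start tags."""
    return LANGUAGE_ATTR.sub(_drop_language, html)

print(fix_auto_postback_elements('<select language="javascript" id="ddl">'))
# <select id="ddl">
```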
The next step evaluates the page's form tag and removes the "name" attribute if it exists. In XHTML, the "name" and "id" attributes (both of which exist in HTML) serve the same purpose of identifying an element on the page. XHTML 1.0 sought to simplify document processing by relying on the "id" attribute alone for identifying elements, and it formally deprecated the "name" attribute. To make the form tag rendered by ASP.NET valid, we remove the deprecated "name" attribute and write the corrected tag back into the output string being built.
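A minimal Python sketch of the same idea follows. It uses named groups with a \g<name> replacement template instead of an evaluator function. The pattern is an assumption: it expects a double-quoted name attribute and no ">" inside attribute values. The id attribute is left untouched, and the function name is hypothetical, modeled on FixFormNameAttribute.

```python
import re

# Assumed pattern: the opening <form> tag, split around name="...".
FORM_NAME_ATTR = re.compile(
    r'(?P<before><form\b[^>]*?)\s+name="[^"]*"(?P<after>[^>]*>)',
    re.IGNORECASE,
)

def fix_form_name_attribute(html: str) -> str:
    """Remove the deprecated name attribute from <form> start tags, keeping id."""
    return FORM_NAME_ATTR.sub(r"\g<before>\g<after>", html)

print(fix_form_name_attribute('<form name="form1" method="post" id="form1">'))
# <form method="post" id="form1">
```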
Finally, one of the most commonly acknowledged XHTML problems caused by ASP.NET is the rendering of the ViewState hidden field. The ViewState hidden input is rendered directly inside the <form> element, but XHTML 1.1 allows only block-level content there, so the input needs a containing block element. In fact, the problem applies to all of the hidden fields rendered by ASP.NET, such as __VIEWSTATE, __EVENTARGUMENT, __EVENTTARGET, __LASTFOCUS, and __EVENTVALIDATION. To fix it, the hidden inputs must be placed inside plain <div> elements, which our RegEx evaluators can easily do.
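Here is a Python sketch of the wrapping step. It is not XhtmlPage's actual code. It assumes ASP.NET emits each field as `<input type="hidden" name="__NAME" ...>` with type before name and an all-caps double-underscore name, and it wraps each consecutive run of such fields in a single <div>. It does not check whether a wrapper already exists.

```python
import re

# Assumed shape of an ASP.NET hidden field, e.g. name="__VIEWSTATE".
HIDDEN_INPUT = r'<input\s+type="hidden"\s+name="__[A-Z]+"[^>]*>'
# One or more such fields, possibly separated by whitespace.
HIDDEN_RUN = re.compile(HIDDEN_INPUT + r'(?:\s*' + HIDDEN_INPUT + r')*')

def fix_view_state(html: str) -> str:
    """Wrap each run of ASP.NET hidden fields in a <div> so the form holds block content."""
    return HIDDEN_RUN.sub(lambda m: "<div>" + m.group(0) + "</div>", html)
```

Fields with other names (for example, a hypothetical `name="token"`) are left alone, because only the double-underscore ASP.NET names match.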
Extending this solution
Clearly, the XhtmlPage solution delivered in the Telerik Quickstart demos does not address every possible XHTML problem on a page. For example, XHTML requires that all element and attribute names be written in lowercase and that all attribute values be quoted. It also requires that empty elements (like <img /> and <br />) be properly closed. Depending on how much control you have over the content added to your pages (content from a CMS, for example, is unpredictable), spending processing power on these extra compliance fixes may or may not be worth the effort.
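If you decide an extra fix is worth it, the same regex approach extends naturally. This hypothetical Python sketch closes common empty elements and lowercases their tag names. It assumes attribute values never contain ">", which regular expressions cannot handle safely in general. It also leaves attribute names and unquoted values unchanged.

```python
import re

# Hypothetical extension: common empty (void) elements, optionally already self-closed.
VOID_TAG = re.compile(
    r"<(?P<name>br|hr|img|input|meta|link)\b(?P<attrs>[^>]*?)\s*/?>",
    re.IGNORECASE,
)

def close_empty_elements(html: str) -> str:
    """Turn <BR>, <img src="a.png">, etc. into <br />, <img src="a.png" />."""
    return VOID_TAG.sub(
        lambda m: "<" + m.group("name").lower() + m.group("attrs") + " />", html
    )

print(close_empty_elements('<BR><img src="a.png">'))  # <br /><img src="a.png" />
```

Already-closed tags such as `<br />` pass through unchanged, so the fix is safe to run more than once.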
The XhtmlPage solution aims to address the major XHTML compliance problems introduced at the ASP.NET framework level. Most of these fixes target the XHTML 1.1 document type definition (DTD), so they may not be enough to make a page valid against a different XHTML DTD.
Wrapping it up
Hopefully this brief look at the XhtmlPage class clears up the mystery of the demos' custom base page and introduces you to the process of making your pages XHTML valid. Creating a custom base page for your web sites offers numerous benefits, not least of which is fine control over the page's rendering. With a little RegEx magic and careful planning, you can easily make your site XHTML valid and run it through an XHTML validator without a problem.
@toddanglin on Twitter