Internet Evolution: Introducing XML

When I started writing web pages back in ’95, HTML seemed like enough. As time passed and languages and standards evolved, though, the results began to make a whole lot more sense. In the beginning SGML (Standard Generalized Markup Language) was the root language, so to speak: it was used to define HTML (HyperText Markup Language), amongst others.

Over the years HTML has been misused in many ways. Although this achieved the designer’s desired results, it came at the expense of accessibility. I’m not just referring to accessibility for those with disabilities, but also for those using different platforms and technologies. Consider the differences between Internet Explorer and Netscape Navigator: numerous headaches have been inflicted by their separate approaches. On the disability side, the overuse of tables for page layout has alienated those who rely on screen readers and similar tools. Luckily things are beginning to change, and many other benefits have arrived along the way.

The first improvement I came across was the introduction of CSS (Cascading Style Sheets) and layers built with the DIV and SPAN tags. The point of CSS was to enable, and encourage, designers to separate a web page’s layout from its content. The benefits include reusable layouts and styles across whole websites that need a consistent theme and, because the style information is removed from the HTML documents, more readable HTML source. The DIV and SPAN tags let us create layouts that had previously required tables; since these tags are positioned and styled through CSS rather than through HTML attributes, they remove the clutter of tables from HTML documents. This is not without a price, though: tabled layouts are more straightforward to construct, whereas the CSS path is a rocky road with many pitfalls. But hey, where’s the fun in a smooth ride?

XML (eXtensible Markup Language) has since taken us to the next level by removing the content of web pages from the HTML document. At this point you might be wondering, “what could be left in the HTML document after removing both layout and content?” The answer is the “document structure”. With the three elements of a document separated we gain greater control and functionality: we can update the appearance of a site without touching the content or structure, change the content without wading through structure and appearance, and so on. XML, like HTML, is tag based and derived from SGML. But unlike HTML, XML has no predefined tags; you must define your own, which is what lets a document describe its own data. The one fixed piece of syntax is the XML declaration usually found at the top of XML files, and strictly speaking that is a declaration rather than a tag.
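
To make the “define your own tags” point a little more concrete, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree module. The element names (recipe, ingredient) and the helper functions are invented purely for illustration. The parser accepts the made-up vocabulary because it only checks that the document is well-formed, not what the tags mean.

```python
import xml.etree.ElementTree as ET

# Element names below are made up for this sketch; XML assigns them no meaning.
WELL_FORMED = (
    "<recipe>"
    "<ingredient>flour</ingredient>"
    "<ingredient>water</ingredient>"
    "</recipe>"
)

# Broken on purpose: the <ingredient> element is never closed.
MALFORMED = "<recipe><ingredient>flour</recipe>"


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def tag_names(xml_text):
    """List every element name in document order, starting with the root."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(MALFORMED))
    print(tag_names(WELL_FORMED))
```

The only thing the parser objects to is the unclosed element in the second document; it has no opinion on whether “recipe” is a sensible name.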

XML, as I’ve mentioned before, is designed for describing data (web page content, for example), and because of its self-descriptive nature you are free to define your data however you wish. Consider this example XML for describing a music collection:

<?xml version="1.0"?>
<collection>
    <album>
        <title>One Of These Nights</title>
        <format type="LP">Vinyl</format>
    </album>
    <album>
        <title>Master Of Puppets</title>
        <format type="Album">CD</format>
    </album>
</collection>

The only predefined part of the above code is the XML declaration on the first line; the rest of the tags have no meaning except to the person who wrote them, and perhaps to others if the names are self-explanatory, as in this example. So, if the tags have no real meaning to something like a web browser, how can they be of any use, you might wonder? Well, XML on its own does nothing except describe your data; you have to use some other means to actually use it. For example, a compiled program could read, interpret and use the data, or it can be transformed for display in a web browser using XSL (eXtensible Stylesheet Language). XSL and its related W3C languages each play an important role in the use of XML-stored data:

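  • XSLT (XSL Transformations) – Transforms XML documents into other XML, HTML or other text-based formats.
  • XPath – Selects parts of an XML document, such as particular elements or attributes, using path expressions.
  • XSL-FO (XSL Formatting Objects) – Describes how XML data should be laid out for presentation, typically for print or PDF output.
  • XQuery – Built on XPath expressions, this language is used to query XML data, much like SQL queries database data. Strictly speaking it is a separate W3C language rather than part of XSL.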
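
As a rough illustration of a program reading, querying and transforming the music collection, here is a Python sketch using the standard library’s xml.etree.ElementTree, which supports only a limited subset of XPath. To be clear, this is not XSLT: the to_html function is a hand-written stand-in for the sort of XML-to-HTML transformation an XSLT stylesheet would describe. The collection and album wrapper elements, and the function names titles_on and to_html, are my own assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# The <collection> and <album> wrappers are assumed here so that the
# document has a single root and each title is grouped with its format.
MUSIC_XML = """<?xml version="1.0"?>
<collection>
    <album>
        <title>One Of These Nights</title>
        <format type="LP">Vinyl</format>
    </album>
    <album>
        <title>Master Of Puppets</title>
        <format type="Album">CD</format>
    </album>
</collection>"""


def titles_on(root, medium):
    """Return the titles of albums whose <format> text equals medium."""
    # ElementTree's XPath subset supports the [child='text'] predicate.
    albums = root.findall(f"./album[format='{medium}']")
    return [album.findtext("title", default="") for album in albums]


def to_html(root):
    """Render the collection as an HTML list (a stand-in for an XSLT transform)."""
    items = []
    for album in root.findall("album"):
        title = escape(album.findtext("title", default=""))
        medium = escape(album.findtext("format", default=""))
        items.append(f"  <li>{title} ({medium})</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    root = ET.fromstring(MUSIC_XML)
    print(titles_on(root, "CD"))
    print(to_html(root))
```

The data never changes here; only the code that reads it decides whether it becomes a filtered list of titles or a chunk of HTML, which is the separation of content from presentation in miniature.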

This evolution, from a single language producing static documents (without scripting), often with inaccessible information and messy source code, to the isolation of document structure, content and style with dynamic potential, is clearly a jump away from the dark side.

May the markup be with you…