A software development and computer technology blog.

Archive for December, 2004

VS.NET Code Horrification

There have been many online discussions about how Visual Studio messes up the formatting of HTML source code, and I must admit I have been involved in a few of them, including one on Mikhail Arkhipov’s weblog. He explains that the code-mangling is down to MSHTML.DLL: when you switch from Visual Studio’s design view back to HTML view, its tokeniser and grammar analyser recreate the source HTML from the browser output.

It appears that MSHTML.DLL is not fully XHTML compliant (not that I am suggesting someone said it was). Consider the following case:

<UL>
  <LI>test1</LI>
  <LI>test2</LI>
  <LI>test3</LI>
</UL>

Now, I would expect the indentation to be stripped along with the odd CR/LF, only to have extra CR/LFs inserted elsewhere. That is what I normally get, so there would be no surprises there. But the removal of closing tags? The extra trailing whitespace raised an eyebrow too…

The new source as created by VS.NET after switching to design view and back again:

<UL>
  <LI>test1 
  <LI>test2 
  <LI>test3 </LI></UL>

Does this not break one of the XHTML commandments?
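
For comparison, here is a minimal sketch of the same list written the way an XHTML validator would want it. XHTML is XML, so every element has to be closed explicitly, and element names have to be lower case, which means even my original upper-case markup above would not have passed as it stood.

```xml
<!-- every <li> is closed, and all element names are lower case -->
<ul>
  <li>test1</li>
  <li>test2</li>
  <li>test3</li>
</ul>
```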

So, as a word of warning, don’t expect VS.NET projects to get you a big thumbs up from any XHTML validators. I guess I’ll just have to wait anxiously (but not without shame, I hasten to add) for the arrival of Whidbey (VS 2005), which claims to give you the option to leave your code untouched. Well, almost. At least the solution there has been to update the original HTML only with the changes that come back from MSHTML.

Quote: “In Whidbey we never directly use the HTML that MSHTML outputs. Instead, we transfer changes from it to your document. Therefore, if you only changed one attribute, only its new value will be applied to the original file, everything will be left alone.”

But how significant must a “change” be before it’s considered necessary to propagate it back to the user’s code? I would like to assume that Whidbey is XHTML compliant and doesn’t remove closing tags, but then I assumed for some time that Visual Studio 2003 was. From Mikhail’s article it appears that formatting as the user intended will prevail, but apart from that Whidbey will surely take command?

Maybe I should crack open a beta or two…

Optimised XSLT for speedier PHP5 DOM results

I thought I’d post this follow-up to Move over Sablotron, here comes PHP5, since I have finally managed to create a sensible stylesheet that works instantly through PHP5’s DOM functions.

Ok, so I’ll admit my approach so far has been monstrously inefficient, but I’ve just noticed that the tree I was barking up is in fact empty, and a quick net search has had me clawing at fresh bark with results that finally satisfy my requirements. Up until now I’ve been running nested loops over the periodic table’s row and column values, searching for the matching atom node and displaying it, or an empty cell if no atom was found. For 9 rows with 18 columns, that’s 162 searches per page load, which is a little over the top considering that the XML data is already ordered by row and column anyway.
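
For illustration, a per-cell lookup of that kind can be sketched as follows. This is a simplified sketch, not the original stylesheet: the template name lookup_cell and the parameters r and c are hypothetical, and it assumes the context node is the periodic_table element. The point is that the predicate atom[row=$r and col=$c] scans every atom each time it is evaluated, and it would be evaluated for all 162 cells.

```xml
<!-- Hypothetical sketch: find the atom for one cell by scanning all atoms -->
<xsl:template name="lookup_cell">
  <xsl:param name="r"/>
  <xsl:param name="c"/>
  <xsl:choose>
    <!-- this predicate is tested against every atom child of the context node -->
    <xsl:when test="atom[row=$r and col=$c]">
      <td><xsl:value-of select="atom[row=$r and col=$c]/@symbol"/></td>
    </xsl:when>
    <xsl:otherwise>
      <!-- no atom at this position: empty cell -->
      <td></td>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```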

A more logical approach would be to traverse the XML tree from start to finish and display each node at the appropriate point in the resulting table. This method is much faster: rather than searching the whole file for each atom node, simply look for where the first atom should be placed, followed by the second, and so on. One must be certain that the XML data is in the correct order before doing so, otherwise some nodes will be missed. Since I’m generating the XML from a MySQL database using Perl, this is easily done.
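
To make the ordering requirement concrete, here is a cut-down sketch of the input. The element and attribute names (periodic_table, atom, row, col, atomic_number, name, symbol) are the ones the stylesheet selects on; the exact layout of the file, including the order of the child elements, is an assumption made for this example.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Assumed input shape, inferred from the XPath expressions in the stylesheet -->
<periodic_table>
  <!-- atoms appear in row order, then column order within each row -->
  <atom name="Hydrogen" symbol="H">
    <atomic_number>1</atomic_number>
    <row>1</row>
    <col>1</col>
  </atom>
  <atom name="Helium" symbol="He">
    <atomic_number>2</atomic_number>
    <row>1</row>
    <col>18</col>
  </atom>
  <atom name="Lithium" symbol="Li">
    <atomic_number>3</atomic_number>
    <row>2</row>
    <col>1</col>
  </atom>
</periodic_table>
```

If Helium were listed before Hydrogen here, a single forward pass would already be at column 18 by the time it met Hydrogen, and that cell would be missed.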

First of all I needed the main template with a way of recursively calling a template to process each row of data. This gives the first part of my stylesheet as:

XSL Header and main Template

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match="periodic_table">
<html>
  <head>
    <title>Periodic Table Of Elements</title>
    <link rel="stylesheet" type="text/css" href="./pt.css"/>  
  </head>
  <body>
    <table cellspacing="1" cellpadding="2" border="0">
      <xsl:call-template name="proc_row">
        <xsl:with-param name="trow">1</xsl:with-param>
      </xsl:call-template>
    </table>
  </body>
</html>
</xsl:template>

My next step was to generate the code to traverse all the atom nodes from start to finish. This can be done simply with the xsl:for-each command, but since we are processing one row at a time we need to cut down the amount of data by specifying search criteria in the select attribute of the for-each. This loop then calls the template that processes each column in the current row.

Now this is where I encountered my major problem. Each time I called this template I was starting the processing from column 1. If the current node’s column did not match, the template would display an empty table cell and recursively call itself with the next column value (in this case 2) until it reached the end of the row. If the column parameter matched the current node, the information about the atom in that node would be displayed and the template would drop out of the column recursion so that the next atom element in the data could be processed. This didn’t quite work: the processing of each atom on a row restarted at column 1, so even though all the columns before the current atom had already been processed, they were being added again as blank cells.

The problem with processing from column 1 for every atom in a row:

[COL1] [COL2] [COL3] [COL4] [COL5] [COL6] [COL7] [COL8] [COL9] [COL10]
  K    Empty    Ca   Empty  Empty    Sc   Empty  Empty  Empty    Ti
col=1  col=1  col=2  col=1  col=2  col=3  col=1  col=2  col=3  col=4

I kept wishing there was some simple way to start each atom’s column processing after the last column processed, so that it didn’t generate all the empty columns from 1 for every atom but would still fill the gaps between the last atom and the current one, such as the 16 empty cells between Hydrogen (H) and Helium (He). Finally I stumbled upon a way to do exactly that when I tried a search on Google for ‘xsl preceding element’ and found a whole new set of XSL functionality in the form of the preceding-sibling and following-sibling axes.

preceding-sibling::atom references all the sibling atom nodes that come before the current one. The axis works on the document tree rather than on the nodes selected by the for-each, so restricting the loop to a single row does not restrict the axis: the nearest preceding atom may be the last one on the row above, which is why the first rule below is needed. By specifying a position you can pinpoint the neighbouring sibling immediately before the current node with preceding-sibling::atom[1]. Using this method I could then jump straight to the column after the last atom displayed and decide whether the next atom should be displayed or an empty cell is required. To do this I needed the stylesheet to enforce the following rules within the loop:

  • If the preceding atom was on the row before this one then start processing the current atom from column 1.
  • If the preceding atom was on the same row and this is not column 1 then start processing the current atom from the column after the last atom.
  • If neither of the above rules apply (i.e. we are currently processing the first atom of row 1) then start processing the current atom from column 1.
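
As a small stand-alone illustration of the axis, the sketch below prints each atom’s symbol next to the symbol of the atom immediately before it. It is not part of the periodic table stylesheet; the variable name prev is made up for the example. On a reverse axis such as preceding-sibling, position 1 is the node closest to the current one, not the first in the document.

```xml
<!-- Illustration only: report the nearest preceding atom for each atom -->
<xsl:template match="atom">
  <!-- [1] on a reverse axis picks the closest earlier sibling -->
  <xsl:variable name="prev" select="preceding-sibling::atom[1]"/>
  <xsl:value-of select="@symbol"/>
  <xsl:text>: </xsl:text>
  <xsl:choose>
    <!-- an empty node-set converts to false, so this detects the first atom -->
    <xsl:when test="$prev">
      <xsl:text>previous atom is </xsl:text>
      <xsl:value-of select="$prev/@symbol"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>no previous atom</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
  <xsl:text>&#10;</xsl:text>
</xsl:template>
```

following-sibling::atom[1] works the same way in the other direction and selects the next atom in the document.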

Here is the proc_row template which includes the loop and the rules defined above:

XSL Template: proc_row

<xsl:template name="proc_row">
  <xsl:param name="trow"/>
  <xsl:choose>
    <xsl:when test="$trow &lt; 10">
      <tr>
        <xsl:for-each select="atom[row=$trow]">
          <xsl:choose>
            <xsl:when test="preceding-sibling::atom[1]/row &lt; row">
              <xsl:call-template name="proc_col">
                <xsl:with-param name="tcol" select="1"/>
              </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
              <xsl:choose>
                <xsl:when test="col=1">
                  <xsl:call-template name="proc_col">
                    <xsl:with-param name="tcol" select="1"/>
                  </xsl:call-template>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:call-template name="proc_col">
                    <xsl:with-param name="tcol" select="1+preceding-sibling::atom[1]/col"/>
                  </xsl:call-template>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </tr>
      <xsl:call-template name="proc_row">
        <xsl:with-param name="trow" select="$trow+1"/>
      </xsl:call-template>
    </xsl:when>
  </xsl:choose>
</xsl:template>
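
Since the data is sorted by row and then column, the three rules boil down to one question: is the nearest preceding atom on the same row? As an untested alternative sketch rather than a replacement for the template above, the body of the for-each could work out the start column in a variable instead of nesting xsl:choose. The variable names prev and start are made up for this example, and the sketch only behaves like the original when the input really is in row and column order.

```xml
<xsl:for-each select="atom[row=$trow]">
  <!-- nearest preceding atom in the document, if there is one -->
  <xsl:variable name="prev" select="preceding-sibling::atom[1]"/>
  <xsl:variable name="start">
    <xsl:choose>
      <!-- same row: continue from the column after the previous atom -->
      <xsl:when test="$prev and $prev/row = row">
        <xsl:value-of select="$prev/col + 1"/>
      </xsl:when>
      <!-- first atom of the data or of a new row: start at column 1 -->
      <xsl:otherwise>1</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:call-template name="proc_col">
    <!-- number() turns the variable's text content back into a number -->
    <xsl:with-param name="tcol" select="number($start)"/>
  </xsl:call-template>
</xsl:for-each>
```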

Finally, the proc_col template displays the atom details if the current column equals that of the current atom and then returns to the proc_row template to receive the next atom; otherwise it displays an empty cell and calls itself again for the next column.

XSL Template: proc_col

<xsl:template name="proc_col">
  <xsl:param name="tcol"/>
  <xsl:choose>
    <xsl:when test="col=$tcol">
      <!-- attribute value templates ({...}) keep stray whitespace out of title and href -->
      <td class="elementCell" title="{@name}">
        <div class="elementSymbol">
          <a href="./elementinfo.php?element={@symbol}">
            <xsl:value-of select="@symbol"/>
          </a>
        </div>
        <div class="elementAtomicNumber"><xsl:value-of select="atomic_number"/></div>
      </td>
    </xsl:when>
    <xsl:when test="$tcol &lt; col">
      <td>
      </td>
      <xsl:choose>
        <xsl:when test="$tcol &lt; 18">
          <xsl:call-template name="proc_col">
            <xsl:with-param name="tcol" select="$tcol+1"/>
          </xsl:call-template>
        </xsl:when>
      </xsl:choose>
    </xsl:when>
    <!-- if $tcol is already past the atom's column, nothing is output -->
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Applying this stylesheet to my periodic table XML data using PHP5 DOM produces the output instantly, without any indication that this isn’t a static HTML page. I guess I should have a good read through the XSLT 2.0 Programmer’s Reference by Michael Kay, which we have at work amongst all the other XML/XSLT books.

Then I could say ‘XSLT’?…it’s easy, Mmm’Kay.