I know it probably sounds like I’m whining about .NET at every opportunity, but for some reason it’s something I just love to hate. So naturally I have to highlight every occasion it falls down, and this is one of those occasions.
Passing text between webpages can cause problems when it contains characters that have a special meaning in URLs, such as spaces, ? and &. Trying to pass the string “You & I?” in a URL will cause problems: the ? tells the browser where the webpage address ends and the passed parameters start, the ampersand (&) delimits the parameters, and spaces are just generally disliked, as are all other whitespace characters (characters that are not actually visible on the screen, but may well affect the position of other characters).
To combat this, URLs can be encoded: each special character is replaced by a standard(ish) string of characters. This string consists of the percent symbol (%) followed by the character’s ASCII code written as a 2-digit hexadecimal number. A space has an ASCII value of 32 (decimal), which is 20 in hexadecimal, so the encoded version of a space is “%20”. The encoded form contains neither special characters nor whitespace. Of course, by declaring that every percent symbol is followed by the 2 hex digits of a character code, we turn the percent symbol itself into a special character, but now it’s the only one, and it is encoded as “%25”. So our string from earlier, “You & I?”, becomes “You%20%26%20I%3F” in its encoded form.
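To make the rule concrete, here is a minimal sketch of the encoding step written by hand. The function name percentEncode is made up for illustration, and the sketch assumes the input contains only ASCII characters; it encodes everything that is not a letter or digit, which is stricter than most real encoders.

```javascript
// Hypothetical helper (not a built-in): percent-encode every character
// that is not an ASCII letter or digit.
// Assumption: the input contains only ASCII characters (codes 0-127).
function percentEncode(text) {
  let result = "";
  for (const ch of text) {
    if (/[A-Za-z0-9]/.test(ch)) {
      // Letters and digits are safe in a URL, so copy them unchanged.
      result += ch;
    } else {
      // Everything else becomes "%" plus its 2-digit uppercase hex code.
      const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
      result += "%" + hex;
    }
  }
  return result;
}

console.log(percentEncode("You & I?")); // You%20%26%20I%3F
console.log(percentEncode("100%"));     // 100%25
```

The padStart call matters for low codes such as tab (9), which would otherwise come out as “%9” instead of “%09”.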
When scripting webpages to manipulate and use such encoded data, Javascript provides two very useful functions: escape() and unescape(). escape() takes a string and returns its encoded version, whereas unescape() does the opposite. So you could convert a string to its encoded version and pass it as a parameter to another webpage, where it is converted back to its original form.
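As a rough sketch of that round trip, the snippet below builds a URL on the sending side and pulls the value back out on the receiving side. The page name next.html and the parameter name msg are placeholders invented for the example. Note that escape() and unescape() are nowadays marked as deprecated in Javascript; encodeURIComponent() and decodeURIComponent() are the usual replacements and give the same result for this particular string.

```javascript
// Sending page: encode the text and append it as a parameter.
// "next.html" and "msg" are placeholder names for this example.
const original = "You & I?";
const url = "next.html?msg=" + escape(original);
console.log(url); // next.html?msg=You%20%26%20I%3F

// Receiving page: take everything after the "?" and strip the "msg=" prefix.
const query = url.slice(url.indexOf("?") + 1);
const encodedValue = query.slice("msg=".length);

// Convert the encoded value back to its original form.
const restored = unescape(encodedValue);
console.log(restored); // You & I?
```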
This is also useful for maintaining the exact content of, say, a multi-line textbox, which is like a mini text editor. Encoding in this manner preserves not only the characters we’ve mentioned but also other whitespace characters such as tabs, carriage returns and line feeds (newlines).
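A small sketch of the textbox case, using a made-up two-line value with a Windows-style line break (carriage return plus line feed) and a tab:

```javascript
// Made-up textbox content: two lines, the second indented with a tab.
const textboxValue = "Line one\r\n\tLine two";

// Every whitespace character becomes a visible %XX code.
const encodedText = escape(textboxValue);
console.log(encodedText); // Line%20one%0D%0A%09Line%20two

// Decoding gives back exactly the same characters, whitespace included.
const decodedText = unescape(encodedText);
console.log(decodedText === textboxValue); // true
```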
So what has this got to do with VB.NET exactly? Well, sadly very little it seems... What I mean is that VB.NET appears to *almost* completely ignore this method of encoding. To begin with I foolishly assumed that VB.NET would have escape() and unescape() functions that mimic those of Javascript, but they don’t exist, or at least I haven’t found them.
I had been using System.Web.HttpUtility.HtmlEncode and HtmlDecode for some projects. These convert things like “&” into the HTML-friendly code “&amp;”, which is perfectly acceptable to me for HTML but is no help with URLs. Then I spotted the UrlEncode and UrlDecode functions, also under HttpUtility. These converted most characters correctly, apart from the space character, which they encoded as “+”!? A valid URL representation I’ll admit, but I would consider “%20” to be more standard. Encode each character equally I say, not all except one, which makes the data incompatible with the Javascript encoding functions.
I suppose we could stop there really, using UrlEncode and UrlDecode and additionally replacing “+” and spaces where required, but I’m a hard programmer to please and I can’t see any justification for encoding twice as much as necessary. It makes no sense.
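For what it’s worth, the “replace where required” workaround could be sketched on the Javascript side as follows. Both function names are invented for the example, and the sketch assumes that a literal plus sign in the original text arrives as “%2B”, so that any bare “+” can only mean a space.

```javascript
// Hypothetical helper: turn "+"-for-space encoding into the "%20" style.
// Assumption: a literal plus in the original text was sent as "%2B",
// so every bare "+" in the input stands for a space.
function plusToPercent20(encoded) {
  return encoded.replace(/\+/g, "%20");
}

// Hypothetical helper: decode a string that uses "+" for spaces.
function decodePlusStyle(encoded) {
  return unescape(plusToPercent20(encoded));
}

console.log(plusToPercent20("You+%26+I%3F")); // You%20%26%20I%3F
console.log(decodePlusStyle("You+%26+I%3F")); // You & I?
console.log(decodePlusStyle("1%2B1"));        // 1+1
```

Going the other way needs more care: escape() leaves a literal “+” untouched, so text such as “1+1” would be misread as “1 1” by a decoder that treats “+” as a space.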
Eventually, after a significant web trawl, I found a method that works, though not entirely as I would have hoped, since it looks more like a hack than the norm. The idea is to add a reference to JScript to extend VB’s capabilities a little. The string should be UrlDecoded first and then unescaped using JScript’s GlobalObject.
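' Requires a project reference to Microsoft.JScript.dll
Dim myString As String = "You%20%26%20I%3F"
' Server is the page's HttpServerUtility object
myString = Server.UrlDecode(myString)
' JScript's unescape() mirrors the Javascript function of the same name
myString = Microsoft.JScript.GlobalObject.unescape(myString)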
This should leave myString with a value of “You & I?”.
Well, that’s today’s rant over with…
