Tuesday, January 21, 2014

Invalid XML characters

Recently I investigated an issue with one of our WCF-based services where the client’s XML parser (Java SAX) complained about invalid XML characters:
org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 24; Character reference "&#
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.xml.sax.helpers.XMLFilterImpl.parse(Unknown Source)
    at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)

Although the character that caused the exception was not shown here, one of my colleagues used the WCFTestClient to get at the actual SOAP message (see below), and it showed some interesting-looking character references, such as &#x6; and &#x2;, in the message field.

<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
    <a:Action s:mustUnderstand="1">.../GetErrorsResponse</a:Action>
    <GetErrorsResponse xmlns="AppSecInc.Checks.Service">
      <GetErrorsResult xmlns:b="http://schemas.datacontract.org/2004/07/.." xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
        <b:Error i:type="b:ScanError">
          <b:Message>..: Exception Information
           Exception type=WeOnlyDo.Exceptions.SSH.TimeoutException
           Message=Timeout occurred due to inactivity.
           StackTrace=   at &#x6; .&#x2;(String &#x2;, String[]&amp; &#x3;, Int32 &#x5;, Int32&amp; &#x8;, String&amp; &#x6;)
           at WeOnlyDo.Client.SSH.Execute(String Command, String Prompt, Int16 Timeout)

Thanks to a post on Stack Exchange, I learned that these are, in fact, invalid XML characters. According to the Characters section of the W3C XML specification, only the following characters are valid:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
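
To see the rule in action, here is a minimal sketch (mine, not part of the service or the client) that feeds two tiny documents to the JDK's built-in SAX parser. The class name InvalidCharRepro and the helper parses are made up for illustration. It assumes the default parser treats a document without an XML declaration as XML 1.0, which is where the ranges above apply.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class InvalidCharRepro {

    /** Returns true if the JDK's default SAX parser accepts the document. */
    static boolean parses(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler rethrows fatal errors as SAXParseException.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // The document is not well-formed, e.g. an invalid character reference.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what we are testing.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // #x9 (tab) is in the allowed list, so this reference is fine.
        System.out.println(parses("<m>&#x9;</m>")); // true
        // #x6 is below #x20 and is not one of #x9, #xA, #xD.
        System.out.println(parses("<m>&#x6;</m>")); // false
    }
}
```

The second document fails in the same way as the SOAP message above: the reference itself is syntactically fine, but the character it names is outside the allowed ranges.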

By default, at least, WCF does not seem to do anything special about such characters. I do not know whether anything can be done at the WCF level, but I want to explore whether a WCF extension could be implemented to filter out these invalid characters.
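
Whatever form such an extension takes, its core would probably be a filter over the character ranges quoted above. The following is only a sketch of that filter, written in Java (the client's language) rather than as a WCF extension. The names XmlCharFilter, isValidXmlChar and stripInvalidXmlChars are hypothetical. It also assumes the text is cleaned as a plain string before it is serialized, so it looks at raw characters and not at references like &#x6;.

```java
// Hypothetical helper class, for illustration only; not a WCF API.
public class XmlCharFilter {

    /** True if the code point falls inside the valid ranges from the XML spec. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with every invalid code point removed. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Walk by code point so that valid surrogate pairs stay together.
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A fragment shaped like the obfuscated stack trace in the SOAP message.
        String raw = "at \u0006 .\u0002(String \u0002)";
        System.out.println(stripInvalidXmlChars(raw)); // prints: at  .(String )
    }
}
```

Dropping characters silently changes the message, so replacing each invalid character with a visible placeholder may be the better choice when the text has to stay recognizable.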
