XML for Secured Message Passing

Securely passing XML Documents between two parties requires that the receiver can find out:

  • who sent the message
  • whether the message has been tampered with

and the sender can guarantee that:

  • enough information is available so the receiver can verify who the sender is
  • the receiver can find out whether the message has been tampered with

These requirements can be satisfied if the document is digitally signed. However, using XML as a message format brings additional complications that must be handled. Consider the following two documents:

They represent the same information but in different way:

  • the attribute value is double quoted in the first document while it is single quoted in the second one
  • the empty element uses different syntax

This is a simple example. Think about all the other things that can differ. For example:

  • the order of the attributes
  • the document encoding
  • arbitrary whitespaces
  • new lines
  • how new lines are encoded

All these differences will cause documents having the same data result in different signatures.


XML Canonicalization to the Rescue

To avoid these situations, the XML document must be canonicalized so it is always represented in the same way no matter what the original document structure is. The canonicalization is done by following the rules defined in https://www.w3.org/TR/xml-c14n/. The following list is a sample of the rules just to show what they look like:

  • XML declaration is omitted
  • All characters are encoded to UTF-8
  • Empty element must be converted to start-end tag pair
  • Attribute values use double quotes
  • CDATA sections are replaced by its content only


To demonstrate XML canonicalization the Apache Santuario library will be used.

The following code uses the Canonicalizer class to convert the XML documents to have the same structure.

As mentioned early this is a simple example. The XML document can be much richer containing comments, namespaces, entity references etc. The canonicalization of these cannot be handled by just one mapping. That’s why there are different algorithms that can be specified when a Canonicalizer is instantiated. You can see the full documentation here or in the specification 

If you are curious what is the canonicalized form of your XMLs just give them a try following the example above.