XML for Secured Message Passing

Securely passing XML Documents between two parties requires that the receiver can find out:

  • who sent the message
  • whether the message has been tampered with

and the sender can guarantee that:

  • enough information is available so the receiver can verify who the sender is
  • the receiver can find out whether the message has been tampered with

These requirements can be satisfied if the document is digitally signed. However, using XML as a message format brings additional complications that must be handled. Consider the following two documents:

They represent the same information but in different way:

  • the attribute value is double quoted in the first document while it is single quoted in the second one
  • the empty element uses different syntax

This is a simple example. Think about all the other things that can differ. For example:

  • the order of the attributes
  • the document encoding
  • arbitrary whitespaces
  • new lines
  • how new lines are encoded

All these differences will cause documents having the same data result in different signatures.

 

XML Canonicalization to the Rescue

To avoid these situations, the XML document must be canonicalized so it is always represented in the same way no matter what the original document structure is. The canonicalization is done by following the rules defined in https://www.w3.org/TR/xml-c14n/. The following list is a sample of the rules just to show what they look like:

  • XML declaration is omitted
  • All characters are encoded to UTF-8
  • Empty element must be converted to start-end tag pair
  • Attribute values use double quotes
  • CDATA sections are replaced by its content only

Example

To demonstrate XML canonicalization the Apache Santuario library will be used.

The following code uses the Canonicalizer class to convert the XML documents to have the same structure.

As mentioned early this is a simple example. The XML document can be much richer containing comments, namespaces, entity references etc. The canonicalization of these cannot be handled by just one mapping. That’s why there are different algorithms that can be specified when a Canonicalizer is instantiated. You can see the full documentation here or in the specification 

If you are curious what is the canonicalized form of your XMLs just give them a try following the example above.

String to Numeric and Vice-Versa in Java

How often do you write code that converts String input to a number? How often do you need to convert it back?
For example consider the following code:

There are two conversions here. First, the literal “123123123” is converted to Integer and then that Integer is converted to String again (System.out.println calls the toString() method). So what do you expect the output to be? It should be the same as the original String argument. While this is true for the above-mentioned example this is not the case when operating with floating point numbers. What do you expect the output of this code to be:

Surprisingly or not (hopefully not) it is:

Yes, that’s right! After converting String to Float/Double and then back to String the value is different than the one we started with. This is not a big deal unless you develop an application that handels numbers without considering the context.

Let’s say there is a common method that converts Strings to numbers. Something like this:

At some point in time a defect comes up that the system cannot work with floating point numbers. So the easy fix would be:

This looks clever at first sight since the method is declared Number so its callers won’t break and the fix is in one place only. However, once this fix is deployed all the code that converts the number back to String will start to behave in an unexpected manner. There will be validation errors, wrong data display etc. For example, code that accepts String argument that is expected to be an integer representation will start getting values like “1.2312312E8”.
As an example see the program below

Run it with argument “123123123”. Now change lines 24 and 35 to

and run it again without changing the argument.

Yes I know that most of this is bad code and lack of good engineering. However, keep in mind that as a system grows it can get entangled and simple changes can lead to massive failures.

Marshall JAXB Elements Without Prior Declaration of Their Type

Java Architecture for XML Binding (JAXB) is an API for access of XML documents from applications written in the Java programming language. It’s convenient to use as it usually follows this pattern:

  1. Create JAXBContext
  2. Create Marshaller/Unmarshaller
  3. Marshal/Unmarshal the code

The following are examples that marshal:

and unmarshal XML document to object:

The marshalled XML looks like this:

The actual model is this:

As you can see the JAXBContext is initialized with the classes that take part in the marshall/unmarshall process (in this case the Project class has a reference to the Task class and that’s why there is no need to pass all classes to the JAXBContext). This means that those classes should be known beforehand. However often this is not the case. There are cases when the contained elements are known only at runtime. One such example is a Task Runner – an application that can run tasks and the tasks with their execution order and parameters are contained in single XML file (sounds familiar? Yep, Ant is what comes to mind). Consider the following example:

It contains several elements that have different names – tstamp, mkdir, delete – however they are all of the same type. They are all Tasks. And because such an application is extensible (users can add arbitrary Tasks) by plugins the Tasks are not known beforehand.

Let’s start with the model

Minor changes can be spotted. First of all Task is an abstract class so the actual implementors are not referenced. The other thing is the @XmlElementRef annotation. This annotation dynamically associates an XML element name with the JavaBean property. When used the XML element name is derived from the instance of the type of the JavaBean property at runtime.

This is the Java code defining the Tasks

There are two ways to marshal those objects. First add jaxb.index file to the package containing the JAXB annotated classes and list them. The second way is to provide ObjectFactory in the package containing the JAXB annotated classes.

 

Source: ObjectFactory.java
Here is an app that instantiates custom tasks and marshals them

Source: Application.java
As you can see in this case the JAXBContext is created by specifying the package names containing the JAXB annotated classes.

Full source code

 

Resources:
https://jaxb.java.net/nonav/2.2.4/docs/api/javax/xml/bind/annotation/XmlElementRef.html
https://jaxb.java.net/nonav/2.2.4/docs/api/javax/xml/bind/annotation/XmlElementDecl.html

Factory With Java 8

Consider an application that prints student names taken from a data access object.

The application can work with different data stores and the storage should not affect the operations performed. That’s why there is an interface DataAccess that provides the data and several implementations (in this case two) that know how to handle the reading from the actual data storage.

Another part of the application is the DataProcessor that prints students obtained from the DataAccess. The DataProcessor does not care what is the actual storage it needs just the data (the students’ names).

Here is a sample implementation:

Source: DataProcessor.java

Source: DataAccess.java

Source: MongoAccess.java

Source: RdbmsAccess.java

Here is how a main app would look like written in Java 7. First a Factory class that creates a DataAccess is required:

Source: Factory.java

and an Application that actually uses the DataProcessor

Source: Application.java

This code can consist of fewer lines if using Java 8 capabilities for Lambdas and the interfaces defined in java.util.function.Supplier

Source: ApplicationWithLambda.java

In this case the Factory class is replaced by java.util.function and the code is inlined using a lambda expression.
This code is more compact that the one using Java 7 but it can become event more compact using method references.

Source: ApplicationWithLambda.java

Here the Supplier is inlined using a method reference to the constructor of each data access class. A nice benefit of this approach is that in case of not supported type it is easy the show a readable error displaying all supported types.

You can find the full code here

Java 9 Process API Improvements

Introduction

Java 9 adds improvements to the Process API that deal with ongoing shortcommings like:

  • getting the PID of the running process
  • getting the children and/or the descendants of a process

Along with these a new class is added that helps:

  • listing all running processes
  • getting info for an arbitrary process
  • traversing process tree

Note: The information returned by these methods is just a snapshot of the processes running on the OS. Some of them may be missing, stopped etc. during the traversal so handle with care.

Samples

Showing Process Information

Sample showing obtaining information for running processes’ id and command

Getting information for a process returns Optional and based on that further filtering can be done. For example listing only processes that have available command property:

You can see there is no ‘N/A’ value for ‘Command’ in the output

Sample showing obtaining information for an arbitrary process

In this case the ‘pid’ is of the IDE used for writing the sample.

Getting the ID of the Running Process

Sample showing obtaining information for a process that was killed after taking a snapshot from the OS

The output of this snippet is:

In this case the PID is of a text editor. To repro start the sample code in debug and once the processHandler is obtained stop the text editor.

Traversing process tree

The output of this program is:

In this sample ‘pid’ is the running IDE. The output shows the difference between children() and descendants(). At the begging of the programm a new process is started however it is not listed when children() is called and it is shown when descendants() is called.

Sample Application

Sample application that shows:

  • all process when invoked with argument ‘ps’
  • process’ children when invoked with arguments ‘children <pid>’
  • process’ descendants when invoked with arguments ‘descendants <pid>’

Source:App.java

References

Arbitrary Java object serialization to YAML

There are several good options for java object serialization and deserialization. One of these is to use the YAML data serialization standard.

YAML (rhymes with “camel“) is a human-friendly, cross language, Unicode based data serialization language designed around the common native data types of agile programming languages. It is broadly useful for programming needs ranging from configuration files to Internet messaging to object persistence to data auditing.

There are also several Java libraries that implement the YAML standard (as listed on yaml.org).

  • JvYaml
  • SnakeYAML
  • YamlBeans
  • JYaml

For the purpose of this article we’ll be using SnakeYAML as a solid modern implementation of the YAML 1.1 standard.

Consider the following simple example:

Source: Weapon.java

This Java bean can be serialized and deserialized with just the following code:

Source: WeaponTest.java

Simple Java beans are automatically serialized and deserialized, the data looks like this:

Of course in reality things are not always that simple. Your domain model may contain types that are not Java beans like java.net.URL, java.util.UUID, java.net.InetAddress and others. In this case the serialization process will ignore the non-bean objects and deserialization will fail altogether.

Fortunately SnakeYAML’s represent and construct functionality can easily be extended. Consider the following more complex example:

Source: Cyborg.java

Now we have non-bean types and java.util.Collection. To handle serialization and deserialization properly we need to use custom representer and constructor.

The representer takes the non-bean objects and transforms them into custom-tagged scalars. Also RepresentInetAddress is registered for both Inet4Address and Inet6Address to handle both protocols.

Source: ApplicationRepresenter.java

The constructor does exactly the opposite of the representer does.

Source: ApplicationConstructor.java

Our custom representer and constructor are passed to the constructor of the YAML processor. Putting everything together looks like this:

Source: CyborgTest.java

Now the type hierarchy is represented in YAML like this:

As you can see the non-bean type values are tagged with custom tags: !inet and !uuid. That’s how the YAML document is later properly deserialized.