When you work with BizTalk for a while, there is a good chance you will come across a scenario where you need to process an XML document that contains a base64-encoded document, for example a .tiff or .pdf file.
Below is a very simple example of such a document:
The ‘Document’ node in the example above contains the base64-encoded content; for readability I truncated it. Because base64 encoding represents every 3 bytes of binary data as 4 characters, the node content is about a third larger than the original file, so for documents such as scans it quickly becomes very large.
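To make the size overhead concrete, here is a small Python sketch (not BizTalk code) that builds a message of this shape. Only the ‘Document’ element name comes from the text; the ‘Request’ and ‘CustomerId’ elements and the payload bytes are hypothetical placeholders.

```python
import base64

# Hypothetical payload standing in for a .tiff or .pdf file (any bytes will do).
payload = bytes(range(256)) * 12  # 3072 bytes

encoded = base64.b64encode(payload).decode("ascii")

# Hypothetical container message; only the 'Document' element name is from the text.
message = (
    "<Request>"
    "<CustomerId>42</CustomerId>"
    f"<Document>{encoded}</Document>"
    "</Request>"
)

# Base64 emits 4 output characters for every 3 input bytes (plus padding).
print(len(payload), len(encoded))  # 3072 4096
```

The 3072-byte payload grows to 4096 characters before any XML markup is added.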
Most of the “base64 scenarios” I have seen share a common set of steps that are executed in order:
- Receive an XML document with an embedded base64-encoded document in BizTalk.
- Process the XML part of the document in an orchestration.
- Send the base64-encoded document to a back-end system.
In most cases there is also no actual need for the base64-encoded document to travel through BizTalk along with its XML container message. Bringing such documents into BizTalk (especially into an orchestration) carries a performance penalty and is resource intensive, because the large payload has to be written to and read from the MessageBox together with the rest of the message. The other XML nodes in the message are needed and processed, for instance by an orchestration, but the base64-encoded document is simply passed on to a back-end system. A typical example is feeding a document management system through a web service.
Although I’ve seen companies use different solutions, they all come down to the same basic pattern:
- When BizTalk receives the message, strip off, decode and store the base64-encoded document in a temporary store (for example the file system).
- Process the XML message (without the base64-encoded document) in an orchestration.
- Load, encode and add the document back into the XML message just before it is sent to a back-end system.
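The three steps of the pattern can be sketched in a few lines of Python. This is an in-memory illustration only: it loads the whole message with ElementTree, which is exactly what the streaming advice below warns against, and the function names, the GUID-based file name and the sample message are hypothetical.

```python
import base64
import os
import tempfile
import uuid
import xml.etree.ElementTree as ET


def strip_in_memory(xml_text, element, store_dir):
    """Step 1: decode the base64 element to a file and leave an empty node."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{element}")
    if node is None:
        raise ValueError(f"element {element!r} not found")
    path = os.path.join(store_dir, f"{uuid.uuid4()}.bin")
    with open(path, "wb") as f:
        f.write(base64.b64decode(node.text or ""))
    node.text = None
    return ET.tostring(root, encoding="unicode"), path


def restore_in_memory(xml_text, element, path):
    """Step 3: load the file, re-encode it and put it back into the node."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{element}")
    if node is None:
        raise ValueError(f"element {element!r} not found")
    with open(path, "rb") as f:
        node.text = base64.b64encode(f.read()).decode("ascii")
    return ET.tostring(root, encoding="unicode")


store = tempfile.mkdtemp()
original = "<Request><CustomerId>42</CustomerId><Document>SGVsbG8=</Document></Request>"
slim, path = strip_in_memory(original, "Document", store)
# Step 2 would process `slim` here (for example in an orchestration).
restored = restore_in_memory(slim, "Document", path)
assert restored == original
```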
Key things that should be taken into account when implementing this pattern:
- Strip off the base64 content as soon as BizTalk receives it, in other words in a receive pipeline. This keeps the document out of the MessageBox.
- Add the document back to the message at the last possible moment, in other words in a send pipeline.
- Use a streaming approach in the pipelines to avoid a high memory footprint and to optimize throughput.
- Optionally, skip decoding and re-encoding altogether and store the document base64-encoded in the temporary file store to further improve performance.
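One detail that is easy to get wrong when decoding in a streaming fashion: base64 decodes in groups of 4 characters, and a stream hands you text in arbitrary pieces. The Python sketch below (the function name is hypothetical) decodes only complete groups and carries any partial group over to the next piece. If you take the optional route of storing the content still encoded, this step disappears entirely.

```python
import base64
import io


def decode_base64_stream(chunks, out):
    """Decode base64 text that arrives in arbitrary pieces, writing bytes to `out`.

    Only complete 4-character groups are decoded; a partial group at the end of
    a piece is kept and prefixed to the next piece.
    """
    pending = ""
    for chunk in chunks:
        # Drop whitespace such as the line breaks encoders often insert.
        pending += "".join(chunk.split())
        usable = len(pending) - len(pending) % 4
        if usable:
            out.write(base64.b64decode(pending[:usable]))
            pending = pending[usable:]
    if pending:
        raise ValueError("base64 content ended with an incomplete group")


encoded = base64.b64encode(b"streamed payload").decode("ascii")
buf = io.BytesIO()
# Feed the text in awkward 5-character pieces to exercise the carry-over.
decode_base64_stream((encoded[i:i + 5] for i in range(0, len(encoded), 5)), buf)
assert buf.getvalue() == b"streamed payload"
```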
The following figure illustrates the pattern:
I’m aware that a lot has been written about BizTalk, streaming and pipelines. However, I couldn’t find anything on this particular topic, and in particular no samples. Besides that, I’ve seen people make nasty mistakes when trying to implement the pattern above.
For these reasons, and because I recently had to implement this pattern myself for a customer, I decided to provide a downloadable sample.
To remove and store the content of the document we need to write a pipeline component. The component has to redirect the destination stream to a file at the right moment and switch back to the original stream afterwards. It needs some configuration information to determine where to store the temporary document and where to find the document in the XML message. Per-instance pipeline configuration is used to configure and store this information.
The pipeline component's Execute method is very simple:
The method does a couple of things. First it generates a filename by replacing the %Guid% macro in the configured filename with a new GUID. Next it instantiates a custom XmlReader class, which does the actual work and is a wrapper around the XmlTextReader class (see below). I use the XmlTranslatorStream class to ‘convert’ the reader back into a stream. Note that Execute does not actually process the message; it only connects the streams, and the real work happens later, when the stream is pulled. Finally, the method writes the name of the generated file to the message context, using the configured property name and namespace.
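The point that Execute only wires things up and that nothing happens until the stream is pulled can be illustrated with a Python generator. This is a hypothetical analogue, not the BizTalk API: the `execute` signature, the dictionary standing in for the message context, the `(namespace, key)` tuple and the `log` list are all assumptions made for the example.

```python
import uuid


def execute(body, context, file_pattern, key, namespace, log):
    """Hypothetical analogue of the receive component's Execute method.

    It names the temp file, wraps the body in a lazy reader and records the
    file name in the (simulated) message context. It reads nothing itself.
    """
    file_name = file_pattern.replace("%Guid%", str(uuid.uuid4()))

    def lazy_reader():
        # This body runs only when a downstream consumer pulls data.
        for chunk in body:
            log.append(f"read {len(chunk)} chars")
            yield chunk

    context[(namespace, key)] = file_name
    return lazy_reader()


log, context = [], {}
stream = execute(["<a>", "</a>"], context, "%Guid%.pdf", "FileName", "urn:hypothetical", log)
assert log == []                      # execute returned without reading anything
assert "".join(stream) == "<a></a>"   # pulling the stream does the work
assert log == ["read 3 chars", "read 4 chars"]
```

Writing the file name to the context can happen eagerly, because the name is known before any content has been read.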
The custom XmlReader is a wrapper around the XmlTextReader class with a specialized implementation of the ‘Read’ method:
The method checks whether the element specified in the pipeline configuration has been reached. I know it would be better if the Read method checked not only the local name but also the namespace of the element.
If the element is found, the method ‘Base64DecodeDocument’ is called to write the content, chunk by chunk, to a file stream using the BinaryReader class. In effect, this method pulls the stream until it reaches the end of the base64-encoded content.
After the pipeline has executed, the XML message contains an empty node and the decoded file is stored in the temporary storage location.
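The receive-side behaviour can be sketched with Python's incremental SAX parser standing in for the custom XmlReader. This is an assumption-laden illustration, not the author's C# component: the class and function names are hypothetical, and the target element is matched on the (namespace URI, local name) pair, which is the namespace-aware check mentioned above. Every event is copied to the output except the target element's text, which is decoded into `sink` one complete base64 group at a time.

```python
import base64
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import XMLGenerator


class StripBase64Handler(XMLGenerator):
    """Copies XML events to `out`, except the text of the target element,
    which is base64-decoded chunk by chunk into `sink` (e.g. a file)."""

    def __init__(self, out, sink, target):
        super().__init__(out, encoding="utf-8")
        self._sink = sink
        self._target = target  # (namespace URI or None, local name)
        self._inside = False
        self._pending = ""     # partial 4-character base64 group

    def startElementNS(self, name, qname, attrs):
        super().startElementNS(name, qname, attrs)  # keep the element, now empty
        if name == self._target:
            self._inside = True

    def characters(self, content):
        if not self._inside:
            super().characters(content)
            return
        self._pending += "".join(content.split())
        usable = len(self._pending) - len(self._pending) % 4
        self._sink.write(base64.b64decode(self._pending[:usable]))
        self._pending = self._pending[usable:]

    def endElementNS(self, name, qname):
        if name == self._target and self._inside:
            if self._pending:
                raise ValueError("incomplete base64 content")
            self._inside = False
        super().endElementNS(name, qname)


def strip_streaming(chunks, out, sink, target):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(StripBase64Handler(out, sink, target))
    for chunk in chunks:  # feed the input piece by piece
        parser.feed(chunk)
    parser.close()


out, sink = io.StringIO(), io.BytesIO()
doc = "<Request><CustomerId>42</CustomerId><Document>SGVsbG8g\nV29ybGQ=</Document></Request>"
strip_streaming([doc[i:i + 7] for i in range(0, len(doc), 7)], out, sink, (None, "Document"))
assert sink.getvalue() == b"Hello World"
assert "<Document></Document>" in out.getvalue()
```

Because the parser is fed in small pieces and the decoded bytes go straight to `sink`, memory use does not grow with the size of the embedded document.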
The sending side does the reverse of the receiving side: at the right moment it switches the source stream from the original message to the file, and it switches back once the file has been read. This is also implemented as a pipeline component, whose Execute method looks like this:
The pipeline component has the following configuration information:
The ‘ElementTo…’ properties tell the component which element to create to hold the base64-encoded content. The ‘FilePath..’ properties tell the component which context property holds the filename of the stored document. The ‘PreceedingElementName’ property is the name of the element after which the new element is created. An alternative would be to let a map create a new, empty node; the pipeline component then only has to fill in the contents of that node.
As shown above, the Execute method first gets the filename of the document from the context. It then instantiates a ‘RetrieveAndAddDocumentStream’ class and passes it the configuration information.
‘RetrieveAndAddDocumentStream’ is a subclass of the ‘XmlTranslatorStream’ class. Its ‘TranslateEndElement’ method detects the element configured in ‘PreceedingElementName’ and puts the class in the ‘ElementFound’ state. The new element is then created, and its content is read from the file stream (and base64-encoded) instead of from the XML stream, until the complete file has been read. Finally, the closing tag is written.
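The same idea can be sketched in Python with an incremental SAX parser in place of the XmlTranslatorStream subclass. The names below are hypothetical and the sketch assumes the new element is either unqualified or in a namespace already declared in scope. After the end tag of the preceding element, it writes the new element and fills it from the file. Bytes are encoded in multiples of 3 (leftovers are carried to the next read), so no padding appears in the middle of the output and the concatenated pieces equal a single base64 encoding of the whole file.

```python
import base64
import io
import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesNSImpl


class AddBase64Handler(XMLGenerator):
    """Copies XML events to `out` and, right after the end tag of `preceding`,
    writes a new `element` whose text is read from `source` and base64-encoded."""

    CHUNK = 3 * 1024  # bytes per read from the stored file

    def __init__(self, out, source, preceding, element):
        super().__init__(out, encoding="utf-8")
        self._source = source
        self._preceding = preceding  # (namespace URI or None, local name)
        self._element = element      # must be unqualified or already in scope

    def endElementNS(self, name, qname):
        super().endElementNS(name, qname)
        if name != self._preceding:
            return
        self.startElementNS(self._element, None, AttributesNSImpl({}, {}))
        carry = b""
        while True:
            block = self._source.read(self.CHUNK)
            if not block:
                break
            data = carry + block
            usable = len(data) - len(data) % 3  # encode whole 3-byte groups only
            self.characters(base64.b64encode(data[:usable]).decode("ascii"))
            carry = data[usable:]
        if carry:
            self.characters(base64.b64encode(carry).decode("ascii"))  # final padding
        super().endElementNS(self._element, None)


def restore_streaming(chunks, out, source, preceding, element):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(AddBase64Handler(out, source, preceding, element))
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()


out = io.StringIO()
slim = "<Request><CustomerId>42</CustomerId></Request>"
restore_streaming([slim], out, io.BytesIO(b"Hello World"), (None, "CustomerId"), (None, "Document"))
assert out.getvalue().endswith(
    "<CustomerId>42</CustomerId><Document>SGVsbG8gV29ybGQ=</Document></Request>"
)
```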
I’ve put together a sample that receives an XML message with base64-encoded content. It strips off the content, decodes it and saves it to a file. The message without the document is then sent to the MessageBox. A send port picks up the message and puts the document back in. There is no orchestration processing the XML message; the sample only shows how the pipeline components redirect the streams to and from the file.
The sample source code can be downloaded from here.
To run the sample, build both solutions, put the pipeline components in the GAC and deploy the pipeline project to BizTalk. A binding file is provided that creates the necessary ports; if you extract the .zip file to ‘C:\Sample.Pipelines\’, the paths in the binding file will be correct.
- Make sure the temporary storage location is reliable. You can for example use a clustered or mirrored file system.
- Think about error handling and resubmission. If something goes wrong while processing the XML message, be aware that it is not the same message BizTalk received: the base64 content is missing from it, so a resubmitted message depends on the stored file still being available.
- Also think about the right moment to remove the file from the temporary storage location. You probably want to do this once the document has been put back into the XML message.
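The ordering suggested in the last two points (keep the stored file while it may still be needed for a resubmit, delete it only after the document has been put back) can be expressed as a tiny helper. This is a hypothetical Python sketch; `restore_then_cleanup` and the `restore` callback are illustrative names, and where such a hook lives in a real BizTalk solution is a separate design decision.

```python
import os
import tempfile


def restore_then_cleanup(path, restore):
    """Hypothetical helper: delete the stored document only after `restore`
    has put it back into the message without raising."""
    restore(path)     # if this raises, the file stays for a later resubmit
    os.remove(path)


fd, stored = tempfile.mkstemp()
os.close(fd)


def failing_restore(path):
    raise IOError("could not build the outgoing message")


try:
    restore_then_cleanup(stored, failing_restore)
except IOError:
    pass
assert os.path.exists(stored)        # kept after a failure

restore_then_cleanup(stored, lambda path: None)
assert not os.path.exists(stored)    # removed after success
```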