
BizTalk Patterns: Database Assisted Aggregation


Aggregation is a common pattern used in Enterprise Integration: System A (or several source systems) sends many messages that System B expects to receive in a single message, or in several messages grouped on a particular attribute (or set of attributes).

The most common way to approach this in BizTalk is using a Sequential Convoy orchestration to aggregate the message – Microsoft provides a sample of how to do this in the SDK. This is a powerful pattern, but has a few downsides:

  • Sequential Convoys can become difficult to manage if they’re expected to run for a long time
  • Complex subgrouping can multiply the headaches here – for example, if you have to aggregate messages for hundreds of destinations concurrently
  • The destination message may become very large, to the point where BizTalk cannot optimally process it anymore – particularly if it is a large flat file message.
  • Modifying the aggregate message may be challenging in a standard BizTalk map, especially if one message might be expected to modify a previously aggregated message.

In this post, we’ll look at two approaches to overcoming these challenges through a database assisted aggregation pattern.

The advantages of this pattern are:

  • Avoid complexities of sequential convoys, especially sequential convoys that may have to run for long periods of time
  • Allow for complex subgrouping
  • More flexible data modification options throughout aggregation

There are two versions of this pattern that I’ve been using.  One involves storing XML blobs in a document-database model, and one involves writing data down to a table throughout the day.  The document model works well if there’s no need to modify data, data is not going to a flat file destination, and you need to capture only limited metadata about the document.  The table model works well for flat file based output, especially for larger flat files that would be prohibitive to work with in BizTalk in a single shot.

Document Model

In this scenario, whenever a message comes in that needs to be aggregated, it’s mapped to a stored procedure that takes an XML parameter along with any other metadata fields about that message that should be captured at the row level.  This might be as simple as a TransactionID, OriginID, and a date that will be used for grouping/sorting.  The supporting table and stored procedure would look like this:

 

CREATE TABLE tAggregatorSample (TransactionID int, OriginId int, DT datetime, doc XML);
GO

CREATE PROCEDURE pInsertDoc
(
   @TransactionID int,
   @OriginID int,
   @dt DATETIME,
   @doc XML
)
AS
BEGIN

   INSERT tAggregatorSample (TransactionId, OriginId, DT, doc) VALUES (@TransactionID, @originId, @dt, @doc);

END

A map to this procedure would look fairly standard, except that the “doc” node takes its input from a scripting functoid that uses an xsl:copy-of instruction to copy the desired portion of the XML document.  The following snippet would go in a scripting functoid set to Inline XSLT, with a single output link to the doc node:

 

<doc xmlns='http://schemas.microsoft.com/Sql/2008/05/TypedProcedures/pInsertDoc'>
  <!-- put it in a CDATA node to prevent BizTalk from confusing content in the mapper -->
  <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
    <xsl:copy-of select="/path/to/node/to/copy" />
  <xsl:text disable-output-escaping="yes">]]</xsl:text>
  <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
</doc>

The select attribute on the copy-of could point to any node in the document; the root node works for the entire document, but since we’re aggregating we probably want a child node that will later be re-wrapped in the root node.  So, using the following XML:

 

<ns0:Root xmlns:ns0="http://ns-for-schema">
   <ns0:TransactionId>1</ns0:TransactionId>
   <ns0:OriginId>123</ns0:OriginId>
   <ns0:TransactionDate>2015-10-10</ns0:TransactionDate>
   <ns0:Transaction><ns0:Child /><ns0:Child />...</ns0:Transaction>
</ns0:Root>

We’d probably want the path for the “Transaction” node.

On the outbound side of things, when ready (or a particular trigger message is received), a stored procedure like the following will extract the aggregated XML from the database:

CREATE PROCEDURE pExtractDoc
(
 @OriginId int,
 @dt DATETIME
)
AS
BEGIN

WITH XMLNAMESPACES('http://ns-for-schema' as ns0)
SELECT TOP 1
  TransactionId as 'ns0:TransactionId'
 ,OriginId as 'ns0:OriginId'
 ,DT as 'ns0:TransactionDate'
 ,(SELECT doc as '*' -- just get the raw XML as is
   FROM tAggregatorSample t2
   WHERE t2.OriginId = @OriginId
   ORDER BY t2.DT
   FOR XML PATH(''), TYPE) -- stuff the XML back into the parent
FROM tAggregatorSample t1
WHERE t1.OriginId = @OriginId
FOR XML PATH('ns0:Root')
END

Invoking this as an XmlProcedure would give you back your aggregated document ready for consumption by BizTalk (for example, a send port filtering on such messages coming from this port).

Table Model

This model works well when large aggregation is done for a simple flat file.  In this case, the scenario is a system that’s expecting a very large flat file, or one that needs previous records updated.  Here, the inbound map and stored procedure are far more standard – map the relevant fields to the columns as desired, insert them, and you’re good.  If the rows need to be modified, they can now be modified easily using standard SQL UPDATE statements.

On the outbound end, you have a couple of options.  If the destination is a large CSV file, you can use the BCP utility to very quickly export it; I’ve tested this with up to 500MB worth of raw flat file data, and it finishes in about a minute (forget even trying to process a message like that in BizTalk – the XML version would be several gigabytes).  On the other hand, if you’re dealing with smaller quantities of data, you can export the data back to BizTalk via a stored procedure (either StronglyTyped or XmlProcedure as desired), map to your destination schema, and be on your way.  Even in this case, the outbound mapping becomes much simpler and avoids the need to reconstruct messages for every update (very expensive as the message grows in size).
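To illustrate the bcp option, here is a sketch (not from the original post) of a helper that shells out to bcp after the last row is inserted; the staging table, columns, output path, and server are placeholders, and bcp.exe is assumed to be on the PATH:

using System;
using System.Diagnostics;

static class FlatFileExporter
{
    // Launches bcp to dump the (hypothetical) staging table to a CSV file.
    // -c = character mode, -t"," = comma field terminator, -S . = local server, -T = trusted connection
    public static void ExportToCsv()
    {
        var psi = new ProcessStartInfo
        {
            FileName = "bcp.exe",
            Arguments = "\"SELECT Col1, Col2, Col3 FROM StagingDb.dbo.tFlatFileStaging ORDER BY Col1\" " +
                        "queryout \"C:\\Exports\\DailyExtract.csv\" -c -t\",\" -S . -T",
            UseShellExecute = false
        };

        using (Process bcp = Process.Start(psi))
        {
            bcp.WaitForExit();
            if (bcp.ExitCode != 0)
                throw new InvalidOperationException("bcp export failed with exit code " + bcp.ExitCode);
        }
    }
}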

Some data retention policy should be put in place here – for example, issuing a TRUNCATE after the last aggregated message is sent, or deleting rows as they are polled.  If there’s any desire to resubmit aggregated messages, the data could be kept for a longer period and resubmitted easily using the same process as the initial export.


Using SQL Server Sequences in Integration


The Challenge

An integration scenario requires a unique incrementing numeric identifier to be sent with each message (or each record in a message).  These identifiers cannot be reused (or cannot be reused over certain ranges, or cannot be reused over certain periods of time). A GUID is not suitable because it will not be sequential (not to mention that many legacy systems and data formats may have trouble handling a 128 bit number!).

The Solution

Integration platforms have a hard time meeting this requirement on their own – GUIDs work well because they guarantee uniqueness on the fly without any need to track history.  Messaging platforms typically deal in short executions, and BizTalk is no exception. While persistence of a message might be handled (as BizTalk does with the MessageBox), persistence of the entire execution process usually is not.  Deployments, updates, or even system resets may bring the integration platform down temporarily, and building a singleton instance that knows how to keep track of such things and compensate can become a major task.

However, if you’re using SQL Server 2012+, you have the option of creating a sequence and having the database guarantee the uniqueness and incremental nature for you.  Sequences in SQL Server work somewhat like IDENTITY Columns, but are not tied to a particular column in the database and can be used for many other purposes.  Creating one is fairly simple:

CREATE SEQUENCE dbo.CountBy1
    AS int
    START WITH 1
    INCREMENT BY 1 ;

There are several other options documented here, which include specifying the return type, whether to recycle when the sequence hits a certain number, etc. Sequences make it possible to either get the next available number

NEXT VALUE FOR dbo.CountBy1

or a range of available values starting with the first one:

CREATE PROCEDURE pReserve100
(
  @first sql_variant OUTPUT
)
AS
BEGIN
  -- reserve the next hundred numbers from the sequence, and return the first one to the caller
  -- (note: sp_sequence_get_range can't be called from a user-defined function, so this is a procedure,
  --  and its output parameters are sql_variant)
  EXEC sys.sp_sequence_get_range
       @sequence_name = N'dbo.CountBy1',
       @range_size = 100,
       @range_first_value = @first OUTPUT;
END

BizTalk applications can leverage this functionality by either directly calling one of these functions and using the return value, or by calling some other stored procedure that leverages a sequence and allows the integration to pick the message back up. If you’re simply using a sequence to seed a number in a map, an inline ADO.NET call in a pipeline component or orchestration is probably your best bet (it’s generally best to avoid this kind of logic in a script or external assembly call from the map itself though – exception handling in maps gets pretty hairy pretty quickly, and it’s very easy to lose track of the scope of how often such a function would get called in a map). For example:

int NextSeqNum()
{
  using (SqlConnection conn = new SqlConnection(Config.ConnectionString))
  {
    conn.Open();
    string sql = "SELECT NEXT VALUE FOR dbo.CountBy1";
    using (SqlCommand cmd = conn.CreateCommand())
    {
      cmd.CommandText = sql;
      int result = (int)cmd.ExecuteScalar();
      return result;
    }
  }
}

Obviously, this could be expanded to call the range-reservation procedure we defined earlier, or to reserve an arbitrary range size from the sequence, etc.
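As a sketch of what that might look like (again, Config.ConnectionString stands in for whatever settings helper you use; sp_sequence_get_range returns its output as a sql_variant):

int ReserveRange(int rangeSize)
{
  using (SqlConnection conn = new SqlConnection(Config.ConnectionString))
  {
    conn.Open();
    using (SqlCommand cmd = conn.CreateCommand())
    {
      cmd.CommandType = CommandType.StoredProcedure;
      cmd.CommandText = "sys.sp_sequence_get_range";
      cmd.Parameters.AddWithValue("@sequence_name", "dbo.CountBy1");
      cmd.Parameters.AddWithValue("@range_size", (long)rangeSize);

      SqlParameter first = cmd.Parameters.Add("@range_first_value", SqlDbType.Variant);
      first.Direction = ParameterDirection.Output;

      cmd.ExecuteNonQuery();
      // the sequence was declared AS int above, so the sql_variant converts cleanly to an int
      return Convert.ToInt32(first.Value);
    }
  }
}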

Caveats

There are a few things to be aware of. Much like IDENTITY columns, SQL Server will not let you give back used sequence numbers that you decide you don’t want (for example, if your transaction fails and you’re trying to compensate down the line); if you’re using a sequence-generated number for multiple destination systems and one fails in a way that means the same number should not be used again for that system, you should roll back the transaction for all systems that have received the message. Destination systems that expect no gaps in sequencing will have to be configured either to allow reuse of failed numbers or to tolerate the gaps that failures create.

Sequences can also be reset or recreated in the database independently of the integration that uses them. However, if this happens, they can be manually restarted at a specific value.
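If a sequence does need to be realigned, restarting it at a known value is a one-line DDL statement, which could be issued through the same kind of ADO.NET helper (a sketch, using the sequence defined above):

void RestartSequence(int restartWith)
{
  using (SqlConnection conn = new SqlConnection(Config.ConnectionString))
  {
    conn.Open();
    using (SqlCommand cmd = conn.CreateCommand())
    {
      // ALTER SEQUENCE ... RESTART WITH resets the next value the sequence will hand out
      cmd.CommandText = "ALTER SEQUENCE dbo.CountBy1 RESTART WITH " + restartWith.ToString();
      cmd.ExecuteNonQuery();
    }
  }
}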

If you use a sequence value in a column that doesn’t have a unique constraint in SQL Server, repetitions will be allowed (unlike an identity column). While any call to

NEXT VALUE FOR

should give you a unique incrementing number (assuming the sequence hasn’t recycled or been manually reset), there’s some sacrifice here in terms of uniqueness. If the requirement really is for a unique identifier, you may have to sacrifice some of the ability to have it be sequential.

ESB Exception Encoder: Value cannot be null


I was working on a message flow that involved routing the response from a Request-Response send port directly to another send port. The ultimate send port had failed message routing turned on so that messages would get properly routed by the ESB ToolKit to the EsbExceptionDb for further analysis or processing. In this particular case, the send port was an SFTP port with a misconfigured username/password combination, which would result in failed messages. However, when the message failed I’d get the following exception in the EventLog:

Value cannot be null.
 Parameter name: guid
Source: Microsoft.Practices.ESB.ExceptionHandling.PipelineComponents.ProcessFault
Method: Microsoft.BizTalk.Message.Interop.IBaseMessage Execute(Microsoft.BizTalk.Component.Interop.IPipelineContext, Microsoft.BizTalk.Message.Interop.IBaseMessage)
Error Source: Microsoft.Practices.ESB.ExceptionHandling
Error TargetSite: System.String GetApplication(ServiceType, System.String, System.String, System.String, System.String)
Error StackTrace: at Microsoft.Practices.ESB.ExceptionHandling.Utility.GetApplication(ServiceType serviceType, String mgmtDBServer, String mgmtDBName, String guid, String serviceName)
 at Microsoft.Practices.ESB.ExceptionHandling.PipelineComponents.ProcessFault.WriteHeaderFailedMessageRouting(XmlTextWriter writer, IBaseMessage pInMsg, Object& portName)
 at Microsoft.Practices.ESB.ExceptionHandling.PipelineComponents.ProcessFault.WriteHeader(XmlTextWriter writer, XmlTextReader reader, FaultSource faultSource, IBaseMessage pInMsg, Object& portName)
 at Microsoft.Practices.ESB.ExceptionHandling.PipelineComponents.ProcessFault.Execute(IPipelineContext pContext, IBaseMessage pInMsg)

I found another MSDN user going through the same problem, and his findings suggested that the problem was somehow related to BTS.ReceivePortID not being in the message context in this messaging scenario (the request-response port put the send port name into the BTS.ReceivePortName property, but did not write or promote ReceivePortID). The ESB Toolkit was trying to run some logic that uses this property, and threw an exception when it passed the resulting null value to that function.

An initial attempt to promote ReceivePortID as an empty GUID got rid of the exception, but the ESB Portal had no information in the application field for that message (the application lookup logic failed). Further inspection of the code the ESB Encoder component uses indicated that it tried to use the SendPortName on the error report (it seems like this is a faulty piece of code in the toolkit), but then would also check the “WasSolicitResponse” property. By nulling out WasSolicitResponse from the context, I was able to get the message to process correctly.


const string _sysProps = "http://schemas.microsoft.com/BizTalk/2003/system-properties";

pInMsg.Context.Write("WasSolicitResponse", _sysProps, null);

This results in removing WasSolicitResponse from the message context, and allows the ESB Toolkit to properly process the message through to the EsbExceptionDb. Ideally, this should be done on the ultimate send port (in my case, the SFTP send port) to avoid interfering with any orchestration processing that may have to be done on this message (an orchestration might use this property to implement correlation). This approach avoids the need to create a copy of the message or to add an orchestration for special handling (and the additional overhead that would bring).
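For reference, here is a minimal sketch of how those lines might sit in a custom pipeline component’s Execute method (the component boilerplate, attributes, and property-bag plumbing are omitted):

public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
    const string sysProps = "http://schemas.microsoft.com/BizTalk/2003/system-properties";

    // writing null removes WasSolicitResponse from the context, which lets the ESB
    // exception handling components process the failed message correctly
    pInMsg.Context.Write("WasSolicitResponse", sysProps, null);

    return pInMsg;
}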

WCF-SQL Polling and the ESB Portal


With the ESB Toolkit, BizTalk provides an excellent framework for handling exceptions that occur throughout the ESB. There are many built in facilities that are as simple as checking off a box to route failed messages to the portal, and within orchestrations you can easily build ESB Exception messages in catch blocks and route them to the portal as well.

However, these only apply if a message actually makes it to a pipeline or orchestration. For WCF-SQL Polling receive locations, it’s possible that no message will ever make it to the pipeline – for example, if the procedure causes an exception to occur (perhaps by a developer intentionally using THROW or RAISERROR), the adapter will write the exception to the event log without providing a message for any pipeline or orchestration processing. Checking “suspend message on failure” doesn’t offer any help, since there is no actual message to suspend. If you have effective monitoring software, such as AIMS, SCOM, or BizTalk 360, you can configure it to alert on such errors/warnings in the event log. If you own the procedure in question, it may be possible to refactor it so that it returns an error message rather than throwing an exception – but this isn’t always possible either (or may involve heavier refactoring of developed work!). However, it’s also possible to route such exceptions to the ESB Exception Database (and from there to the ESB Portal) using a custom WCF endpoint behavior.

First, the implementation of IErrorHandler, where the meat of the work is done:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;
using System.Text;
using System.Threading.Tasks;
using System.Transactions;
using System.Xml.Linq;

namespace BT.Common.WCFBehaviors
{
    public class SqlExceptionHandler : IErrorHandler
    {
        // we'll take these in as parameters since there's no port to look them up from
        public string ApplicationName;
        public string PortName;
        public bool IsReceive;

        public SqlExceptionHandler(string app, string port, bool isReceive)
        {
            ApplicationName = app;
            PortName = port;
            IsReceive = isReceive;
        }

        // this method gets called after the exception is handled for further processing
        // error is guaranteed to not be null
        public bool HandleError(Exception error)
        {
            try
            {
                // need to create a new transaction to be able to open a connection to the SQL Exceptions Database
                // this block of code could also be replaced with a call to the web service
                using (TransactionScope ts = new TransactionScope(TransactionScopeOption.RequiresNew))
                {
                    using (SqlConnection conn = new SqlConnection("Server=.; Database=EsbExceptionDb;Trusted_Connection=True"))
                    {
                        conn.Open();
                        using (SqlCommand cmd = conn.CreateCommand())
                        {
                            cmd.CommandType = System.Data.CommandType.StoredProcedure;
                            cmd.CommandText = "dbo.usp_insert_fault";

                            cmd.Parameters.Add("@FaultID", SqlDbType.UniqueIdentifier).Value = Guid.NewGuid();
                            cmd.Parameters.Add("@NativeMessageID", SqlDbType.VarChar).Value = Guid.NewGuid().ToString("d"); // this column expects unique values - we don't have a message to insert, just putting in a GUID
                            cmd.Parameters.Add("@ActivityID", SqlDbType.VarChar).Value = "";
                            cmd.Parameters.Add("@Application", SqlDbType.VarChar).Value = ApplicationName;
                            cmd.Parameters.Add("@Description", SqlDbType.VarChar).Value = "SqlException";
                            cmd.Parameters.Add("@ErrorType", SqlDbType.VarChar).Value = "SqlException";
                            cmd.Parameters.Add("@FailureCategory", SqlDbType.VarChar).Value = "Adapter";
                            cmd.Parameters.Add("@FaultCode", SqlDbType.Int).Value = -1;
                            cmd.Parameters.Add("@FaultDescription", SqlDbType.VarChar).Value = error.ToString().Left(4096); // extension method as in http://stackoverflow.com/questions/7574606/left-function-in-c-sharp
                            cmd.Parameters.Add("@Scope", SqlDbType.VarChar).Value = "Adapter";
                            cmd.Parameters.Add("@ServiceInstanceId", SqlDbType.VarChar).Value = "";
                            cmd.Parameters.Add("@ServiceName", SqlDbType.VarChar).Value = PortName;
                            cmd.Parameters.Add("@MachineName", SqlDbType.VarChar).Value = Environment.MachineName;
                            cmd.Parameters.Add("@ExceptionMessage", SqlDbType.VarChar).Value = error.Message;
                            cmd.Parameters.Add("@ExceptionType", SqlDbType.VarChar).Value = error.GetType().Name;
                            cmd.Parameters.Add("@ExceptionSource", SqlDbType.VarChar).Value = "Adapter";
                            cmd.Parameters.Add("@ExceptionTargetSite", SqlDbType.VarChar).Value = error.TargetSite.ToString();
                            cmd.Parameters.Add("@ExceptionStackTrace", SqlDbType.VarChar).Value = error.StackTrace.Left(4096); // same extension method
                            if (error.InnerException != null)
                            {
                                cmd.Parameters.Add("@InnerExceptionMessage", SqlDbType.VarChar).Value = error.InnerException.ToString().Left(4096); // same extension method
                            }
                            else
                            {
                                cmd.Parameters.Add("@InnerExceptionMessage", SqlDbType.VarChar).Value = "";
                            }

                            cmd.Parameters.Add("@DateTime", SqlDbType.DateTime).Value = DateTime.UtcNow;
                            cmd.Parameters.Add("@FaultSeverity", SqlDbType.Int).Value = 2; // "Error"
                            cmd.Parameters.Add("@FaultGenerator", SqlDbType.VarChar).Value = IsReceive ? "Messaging.ReceiveLocation" : "Messaging.SendPort";

                            cmd.ExecuteNonQuery();
                        }
                    }
                    ts.Complete();
                }
            }
            catch (Exception ex)
            {
                // Log an error about the exception
            }

            return true;
        }

        // this method will not get called here because the adapter doesn't have a message to send to BizTalk.  We'll just ignore it.
        public void ProvideFault(Exception error, System.ServiceModel.Channels.MessageVersion version, ref System.ServiceModel.Channels.Message fault)
        {
            return;
        }
    }
}
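
The Left() calls above refer to a small string extension (per the Stack Overflow link in the comments) rather than anything built into the framework; a minimal version might look like this:

namespace BT.Common.WCFBehaviors
{
    public static class StringExtensions
    {
        /// <summary>Returns at most the first <paramref name="length"/> characters of a string.</summary>
        public static string Left(this string value, int length)
        {
            if (string.IsNullOrEmpty(value)) return value;
            return value.Length <= length ? value : value.Substring(0, length);
        }
    }
}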

The other two classes to implement the behavior are basically boilerplate, with the addition of the parameters to determine whether this is a Receive Location or Send Port, the application name to use, and the Port name to use. Normally, the ESB Toolkit components look these up based on the message context – but here we have no message context to use! Here’s the implementation of IEndpointBehavior:

using System;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;
using System.ServiceModel.Description;

namespace BT.Common.WCFBehaviors
{
    public class SqlExceptionBehavior : IEndpointBehavior
    {
        string _app;
        string _port;
        bool _isrecv;

        public SqlExceptionBehavior(string app, string port, bool isrecv)
        {
            _app = app;
            _port = port;
            _isrecv = isrecv;
        }
        public void AddBindingParameters(ServiceEndpoint endpoint, BindingParameterCollection bindingParameters)
        {
            return;
        }

        public void ApplyClientBehavior(ServiceEndpoint endpoint, ClientRuntime clientRuntime)
        {
            return;
        }

        public void ApplyDispatchBehavior(ServiceEndpoint endpoint, EndpointDispatcher endpointDispatcher)
        {
            SqlExceptionHandler handler = new SqlExceptionHandler(_app, _port, _isrecv);
            endpointDispatcher.ChannelDispatcher.ErrorHandlers.Add(handler);
        }

        public void Validate(ServiceEndpoint endpoint)
        {
            return;
        }
    }
}

And lastly, the implementation of BehaviorExtensionElement:

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
using System.ServiceModel.Configuration;
using System.Text;
using System.Threading.Tasks;

namespace BT.Common.WCFBehaviors
{
    class SqlExceptionBehaviorElement : BehaviorExtensionElement
    {
        public override Type BehaviorType
        {
            get { return typeof(SqlExceptionBehavior); }
        }

        protected override object CreateBehavior()
        {
            // pass the properties to SqlExceptionBehavior
            return new SqlExceptionBehavior(ApplicationName, PortName, IsReceive);
        }

        ConfigurationPropertyCollection _properties;
        [ConfigurationProperty("ApplicationName")]
        public string ApplicationName
        {
            get { return (string)base["ApplicationName"]; }
            set { base["ApplicationName"] = value; }
        }

        [ConfigurationProperty("PortName")]
        public string PortName
        {
            get { return (string)base["PortName"]; }
            set { base["PortName"] = value; }
        }

        [ConfigurationProperty("IsReceive")]
        public bool IsReceive
        {
            get { return (bool)base["IsReceive"]; }
            set { base["IsReceive"] = value; }
        }

        // set up properties and defaults
        protected override ConfigurationPropertyCollection Properties
        {
            get
            {
                if (this._properties == null)
                {
                    _properties = new ConfigurationPropertyCollection();
                    _properties.Add(new ConfigurationProperty("ApplicationName", typeof(string), "BT.Common"));
                    _properties.Add(new ConfigurationProperty("PortName", typeof(string), "Unknown"));
                    _properties.Add(new ConfigurationProperty("IsReceive", typeof(bool), true));
                }
                return _properties;
            }
        }

    }
}

Sign the assembly, build it, GAC it. Then, to make the behavior available in BizTalk, you can import the following configuration file (no need to edit machine.config, since this is for in process WCF adapters, not IIS!):

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.serviceModel>
    <extensions>
      <behaviorExtensions>
        <add name="sqlExceptionHandler" type="BT.Common.WCFBehaviors.SqlExceptionBehaviorElement, BT.Common.WCFBehaviors, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1f2c9024358fb718"/> <!-- replace with the strong name of your assembly - public key token will vary -->
      </behaviorExtensions>
    </extensions>
  </system.serviceModel>
</configuration>

Save this file and import it using the Admin Console under the correct receive/send host for the adapter (for example, the receive host for WCF-Custom; screen cap taken after importing, so now it shows the behaviorExtensions elements):
[Screenshot: import_wcf_bindings]

Once you’ve restarted the host instances (and the admin console after GACing the assembly), you can add the behavior to a receive location or send port:

[Screenshot: AddExtension]

And it can be configured like so:

[Screenshot: extensionconfig]

And now these exceptions will be routed to the ESB Exception Database (and through that to the ESB Portal).  Other logging or alerting could be done as well, either from the behavior or from the portal alerts.  Enjoy!

BizTalk, Clustered MSDTC and Clustered EntSSO installation error


There are a few good resources out there for setting up a clustered Master Secret Server.

However, I faced some issues recently setting all of this up, getting the following errors (in the event log and the configuration log):

  • Creation of Adapter FILE Configuration Store entries failed. (BizTalk config log)
  • Could not import a DTC transaction. Please check that MSDTC is configured correctly for remote operation. See the event log (on computer EntSSOClusterResource) (BizTalk config log)
  • d:\bt\127854\private\source\setup\btscfg\btscfg.cpp(2213): FAILED hr = c0002a25 (BizTalk Config log)
  • Failed to initialize the needed name objects. Error Specifics: hr = 0x80004005, com\complus\dtc\dtc\msdtcprx\src\dtcinit.cpp:575, CmdLine: “C:\Program Files\Common Files\Enterprise Single Sign-On\ENTSSO.exe”, Pid: 172 (Event log)
  • Could not import a DTC transaction. Please check that MSDTC is configured correctly for remote operation. See documentation for details. Error Code: 0x80070057, The parameter is incorrect. (Event log)

DTC seemed to be the culprit here (or possibly EntSSO), but DTC Ping/Tester worked fine from the app server to the clustered resource (in fact, the installer had no problem configuring the Group settings with this issue – it choked on the Runtime configuration).  Despite that, it still seemed like it was ultimately a DTC issue, so I started working through many of the normal DTC issues that come up.  We uninstalled and reinstalled MSDTC on all involved machines (some had been imaged from a common source using the same GUIDs in the CID registry key under HKCR), and imported the following registry key to ensure that RPC wasn’t causing an issue.


Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\RPC]
"EnableAuthEpResolution"=dword:00000001
"RestrictRemoteClients"=dword:00000000

In the end it came down to a single setting that the ever-helpful DTC troubleshooting wiki mentions:

[Screenshot: dtc_tbl]

We had configured everything to use “No Authentication Required” to get the broadest support (for some older servers on the network if needed).  This does mean that servers which don’t support this authentication setting will not be able to participate in DTC, but it did resolve the issue on the cluster and allowed us to properly configure the BizTalk Runtime.

Working with FILESTREAM BLOBs in BizTalk


MSDN provides an example of INSERTing large data into SQL Server, leveraging the WCF-SQL adapter’s built in FILESTREAM capabilities.  However, it’s also possible to leverage the transaction enlisted by the WCF adapter in a custom pipeline to pull FILESTREAM data out of SQL Server more efficiently than the more common SELECT … FOR XML query which simply grabs the FILESTREAM content and stuffs it into an XML node in the resulting document.

Imagine, for example, you had a large document of some sort (XML, Flat File, etc.) to store in SQL Server that BizTalk would need to process from a FILESTREAM table defined like so:

CREATE TABLE [tFilestreamDemo] (
    [CreateDate] DATETIME NOT NULL,
    [CreateUser] VARCHAR (50) NOT NULL,
    [ID] UNIQUEIDENTIFIER DEFAULT (NEWSEQUENTIALID()) ROWGUIDCOL NOT NULL,
    [Metadata1] INT NOT NULL,
    [MetadataDate] DATE NOT NULL,
    [Metadata3] VARCHAR (50) NULL,
    [XmlBlob] VARBINARY (MAX) FILESTREAM NOT NULL,
    CONSTRAINT [PK_tFilestreamDemo] PRIMARY KEY CLUSTERED ([ID] ASC)
) FILESTREAM_ON "default"; -- requires a FILESTREAM filegroup on the database

The tradeoff here is losing the typed XML data in favor of more efficient storage and access to larger file objects (especially when the data will, on average, be large).  This can make a vast difference if you have to store a large (>100MB) XML file in the database for some reason.

It would be possible to extract data from this table by writing a procedure as follows:

CREATE PROCEDURE usp_Sel_FilestreamDemo
(
    @MetadataDate DATE
)
AS
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN
    ;WITH XMLNAMESPACES('http://sql_message_target_namespace.com' as ns0)
    SELECT
        Metadata1 as 'ns0:Metadata1'
       ,MetadataDate as 'ns0:MetadataDate'
       ,Metadata3 as 'ns0:Metadata3'
       ,CAST(XmlBlob as XML) as 'ns0:XmlBlob' -- note -if we don't cast it, we'll get it as Base64
    FROM tFileStreamDemo
    FOR XML PATH('ns0:Root')
END

This will result in a very large message, like so (assuming XmlPolling is used – the same principle would apply to TypedPolling):

<SqlAdapterWrapperNode xmlns="http://namespace_adapter_is_Configured_to_use.com">
  <ns0:Root xmlns:ns0="http://sql_message_target_namespace.com">
    <ns0:Metadata1>1</ns0:Metadata1>
    <ns0:MetadataDate>2016-02-29</ns0:MetadataDate>
    <ns0:Metadata3>Metadata3</ns0:Metadata3>
    <ns0:XmlBlob><![CDATA[<100 MB of XML text here.../>]]></ns0:XmlBlob>
  </ns0:Root>
  <ns0:Root>
    ...
  </ns0:Root>
</SqlAdapterWrapperNode>

Which will be problematic for a few reasons:

  1. It will cause additional load on SQL Server when it tries to retrieve a lot of large data to send over the pipe
  2. It will greatly increase the likelihood of connection timeouts during this period
  3. It will be one more large XML message BizTalk will have to track.

To avoid this, we could instead utilize the SqlFileStream class with the following procedure:

CREATE PROCEDURE usp_Sel_FilestreamDemo
(
    @MetadataDate DATE
)
AS
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN
    ;WITH XMLNAMESPACES('http://sql_message_target_namespace.com' as ns0)
    SELECT
        Metadata1 as 'ns0:Metadata1'
       ,MetadataDate as 'ns0:MetadataDate'
       ,Metadata3 as 'ns0:Metadata3'
       ,XmlBlob.PathName() as 'ns0:FilePath' -- this will give us a path to pass in the disassembler to the actual FILESTREAM file
       ,GET_FILESTREAM_TRANSACTION_CONTEXT() as 'ns0:TxContext' -- this will give us the transaction context to use in that call
    FROM tFileStreamDemo
    FOR XML PATH('ns0:Root')
END

This procedure will be much more efficient – instead of trying to make the SQL Server process retrieve the entire blob and write it to the XML file (and do some data conversion on it), we’re simply returning the handle and transaction context for that file. The resultant XML will look more like this:

<SqlAdapterWrapperNode xmlns="http://ns_adapter_is_configured_to_use.com">
  <ns0:Root xmlns:ns0="http://sql_message_target_namespace.com">
    <ns0:Metadata1>1</ns0:Metadata1>
    <ns0:MetadataDate>2016-02-29</ns0:MetadataDate>
    <ns0:Metadata3>Metadata3</ns0:Metadata3>
    <ns0:FilePath>\\server_name\path\to\mssql\filestream\volumeinfo</ns0:FilePath>
    <ns0:TxContext>0x1C2D7FD5DC09164EA21D3AFD27611A6D</ns0:TxContext>
  </ns0:Root>
  <ns0:Root>
    ...
  </ns0:Root>
</SqlAdapterWrapperNode>

And this will be much easier for BizTalk to process, even if multiple records are returned in the resultset. Now, loading that file can be deferred. If you want to load that file in the receive location, you can use the TxContext value that has been passed – the location will be operating under the same transaction as the adapter (assuming UseAmbientTransaction is set to true).

Following that path, here’s what the code for the Disassembler methods would look like (a Disassembler is required here because we’re assuming we might want to handle multiple files per message – if not, then it’d be fine to write a somewhat simpler Decoder component instead, but the rough idea would be the same):

The Disassemble method:

private Queue<IBaseMessage> qOutputMsgs = new Queue<IBaseMessage>();
private const string _sysProps = "http://schemas.microsoft.com/BizTalk/2003/system-properties";

public void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
  Stream originalStream = pInMsg.BodyPart.GetOriginalDataStream();
  try
  {
    HashSet<string> propNamesToIgnore = new HashSet<string>()
    {
      "MessageType", "MessageID", "SchemaStrongName"
    };
    XmlReader reader = XmlReader.Create(originalStream);
    pContext.ResourceTracker.AddResource(reader);

    reader.MoveToContent();
    // we're effectively transforming the message
    // replace with your actual MessageType
    // get the IDocumentSpec so that we can properly promote SchemaStrongName for later mapping activities
    string msgType = "http://destination_message_target_namespace.com#Root";
    IDocumentSpec docSpecName = pContext.GetDocumentSpecByType(msgType);

    string connectionString = SSOHelper.GetSetting("ConnectionString"); // this could be done using BTDF or the Microsoft SSO Client sample
    IBaseMessageFactory msgFactory = pContext.GetMessageFactory();
    // open a new connection, which will implicitly use the MSDTC transaction from the adapter
    // we'll be able to use the FILESTREAM Transaction Context we received from the procedure since we're in the same transaction
    // only need one connection to get multiple files out
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
      conn.Open();
      while (reader.ReadToFollowing("Metadata1"))
      {
        IBaseMessage outMsg = msgFactory.CreateMessage();
        outMsg.AddPart("Body", msgFactory.CreateMessagePart(), true);
        reader.Read();
        string metadata1 = reader.Value; // do whatever you want with this

        reader.ReadToFollowing("FileName");
        reader.Read();
        string fsName = reader.Value;

        reader.ReadToFollowing("TxContext");
        reader.Read();
        byte[] txContext = Convert.FromBase64String(reader.Value);
        VirtualStream vts = new VirtualStream();

        // this does the heavy lifting - load the file into a virtual stream
        // FileOptions.SequentialScan works well here because we're just sequentially copying the file to a new stream
        using (SqlFileStream sqlFileStream = new SqlFileStream(fsName, txContext, FileAccess.Read, FileOptions.SequentialScan, 0))
          sqlFileStream.CopyTo(vts); // this is fast.

        vts.Position = 0;
        pContext.ResourceTracker.AddResource(vts);

        outMsg.BodyPart.Data = vts;

        CopyProperties(pInMsg, outMsg, propNamesToIgnore);

        outMsg.Context.Promote("MessageType", _sysProps, msgType);
        outMsg.Context.Write("SchemaStrongName", _sysProps, docSpecName.DocSpecStrongName);

        qOutputMsgs.Enqueue(outMsg);
      }
    }
  }
  catch (Exception e)
  {
    // do whatever appropriate logging
    Logger.LogError("FILESTREAM Disassembler encountered exception:\r\n\r\n" + e.ToString());
    throw;
  }
}

A helper method to copy properties:

/// <summary>
/// Copies properties from one IBaseMessage to another
/// </summary>
/// <param name="pInMsg">Source message</param>
/// <param name="outMsg">Destination Message</param>
/// <param name="propsToIgnore">Optional - hashset of property names to ignore</param>
/// <returns>Number of properties copied</returns>
private static int CopyProperties(IBaseMessage pInMsg, IBaseMessage outMsg, HashSet<string> propsToIgnore = null)
{
  uint propCount = pInMsg.Context.CountProperties;

  int copiedProperties = 0;
  for (int i = 0; i < propCount; i++)
  {
    string propName;
    string propNS;
    object propVal = pInMsg.Context.ReadAt(i, out propName, out propNS);
    if (propsToIgnore == null || !propsToIgnore.Contains(propName))
    {
      if (pInMsg.Context.IsPromoted(propName, propNS))
      {
        outMsg.Context.Promote(propName, propNS, propVal);
      }
      else
      {
        outMsg.Context.Write(propName, propNS, propVal);
      }
      copiedProperties++;
    }
  }

  return copiedProperties;
}

And the GetNext method:

public IBaseMessage GetNext(IPipelineContext pContext)
{
  if (qOutputMsgs.Count > 0)
  {
    IBaseMessage outMsg = qOutputMsgs.Dequeue();
    return outMsg;
  }
  else
    return null;
}

While this will still load the large message into BizTalk, it will do so much more quickly (testing on this took processing down from ~5000ms per file per message to ~2ms per message (!) in my dev environment). It will also save significant load on the SQL process (and, some load on the BizTalk process) – freeing up valuable resources for other applications to use.

The same principles could be used in an orchestration or later component, but would require actually creating a transaction (or piggybacking off the MSDTC transaction in a send pipeline) and calling GET_FILESTREAM_TRANSACTION_CONTEXT() to get the transaction context token. Lenni Lobel has an excellent blog on how to do that here – the same .NET calls would apply in some custom orchestration or pipeline code, with the possible exception of using System.Transactions (to piggyback off MSDTC) instead of a regular SqlTransaction.
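As a rough sketch of that approach (not BizTalk-specific; the method name, row ID parameter, and connection string are illustrative, and the table is the tFilestreamDemo example from above):

// Requires: System.Data.SqlClient, System.Data.SqlTypes (SqlFileStream), System.IO, System.Transactions
void CopyBlobToStream(Guid rowId, Stream destination, string connectionString)
{
    using (TransactionScope scope = new TransactionScope())
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlCommand cmd = conn.CreateCommand())
        {
            // PathName() and the transaction context are only valid inside this transaction
            cmd.CommandText = @"SELECT XmlBlob.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
                                FROM tFilestreamDemo WHERE ID = @id";
            cmd.Parameters.Add("@id", SqlDbType.UniqueIdentifier).Value = rowId;

            using (SqlDataReader rdr = cmd.ExecuteReader())
            {
                if (rdr.Read())
                {
                    string path = rdr.GetString(0);
                    byte[] txContext = rdr.GetSqlBinary(1).Value;

                    // stream the blob directly from the FILESTREAM store
                    using (SqlFileStream fs = new SqlFileStream(path, txContext, FileAccess.Read))
                        fs.CopyTo(destination);
                }
            }
        }
        scope.Complete();
    }
}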

Xsd.exe, Arrays, and “Specified”


Microsoft’s XSD utility provides an excellent way to generate classes from schema definition files, but it has a few quirks that can make using the generated classes a bit tougher. In particular, it serializes repeating structures as arrays rather than using the generic List class, and it gives any element with minOccurs="0" a separate “Specified” property (which must be set to true if the member is to be serialized back to XML).

We frequently use serialization techniques here, and while there are some utilities out there that offer similar post-processing, many of them are not free and/or are difficult to package with a build (include an executable in the project? not ideal). In light of that, I wrote a PowerShell script (below) that can be included in source control and utilized in a post-build event. For example,

"$(TargetFrameworkSDKToolsDirectory)xsd.exe" /c "$(ProjectDir)ImportedPartCanonical.xsd" "$(ProjectDir)ProjectCanonical.xsd" /n:Tallan.BT.PipelineComponents

powershell.exe -ExecutionPolicy Unrestricted -file "$(solutiondir)\PowerShellScripts\PostProcessXsdExe.ps1" ProjectCanonical.cs "$(SolutionDir)Tallan.BT.PipelineComponents\SerializedClasses\ProjectCanonical.cs"

The /c flag on xsd.exe specifies that you want to generate classes from the XSD files (multiple files if there are imports involved). The /n flag specifies a namespace for the generated classes, rather than putting them in the global namespace. The -ExecutionPolicy flag allows this invocation of PowerShell to run an external script (so it will still work if another dev doesn’t have their execution policy set machine-wide). And the script itself does the following:

  1. If the property uses a “Specified” property, it will now automatically set “Specified” to true anytime you set that property.
  2. If the property is an array, it will be changed to a generic List

For example:

public decimal NumericElement
{
  get
  {
    return this.numericElementField;
  }
  set
  {
    this.numericElementFieldSpecified = true; // script adds this line, so dev won't forget to set every time the property is set.
    this.numericElementField = value;
  }
}

...
private System.Collections.Generic.List<RepeatingElement> repeatingElementField; // had been a private RepeatingElement[] repeatingElementField

/// <remarks/>
[System.Xml.Serialization.XmlElementAttribute("Repeatingelement")]
public System.Collections.Generic.List<RepeatingElement> RepeatingElement
...

Here’s the script, which can be modified as you see fit:

# Author: Dan Field (dan.field@tallan.com)
# posted on blog.tallan.com/2016/03/10/xsd-exe-arrays-and-specified
# Purpose: fix the 'specified' attribute and convert arrays to list from XSD.exe generated classes

[CmdletBinding()]
Param(
    [Parameter(Mandatory=$true,Position=1)]
    [string]$inputFile,
    [Parameter(Mandatory=$true,Position=2)]
    [string]$outputFile,
    [switch]$DeleteInputFile
)

# much faster than using Get-Content and/or Out-File/Set-Content
$writer = [System.IO.StreamWriter] $outputFile
$reader = [System.IO.StreamReader] $inputFile

# used to track Specified properties
$setterDict = @{}

while (($line = $reader.ReadLine()) -ne $null)
{
    $thisStart = $line.IndexOf("this.") # will be used to locate property setters and their indentation
    $brackets = $line.IndexOf("[]") # indicates an array that will be converted to a Generic List

    # assume that any private field that contains "Specified" needs to be grabbed
    if (($line.IndexOf("private") -gt -1) -and ($line.IndexOf("Specified") -gt -1))
    {
        # get the field name
        $varName = $line.Split("{' ',';'}", [System.StringSplitOptions]::RemoveEmptyEntries)[-1]
        # use field name as a key, minus the ending "Specified" portion, e.g. fieldNameSpecified -> fieldName
        # the value in the dictionary will be added to setters on the main property, e.g. "this.fieldNameSpecified = true;"
        $setterDict.Add($varName.Substring(0, $varName.IndexOf("Specified")), "this." + $varName + " = true;")
        # output the line as is
        $writer.WriteLine($line)
    }
    # find property setters that aren't for the *Specified properties
    elseif (($thisStart -gt -1) -and ($line.IndexOf(" = value") -gt -1) -and ($line.IndexOf("Specified") -lt 0))
    {
        # get the field name
        $thisStart += 5
        $varName = $line.Substring($thisStart, $line.IndexOf(' ', $thisStart) - $thisStart)
        # see if there's a "Specified" property for this one
        if ($setterDict.ContainsKey($varName) -eq $true)
        {
            # set the Specified property whenever this property is set
            $writer.WriteLine((' ' * ($thisStart - 5)) + $setterDict[$varName])
        }
        # output the line itself
        $writer.WriteLine($line)
    }
    elseif ($brackets -gt 0) # change to List<T>
    {
        $lineParts = $line.Split(' ')
        foreach ($linePart in $lineParts)
        {
            if ($linePart.Contains("[]") -eq $true)
            {
                $writer.Write("System.Collections.Generic.List<" + $linePart.Replace("[]", "> "))
            }
            else
            {
                $writer.Write($linePart + " ")
            }
        }
        $writer.WriteLine();
    }
    else # just output the original line
    {
        $writer.WriteLine($line)
    }
}

if ($DeleteInputFile -eq $true)
{
    Remove-Item $inputFile
}

# Make sure the file gets fully written and clean up handles
$writer.Flush();
$writer.Dispose();
$reader.Dispose();

SqlCommand oddity raises NullReferenceException on an otherwise valid query


Bit of a head scratcher for this one. I was working on some ADO.NET code that involved calling a stored procedure with many (10k+) table valued parameter rows being passed in. Occasionally, I’d see a bug where ExecuteNonQuery would result in an exception with the following stack trace (I tried it with ExecuteReader and ExecuteScalar just to be sure as well):

System.NullReferenceException was unhandled by user code
HResult=-2147467261
Message=Object reference not set to an instance of an object.
Source=System.Data
StackTrace:
at System.Data.SqlClient.SqlCommand.OnReturnStatus(Int32 status)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
...

I knew for sure the command object was not null, and so I started looking at the Reference Source. It seemed the parameter collection was the cause of the issue.

I enabled CLR debugging in Visual Studio and dove in. The most relevant block of that function is here:

 // see if a return value is bound
int count = GetParameterCount(parameters);
for (int i = 0; i < count; i++) {
	SqlParameter parameter = parameters[i];
	if (parameter.Direction == ParameterDirection.ReturnValue) {
		object v = parameter.Value;
 

In my case, count was over 65,000, and some of the later members of the list weren’t fully initialized yet (due to some multi-threading issues and the sheer number of parameters being created) – and I was not paying attention to the return value, so I had never added one. However, the exception completely went away by adding the following before any other parameters:

SqlCommand cmd = ...; // initialization code for SqlCommand here
SqlParameter retParam = new SqlParameter("@result", SqlDbType.Int);
retParam.Direction = ParameterDirection.ReturnValue;
cmd.Parameters.Add(retParam);

This ensures that the ReturnValue parameter is the first in the collection, and when that iteration occurs in OnReturnStatus, it will find the parameter immediately (instead of potentially traversing 65k parameters); it also got rid of the nonsensical exception in ADO.NET so I could bear down on the other issues related to this particular piece of code. The big takeaway here is that it seems best to always add the ReturnValue parameter to a SqlParameterCollection first, even if you don’t intend to use it. This would likely apply to Entity Framework EntityCommands (which are based on SqlCommand) as well.


How to resolve the “Reason: Unexpected event (“eos”) in state “processing_header”.” error from the XML Disassembler


This error is typically harmless, but can result when the XML Disassembler encounters an empty Envelope message instance that’s formatted like this:

<ns0:EnvelopeRoot xmlns:ns0="http://Tallan.BizTalk.Schemas.CommonEnvelope"/>

instead of this:

<ns0:EnvelopeRoot xmlns:ns0="http://Tallan.BizTalk.Schemas.CommonEnvelope"></ns0:EnvelopeRoot>

BizTalk chooses to make a semantic distinction between these two instances, processing the second one fine (publishing no messages) but raising an exception like this for the first:

There was a failure executing the response(receive) pipeline: "..." Source: "XML disassembler" Send Port: "..." URI: "..." Reason: Unexpected event ("eos") in state "processing_header".

This can happen particularly when using an XmlProcedure or XmlPolling from SQL – if the resultset is empty, the adapter will publish this message. While this behavior may be desirable (and can frequently be avoided by ensuring you have good Data Available statements on your polling ports, and by only calling XmlProcedures with valid parameters at valid times), it can also generate a lot of alarming errors. If you can tolerate empty envelopes and don’t want these errors (or would just like a better error), a custom Decode pipeline component can be used. The following code aims to be as non-obtrusive as possible: it swallows exceptions and passes the message on if it can’t fix it, it tolerates invalid XML characters the same way the BizTalk runtime does, it resets the body part’s position if it fails after reading part of the stream, and it only attempts the correction if the document corresponds to an envelope schema. If you wanted to raise a more meaningful exception, you could do that instead of rewriting the message. This code would go in the Execute method of a Decode component.

Stream origStream = pInMsg.BodyPart.GetOriginalDataStream();
try
{
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.CheckCharacters = false;
    readerSettings.CloseInput = false;

    XmlReader reader = XmlReader.Create(origStream, readerSettings);
    pContext.ResourceTracker.AddResource(reader);

    reader.MoveToContent();

    IDocumentSpec docSpec = pContext.GetDocumentSpecByType(reader.NamespaceURI + "#" + reader.LocalName);
    if (!string.IsNullOrWhiteSpace(docSpec.GetBodyPath()) && reader.IsEmptyElement) // this is an envelope schema with an empty root node
    {
        // ALTERNATIVELY: throw new Exception("Empty Envelope message received from " ... etc.
        XmlWriterSettings writerSettings = new XmlWriterSettings();
        writerSettings.CheckCharacters = false;
        writerSettings.OmitXmlDeclaration = true;

        MemoryStream ms = new MemoryStream(); // for such a small stream, MemoryStream is perfectly fine - normally use VirtualStream.
        pContext.ResourceTracker.AddResource(ms);

        XmlWriter writer = XmlWriter.Create(ms, writerSettings);
        pContext.ResourceTracker.AddResource(writer);

        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
        writer.WriteFullEndElement();
        writer.Flush();

        ms.Position = 0;
        pInMsg.BodyPart.Data = ms;
    }
}
catch (Exception e)
{
    // swallow exception
    System.Diagnostics.Debug.WriteLine(e.ToString());
}
finally // make sure we're somewhat well behaved
{
    if (pInMsg.BodyPart.Data.CanSeek == true)
        pInMsg.BodyPart.Data.Position = 0;
}

Enjoy!

Improving performance of inserting multiple parent/child tables in a single SQL Procedure


Tom Babiec wrote a great blog a few months back on inserting multiple parent child tables in a single stored procedure. We use this technique a lot in our data integration work, and it’s proven to be very robust in many contexts. The SQL procedure outlined in that blog is useful not just for BizTalk, but generally speaking for ADO.NET and other applications trying to load multiple rows into SQL Server for multiple parent child tables. However, when dealing with larger datasets (and as the table grows), we’ve noticed some degradation in performance. In some cases, we were seeing variances of 30 seconds to 10+ minutes for loading the same exact data set on the same database. We tried a few different options, including forcing a recompile of the stored procedure between each load

WITH RECOMPILE

, but this did not solve the problem. Finally, taking a look at the query plan gave some additional insight for a fairly simple fix.

By default, the query optimizer uses the Nested Loops operator to do the final MERGE between the input table and the tracker table:

[Query plan screenshot: NestedLoops]

Note that the table scan can be avoided by ensuring there’s a proper primary key on the source table – but this table scan was of little consequence in our testing, as it’s scanning an input table variable either way.

This works well when at least one of the datasets is small (e.g. 10 rows or fewer), and Nested Loops is the most performant join operator in that case. However, when either the source tables or the destination tables get larger, we’ve noticed the query optimizer fails to switch to a Merge Join for that final merge operation, which is the preferable operator when dealing with two larger datasets. We were occasionally able to get this to happen by changing around indexes on the table variables, but we’d consistently find the query plan going back to Nested Loops. There are a few possible reasons for this:

  1. The query optimizer assumes that the table variables coming in to the procedure will normally be small.  This is probably usually a fair assumption.
  2. Merge Join is more expensive when one of the result sets is in fact small, so Nested Loops will be preferable for most cases.
  3. It may become particularly unpredictable when joining the result set within the USING portion of the merge statement – the optimizer may incorrectly assume this is going to result in fewer rows.
  4. Merge Join will require two additional sort operations – however, this sort will be cheap since the sort is actually happening in relation to existing clustered indexes (so the data is already sorted)

Whatever the final reasons, testing showed that Merge Joins were vastly better here. We made two attempts: first, we tried just using the MERGE join hint for the source table, but this still resulted in Nested Loops against the destination table; finally, we ended up with an OPTION (MERGE JOIN) hint for the whole merge statement, forcing the engine to use Merge Join over Nested Loops for both the final table and the intermediate source table:

MERGE TargetTableName target
USING (SELECT * FROM @sourceTable src INNER JOIN @trackerTable tracker ON src.Pk = tracker.Pk)  as remapTable
ON 1=0 -- force insert
WHEN NOT MATCHED
THEN
INSERT (col1, col2, col3, etc...)
VALUES (col1, col2, col3, etc...)
OUTPUT INSERTED.pk_id, remapTable.old_id into @nextTrackerTable
OPTION (MERGE JOIN);

Which results in a query plan like this:

[Query plan screenshot: merge_join]

Using the OPTION (MERGE JOIN); hint does the trick here, and renders the more favorable query plan for these scenarios. Using this technique showed no noticeable change in smaller loads (where most input tables had fewer than 10 rows); but for larger loads (even up to hundreds of thousands of rows in child tables), we saw data loads go from 10+ minutes down to under a minute.

Dynamically set InstanceContextMode for a WCF Service from app.config or web.config


WCF offers a lot of very powerful configuration and extensibility options – sometimes it becomes a bit dizzying.

I recently had a requirement to design a WCF service that could potentially consume many system resources (particularly, RAM) in some client scenarios. Any single call will be manageable, and concurrent calls would be manageable in some environments (but not others – the service is required to be able to process large files in a single call, but can only handle so much before running out of memory). Obviously, it would be good to find ways to split up calls to the service or for the client to sequence them, but WCF offers a very simple configuration attribute to specify that the service should run as a Singleton, processing only one call at a time:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
public class LargeFileProcessingService : ILargeFileProcessingService
{
    ...
}

Problem solved, right? Now what if a particular environment will only be handling smaller files, and needs to process them quickly? We lose the multi-threaded processing power that WCF would otherwise be able to offer there. Unfortunately, InstanceContextMode is not exposed out of the box as a property that can be changed in app.config or web.config. However, a couple of classes can allow us to do just that. This will let you deploy your services and configure them to run either in a “single threaded” mode (queuing subsequent requests until the current request is processed or they time out) or in a “multi-threaded” mode (allowing multiple instances to run concurrently). This works much like BizTalk’s “ordered delivery” option on Send Ports, and like that option it can be reconfigured at any time and picked up by the instance after a restart.

First, an implementer of IServiceBehavior:

using System;
using System.Collections.ObjectModel;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;

namespace Tallan.WCF
{
    public class InstanceContextServiceBehavior : IServiceBehavior
    {
        InstanceContextMode _contextMode = default(InstanceContextMode);

        public InstanceContextServiceBehavior(string contextMode)
        {
            if (!string.IsNullOrWhiteSpace(contextMode))
            {
                InstanceContextMode mode;

                if (Enum.TryParse(contextMode, true, out mode))
                {
                    _contextMode = mode;
                }
                else
                {
                    throw new ArgumentException($"'{contextMode}' could not be parsed as a valid InstanceContextMode; allowed values are 'PerSession', 'PerCall', 'Single'", "contextMode");
                }
            }
        }

        public void AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase, Collection<ServiceEndpoint> endpoints, BindingParameterCollection bindingParameters)
        {
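            // Override whatever InstanceContextMode was declared on the service class;
            // WCF adds a ServiceBehaviorAttribute to the description, so Find<> returns it.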
            var behavior = serviceDescription.Behaviors.Find<ServiceBehaviorAttribute>();
            behavior.InstanceContextMode = _contextMode;
        }

        public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
        {
            return;
        }

        public void Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
        {
            return;
        }
    }
}

And then the BehaviorExtensionElement implementation (so we can set this from web or app.config):

using System;
using System.Configuration;
using System.ServiceModel.Configuration;

namespace Tallan.WCF
{
    public class InstanceContextExtensionElement : BehaviorExtensionElement
    {
        public override Type BehaviorType
        {
            get
            {
                return typeof(InstanceContextServiceBehavior);
            }
        }

        protected override object CreateBehavior()
        {
            return new InstanceContextServiceBehavior(ContextMode);
        }

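        // This field exists only so nameof(contextMode) yields the attribute name used in config.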
        const object contextMode = null;

        [ConfigurationProperty(nameof(contextMode))]
        public string ContextMode
        {
            get
            {
                return (string)base[nameof(contextMode)];
            }
            set
            {
                base[nameof(contextMode)] = value;
            }
        }
    }
}

And voilà! You can now add this to your app.config:

<system.serviceModel>
    <services>
      <service name="Tallan.WCF.LargeFileProcessingService" behaviorConfiguration="Default">
        <endpoint address="" behaviorConfiguration="webBehavior" binding="webHttpBinding" bindingConfiguration="largeWebHttpBinding" contract="Tallan.WCF.ILargeFileProcessingService" />
      </service>
    </services>
    <bindings>
      <webHttpBinding>
        <!-- allow large requests, set generous timeouts -->
        <binding name="largeWebHttpBinding"
                 closeTimeout="01:00:00"
                 openTimeout="01:00:00"
                 receiveTimeout="01:00:00"
                 sendTimeout="01:00:00"
                 allowCookies="false"
                 bypassProxyOnLocal="false"
                 hostNameComparisonMode="StrongWildcard"
                 maxBufferSize="2147483647"
                 maxBufferPoolSize="2147483647"
                 maxReceivedMessageSize="2147483647"
                 transferMode="Streamed"
                 useDefaultWebProxy="true">
          <readerQuotas maxDepth="32" maxStringContentLength="524288" maxArrayLength="16384" maxBytesPerRead="4096" maxNameTableCharCount="16384" />
          <!--<security mode="Transport" />-->
        </binding>
      </webHttpBinding>
    </bindings>
    <extensions>
      <behaviorExtensions>
        <add name="instanceContext" type="Tallan.WCF.InstanceContextExtensionElement, Tallan.WCF, Version=1.0.0.0, Culture=neutral, PublicKeyToken=KEY TOKEN HERE"/>
      </behaviorExtensions>
    </extensions>
    ...
    <behaviors>
      <serviceBehaviors>
        <behavior name="Default">
          <!-- valid values for contextMode are: "Single", "PerCall", "PerSession" -->
          <!-- This will override any declared attributes on the service -->
          <instanceContext contextMode="Single"/>
          <serviceMetadata httpGetEnabled="True" httpsGetEnabled="True"/>
          <!-- To receive exception details in faults for debugging purposes,
          set the value below to true.  Set to false before deployment
          to avoid disclosing exception information -->
          <serviceDebug includeExceptionDetailInFaults="False" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    ...
</system.serviceModel>
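
To see the override in action, here's a minimal self-hosted console sketch (the console host is an illustration, not part of the original service; it assumes the two classes above are referenced, the app.config shown is in place, and a base address is configured for the endpoint). The custom behavior runs while the host opens, so checking the ServiceBehaviorAttribute afterward shows the mode that was pulled from config:

using System;
using System.ServiceModel;
using System.ServiceModel.Description;

class Program
{
    static void Main()
    {
        // InstanceContextServiceBehavior.AddBindingParameters runs during Open(),
        // replacing whatever InstanceContextMode was declared on the service class
        // with the value from <instanceContext contextMode="..."/> in config.
        using (var host = new ServiceHost(typeof(Tallan.WCF.LargeFileProcessingService)))
        {
            host.Open();

            var behavior = host.Description.Behaviors.Find<ServiceBehaviorAttribute>();
            Console.WriteLine($"InstanceContextMode in effect: {behavior.InstanceContextMode}");

            Console.WriteLine("Service is running. Press Enter to stop.");
            Console.ReadLine();
        }
    }
}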

Obviously, this kind of service might not work very well over the internet, but it can offer a lot of interoperability over an intranet while still letting each environment configure whether or not the service runs as a singleton.

Considerations About HL7 Integration


On the provider side of healthcare integration, HL7 (particularly v2) is a critical message type to understand. While it is standardized and heavily used by various EHRs/EMRs, each uses it in slightly different ways. Although there are efforts to further standardize and normalize its use across the board (such as with v3/FHIR), many EHRs and EMRs continue to use 2.x messages.  Common HL7 messages include admission/discharge/transfer (ADT), scheduling (SIU), lab orders and results (ORM, ORU), and medical reports (MDM).  Choosing the right platform can be challenging.

Some of the challenges of HL7 2.x messages include:

  • The ability to add non-standard custom segments or additional data anywhere in the message (whether they are completely custom “Z Segments” or other segments that aren’t typically part of the message, such as IN1 segments in an ADT message to include additional insurance information).
  • A myriad of parsing and manipulation libraries available
  • Questions about whether to go with a specialized HL7 platform or a full-fledged integration platform
  • The wide diversity of applications/systems/endpoints to integrate with
  • The need to correlate and understand other healthcare data, such as CCD and X12 EDI

These points can be captured in two key issues: The HL7 standard and parsing, and Specialized solutions vs Integration platforms.

HL7 2.x and parsing

HL7 v2 messages look a little bit like EDI files: flat text files with delimited fields and segments.  Unlike EDI, the delimiters are all contained within the first few characters of the message, and the segment terminator is specified by the standard as a carriage return (‘\r’). Typically, a ‘|’ is used as the field delimiter, a ‘^’ as the component (field part) separator, an ‘&’ as the sub-component separator, and a ‘\’ as the escape character.
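
As a rough illustration of how little is needed to discover the delimiters (a minimal sketch, not tied to any particular HL7 library, and the helper name is made up), the field separator is the character immediately after “MSH”, and the next four characters are the component, repetition, escape, and sub-component characters:

using System;

static class Hl7Delimiters
{
    // Reads the delimiter characters from a raw HL7 v2 message.
    // A typical message starts with "MSH|^~\&|..." – MSH-1 is the field separator
    // itself, and MSH-2 carries component, repetition, escape, and sub-component.
    public static (char Field, char Component, char Repetition, char Escape, char SubComponent) Read(string message)
    {
        if (message == null || message.Length < 8 || !message.StartsWith("MSH"))
            throw new ArgumentException("Message does not begin with a valid MSH segment.");

        return (message[3], message[4], message[5], message[6], message[7]);
    }
}

// Usage:
// var d = Hl7Delimiters.Read("MSH|^~\\&|SENDING_APP|SENDING_FACILITY|...");
// d.Field == '|', d.Component == '^', d.Escape == '\\', d.SubComponent == '&'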

HL7 messages can have repeating groups of segments, and it’s possible to have those groups nested within each other (i.e. a grouped section of segments that can repeat that itself contains a grouped section of segments that can’t repeat).  In addition to that, the specification itself declares that parsers must be prepared to ignore unexpected segments that applications may insert, whether they be custom Z Segments or HL7 segments that are typically found in a different transaction.  Some applications will put these segments at the end of the message to make the parser’s job a bit easier, but some will put them interspersed throughout the message.
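
To make that concrete, here is a rough sketch of the “ignore what you don’t expect” requirement (the expected-segment list and type names are illustrative only; a real parser would also handle groups and ordering): split on the segment terminator, keep the segments the transaction defines, and set aside Z-segments or other extras wherever they appear in the message:

using System;
using System.Linq;

static class SegmentFilter
{
    // Segments a simple ADT parser might expect; real lists vary by trigger event.
    static readonly string[] ExpectedSegments = { "MSH", "EVN", "PID", "PV1", "AL1", "DG1", "IN1", "OBX" };

    // Splits a raw HL7 v2 message on the '\r' segment terminator and separates
    // expected segments from custom Z-segments or other unexpected segments,
    // whether they appear at the end or interspersed throughout the message.
    public static (string[] Expected, string[] Unexpected) Split(string message)
    {
        var segments = message.Split(new[] { '\r' }, StringSplitOptions.RemoveEmptyEntries)
                              .Where(s => s.Length >= 3)
                              .ToArray();

        var expected = segments.Where(s => ExpectedSegments.Contains(s.Substring(0, 3))).ToArray();
        var unexpected = segments.Where(s => !ExpectedSegments.Contains(s.Substring(0, 3))).ToArray();
        return (expected, unexpected);
    }
}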

Integration platforms, such as Microsoft BizTalk, MuleSoft AnyPoint, and Dell Boomi, offer support for HL7 parsing.  There are also stand-alone HL7 parsing libraries available, such as the HAPI project for Java, a .NET port of that project called NHAPI, and HL7apy for Python.  Some of these integrate with the HL7 XSD schemas developed by Sun and distributed by HL7.org, others use their own schemas.  There are several potential pitfalls to be aware of with these options:

  • The Sun schemas are very useful for doing things like XSD validation or XSLT based mapping (such as with the BizTalk mapper), but make heavy use of <xsd:any> tags. This means that validation may be more permissive than expected or than a destination system expects.
  • Parsers offer varying levels of support for the Sun XSDs – HAPI supports encoding them, while HL7apy doesn’t appear to.  BizTalk supports them, but doesn’t support encoding or decoding between HL7 and XML in that format out of the box (it uses a different proprietary XML format for those operations); however, Tallan offers a solution accelerator to implement this (more on that later).
  • Most libraries can support MLLP (the minimal lower layer protocol used to carry HL7 over TCP) in some way, shape or form – see the framing sketch after this list.  Integration platforms will offer full-fledged support for multiple protocols and advanced options regarding transport types and transformation support.
  • Enterprise grade support – open source projects and communities will lend themselves more to developers and consultants doing your support work, whereas commercial solutions will lend themselves more to assurances from the vendor.
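
For reference, MLLP framing itself is very small – each message is preceded by a vertical-tab byte (0x0B) and followed by a file-separator byte (0x1C) plus a carriage return (0x0D). A minimal sketch of framing an outbound message (the class name is illustrative; receiving requires buffering until the trailer is seen):

using System.Text;

static class Mllp
{
    const byte StartBlock = 0x0B;      // <VT>
    const byte EndBlock = 0x1C;        // <FS>
    const byte CarriageReturn = 0x0D;  // <CR>

    // Wraps an HL7 v2 message in MLLP framing for transmission over a TCP socket.
    public static byte[] Frame(string hl7Message)
    {
        var payload = Encoding.ASCII.GetBytes(hl7Message);
        var framed = new byte[payload.Length + 3];
        framed[0] = StartBlock;
        payload.CopyTo(framed, 1);
        framed[framed.Length - 2] = EndBlock;
        framed[framed.Length - 1] = CarriageReturn;
        return framed;
    }
}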

Specialized solutions vs. Integration Platforms

As with other healthcare data, there are specialized vendors who offer HL7 message translation solutions (sometimes built on the open source libraries mentioned above).  These solutions can be tempting – they tend to offer well-thought-out translators for HL7 application needs and may have some pre-built packages for various EHRs.  At the same time, they can be a trap.  Any specialized integration tool will end up falling short at some point (such as when you need to start integrating EDI claims or payment advice messages, or when a vendor or system gives you non-HL7 XML, JSON, or flat file messages that need to be parsed and translated to HL7 or vice versa).  They may or may not offer support beyond the MLLP and/or SFTP protocols, limiting options for delivery and retrieval as new systems come on board, such as Web Services or delivery over HTTP/HTTPS.

Most crucially, it is essential to make use of canonical messaging patterns when dealing with HL7 integrations.  New lab partners will expect your organization to be able to on-board within a couple of weeks.  A point-to-point integration for each new partner can easily lead to a mess of spaghetti code – while it may be easier up front, it will eventually become unmanageable, and the risk increases exponentially as time goes on.  Simple code changes become very challenging to deploy and track, and each new integration point requires starting from scratch.  Integration platforms lend themselves well to these patterns, and leave control in the consumer’s hands.  Use of canonical messaging patterns makes on-boarding new trading partners or applications faster: you know what to expect from your organization’s canonical format, and you can implement the feed-specific translation in a way that doesn’t disrupt other feeds.
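
As a sketch of the shape this takes (all type names here are hypothetical, not part of any particular product or accelerator): each feed implements only a translator to and from the canonical model, so on-boarding a new partner means adding one translator rather than touching existing integrations.

using System;
using System.Collections.Generic;

// Hypothetical canonical model for a lab result, independent of any partner's format.
public class CanonicalLabResult
{
    public string PatientId { get; set; }
    public string TestCode { get; set; }
    public string Value { get; set; }
    public DateTime ObservedAt { get; set; }
}

// Each trading partner or application supplies only the translation into the canonical format.
public interface IInboundTranslator
{
    bool CanHandle(string feedName);
    CanonicalLabResult ToCanonical(string rawMessage);
}

public class CanonicalRouter
{
    private readonly List<IInboundTranslator> _translators = new List<IInboundTranslator>();

    public void Register(IInboundTranslator translator) => _translators.Add(translator);

    // Downstream processing only ever sees CanonicalLabResult; adding a feed
    // means registering one new translator, not reworking existing ones.
    public CanonicalLabResult Translate(string feedName, string rawMessage)
    {
        foreach (var translator in _translators)
        {
            if (translator.CanHandle(feedName))
                return translator.ToCanonical(rawMessage);
        }

        throw new InvalidOperationException($"No translator registered for feed '{feedName}'.");
    }
}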

Finally, integration platforms free you from the whims of the specialized vendor.  Large, well-established platform solutions like BizTalk, Mule, and Boomi aren’t going anywhere, and offer loads of opportunity for any integration or reporting need your organization will have – most critically, unlocking your valuable data for use in BI reports and custom correlations with other integration data.

One challenge that can come up with the platforms is that they may lack some of the niceties of an HL7-specific platform – for example, BizTalk has some gaps between its built-in support for HL7 and the more widely understood Sun XML schemas (though it offers excellent support for EDI), whereas Mule (which uses HAPI internally) offers pretty good parsing capabilities and XML support, but its translation options are not quite as solid as BizTalk’s at this point.

The Tallan Solution

Tallan addresses these shortcomings with the T-Connect HL7 Accelerator.  Developed from similar principles to our EDI Accelerator products, the HL7 Accelerator adds several important capabilities to integration platforms:

  • Fast, powerful message parsing that is compliant with standards and situations seen “in the wild”
  • Out of the box patterns and practices such as canonical messaging and dynamic routing capabilities
  • Message persistence in relational tables for data mining and correlation
  • Full capabilities and support of the backing integration platform, and full compatibility with the T-Connect EDI Accelerator product for EDI processing

These capabilities help ensure you can unlock the full capabilities of your EMR/EHR systems in a maintainable, flexible environment that keeps the care providers in control of care.  Interested in learning more about Tallan’s offering?  Contact us today!
