Showing posts with label EDI. Show all posts
Showing posts with label EDI. Show all posts

EDI, RPC, SOAP, MQ, REST and Interoperability

All of these concepts help to address the same concern: how do we move data from System A to System B when these systems have no direct linkage (no common data store)? The following are a few of the technologies that have served as answers to this question.


There was a different kind of web back in the way day


EDI (Electronic Data Interchange): An exchange of data usually large in volume in comparison to other remote data transfer methods (batched records of 1000s vs. 1 record of JSON or 1 row of an RDBMS table), and usually done in conjunction with some kind of an ETL and/or Data Warehousing process. EDI is typically used for large, domain-specific transactions and the data transfer itself is performed over SFTP or another secure file transfer protocol and utilizes XSLT for data formatting. EDI files must adhere to strict ISO formatting specifications. This is helpful (and coincidentally adds a layer of complexity for hackers) when trying to ensure that a large number of disparate parties reporting data are all sending data in the right format as, if an EDI file's data format is wrong in any way, it won't be accepted at the destination.


This is an example of an EDI "EDIFACT" formatted file


RPC (Remote Procedure Call): Highly-coupled abstraction (if you can call it abstraction- it's really more of a video game accessory that only works on certain consoles) that essentially requires the client and server to be running the same program which, while once upon a time was feasible (and in some cases may be desirable for channel security), is not typically the ideal way to communicate openly. However, for closed, secure communications, RPC is still very much a part of the many technologies that facilitate secure messaging in applications like Telegram, Signal and the like.


As stated, RPC implies client-server sharing code (see "RPC thread" spanning above)


SOAP (Simple Object Access Protocol): This had been the standard for web services (indeed it is why Microsoft created WCF) until HTTP-based/RESTful APIs replaced them as the standard choice among developers of newer projects around the early 2010's. It is self-describing (.wsdl) and allows for communication over virtually any point-to-point communications protocol. SOAP is however quite prescriptive in the way it dictates how SOAP message "objects" are defined, leading to a lot of (interface) metadata inside the envelope that may have little to do with the task at hand but which is needed so that the client can understand the message and deserialize the object if necessary.


An example of a faulting SOAP call's SOAP response


MQ (Message Queuing): the primary concept of utilizing message queues and exchanges is the asynchronous nature in which the messages are pulled and pushed vs. a REST or SOAP service call which are request/response synchronous by design.

This architectural data model also supports highly-decoupled design whereby many applications, all written in different languages and under disparate frameworks can utilize the same MQ Exchange and share communication across queues.

Frameworks like RabbitMQ facilitate event sourcing design with queues; an app is often both a Producer and Consumer



REST (REpresentational State Transfer) APIs: Operating completely (and solely) over HTTP(S) and via (primarily) GET/POST actions which have already undergone some 35 years of incremental improvement, for as long as the web lives on, REST APIs will be at its foundation. They aren't self-describing though descriptive metadata can be embedded in the naming of the API resources to achieve similar reflection. Additionally, there are usually descriptive, interactive specifications for large publicly hosted APIs like the ones from Google Maps and Twitter. RESTful APIs are not highly prescriptive in the API structure/operations. It just has to be an HTTP action method that any HTTP client would understand. Most APIs default to passing JSON around when objects are involved in POST arguments or GET return values; but there is no reason you cannot return XML. Or a file. Or a streaming video. Or whatever floats your software ship. People create RESTful API wrappers for SOAP services all the time.



Just leaving the REST for last.. 😉


Although many in the software development community prefer the use of RESTful APIs, Message Queuing or some combination of the 2 for new projects, we must be mindful of the fairly recent past which has littered the landscape with SOAP, EDI, ETL and an assortment of proprietary and highly customized RPC (still active) communication channels (for example SOAP streaming over UDP).

There was a time before the web as we know it today when machines like ATMs and TicketMaster were still interconnected just as ever. However these connections were not regular TCP HTTP packets traveling to and from port 80 or 443 but rather fixed length TCP frames of ATM or another early file transfer protocol. And many of those ATM and TicketMaster connections still exist, even if upgraded for modern times via something like WCF (.NET) or JAX-WS (Java).

There are certain things only SOAP can do. There are certain legacy systems which will not be updated any time soon (because "if it ain't broke") that still need to interface with SOAP clients. As technologists we have to deal with this and understand the tradeoffs of using different frameworks for different jobs. 

In the same way that there is no perfect language for every scenario, no one way of electronically transferring data and interacting with remote systems is always always the "best way" (although REST APIs come pretty close as so much our connected world is now http-based).

The best choice for sending remote communications just like any choice of framework, language or design paradigm is never fixed. The answer requires careful, domain-centric, thorough analysis of the problem and the resources available to resolve that problem. In software development, the answer to "which way is the best way?" is invariably- "it depends".





ETL and EDI Using SSIS

ETL is the process by which you can take (Extract) data from various (usually related) data sources, Transform that data to meet your destination system's needs, and finally Load that transformed data into the destination system data store.

Your table structure will be something along the lines of this basic template:

In a real-world db environment Staging, OLAP, OLTP and other data repos may be on different database servers, this is same db server for demonstration

We will use SQL Server Integration Services (SSIS) and develop the SSIS package within the Visual Studio 2017 IDE.


The first step of the SSIS package load (INSERT) the data into a STAGING area database. This allows us to:
  • Store off the intermediate data from all sources into analysis-friendly OLAP datastores
  • Perform data integrity checks
  • Keep extraction and transformation as two strictly separated steps

We load the data from the various source files (.csv, .xls, .xlsx) into SQL Server database Staging table(s) using SSIS Source and Destination Data Flow Tasks connected with the movable data flow arrows. Once you have connected a source and destination you can go into the Destination Data Flow Task and edit the mappings of which source columns should be written to which destination columns.

Next we perform some transformations. This can be anything from a simple ranking or status/flagging/business prioritization algorithm to data cleansing to data partitioning based on certain criteria; the key is that this Transform step is where we apply T-SQL UPDATEs to transform the data once it has all been aggregated in Staging.

Then we refresh the OLAP destination tables using the same kind of Source and Destination Data Flow Tasks and mappings as used for Staging. The OLAP data is used for data analysis.

Finally, we load the cleansed Staging data into our destination system's OLTP database and email or text message the system owner upon successful completion of the SSIS ETL job (or deliver an error if anything fails). The OLTP data stores live transactions.

Bear in mind that most ETL data-flow step mappings are not a 1:1 match; this is just an e2e demo of SSIS ETL in most basic form


Happy ETL'ing, and be sure to watch out for cases of mysterious symbols/characters from miscellaneous data copied from other programs or from other system environments that were using a Language setting (codepage) which is incompatible with your ETL software. Bad data happens more than you think and as we say, GIGO.

Your end result looks like this (all Green Checkmarks indicates all was successful; I recommend using PaperCut for SMTP testing- super cool and useful product

I would attach or GitHub the source code (and will do so upon request) but SSIS project code has a lot of dependencies and can get quite messy for another to re-use project on even just a 'slightly different' machine.

Having used SSIS' now-deprecated predecessor "DTS" (SQL Server Data Transformation Services) and SSIS for many years I can attest to the fact that the best way to learn this product is by diving right in on your own and begin the creation of sources and destinations and source/destination connection managers and control flow events, and .NET integration, and exception event handlers, etc.

You will likely run into some ambiguous and not well-documented errors when developing in SSIS; but persist in your efforts and you can create a very powerful EDI system with the many capabilities of SSIS and the robust Scheduled ETL jobs that it can create.

T-SQL and .NET for XSLT

Steps to perform transformation from SQL to XML to XSLT Output data (csv, htm, tab-delim, fixed-width, etc.):

1). Get your SQL data (assuming you are using SQL Server) into XML format via:

 SELECT [OrderId], [PersonId], [OrderDateTime] FROM [Order] FOR XML AUTO, ELEMENTS, ROOT ('Orders');  


XML Result of "FOR XML AUTO, ELEMENTS, ROOT"..

2). Load XSL transform file (OrderXform.xsl) into a new XslCompiledTransform object (transform):

       XslCompiledTransform transform = new XslCompiledTransform();  
       transform.Load("C:\\Xform\\OrderXform.xsl");  


XSL file used to transform the XML to the desired file output (EDI 834 standard, etc)


3). Using a StringWriter (writer), StringReader (inputReader) and XmlReader (inputXmlReader), read the inputReader XML string into inputXmlReader and apply the XSL transformation to the transform object via: transform.Transform(inputXmlReader, args, writer); //(or another override of Transform())

  using (StringWriter writer = new StringWriter())   
     {   
      // Make an XML reader (inputReader) out of the string (input)  
      XmlReader inputXmlReader;   
      using (var inputReader = new StringReader(input))   
      {   
       inputXmlReader = XmlReader.Create(inputReader);   
       // Apply the transformation to XmlReader (inputXmlReader) and write it to StringWriter (writer)   
       transform.Transform(inputXmlReader, null, writer);   
      }      
      // Retrieve the output as string from writer object  
      return writer.GetStringBuilder().ToString();   
     }   

4). Return the result and write the bits (to memory as Label Text for a demo like this, to file, to domain network location, to non-domain customer ftp, etc.).

The result normally gets written to remote SFTP location or posted to API endpoint for processing

Nothing incredibly complex about this, but it seems to frustrate a lot of people in .NET land so I have documented a quick and easy example.

GitHub Source: https://github.com/Radagast27/Simpl_XSLT_dot_net

Tool for XSLT validation: https://www.freeformatter.com/xsl-transformer.html

References:

https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-the-for-xml-clause-to-return-query-results-as-xml/

https://www.w3schools.com/xml/xsl_examples.asp

https://www.c-sharpcorner.com/article/transform-xml-output-of-sql-query-using-for-xml-auto-statement-to-html-using-xsl/

SQL, XML, and XSLT with SSIS

A common need in any business is the exchange of data through various file formats. EDI (Electronic Data Interchange) is typically defined in a document that outlines the positions, symbols, and range of acceptable values for a given file standard (834, 872, etc.).

Example of an 834 EDI formatted file

In this exercise, we are going to very quickly turn a SQL dataset result into an XML file and apply an XSL transformation to it- all through SSIS. Using the techniques described below you will be able to create a process to transform data and write data files for any EDI standard with relative ease.

One can add .NET and other customizable code via SSIS "Execute Process" and "Script Task", however in this exercise we are going to stick to the core problem of fetching data from a SQL Server, transforming that data and then writing the result to an output/destination file.

The SSIS package design

To start, create an Integration Services project in Visual Studio 2017 (install SQL Server Data Tools if your machine does not have the Integration Services Visual Studio project template).

Our project will consist of a Data Flow Task to gather our data (SQL Server to a text file containing the results as XML), and an XML Task which will apply an XSL transformation (XSLT) and write the file to disk.

Data Flow Step #1 -  Source


Data Flow Step #1 - Destination

In my example here, we are using ADO.NET to connect to SQL Server for source data as XML and then writing that data to result.txt.

result.txt (formatted to more clearly see the Customer XML elements

Next, the SSIS XML Task takes result.txt and applies an XSL stylesheet transformation to it, thereby changing the data from XML to HTML with dynamically added CustomerID values (the integer after the static text, "***CUST**ID**").

The key to the particular (CustomerID) transformation we are doing here is the following code which selects each <Customer> in the XML and writes the <CustomerID> value after "CUST**ID**".

<xsl:for-each select="Root/Customer">
 <tr>
  <td>NM1*IL*1***CUST**ID**<xsl:value-of select="CustomerID"/>****MI*YYX123456789!</td>
 </tr>
</xsl:for-each>

XML Task

XSLT is the operation, result.txt is our source data, finally.htm our destination and the XSL is Direct Input shown in the highlighted text above.

You can easily send mail on success or failure to keep the appropriate personnel informed of ETL job/service status. You can use a connected SMTP server to send mail via the SSIS Send Mail Task or (as I have in this example) you can use this T-SQL for sending mail as an Execute SQL Task:

T-SQL for sending email in Execute SQL Task

If you need to create a SQL Server Database Mail account/profile for Gmail, etc, follow this walkthrough.

And with a little luck and no missed steps, you will execute the SSIS package successfully and can go on to really leverage the powerful capabilities of SSIS to build simple and effective solutions for data interchange.

The completed SSIS package and the .htm result of our ETL exercise




Where can we find the data?

There are myriad resources to get data from a variety of domains (insurance, population, health, geography, weather, sports, etc.). You can get the data in .csv form or through API calls.

Here are a few to jump-start your data-oriented application:

Financial: https://fred.stlouisfed.org/search?st=csv

Baseball: http://www.baseball-databank.org/

Soccer: https://www.soccer24.com/team/apia-tigers/84SKUunL/

Government: https://www.data.gov/

Public: https://github.com/awesomedata/awesome-public-datasets

GIS: https://gisgeography.com/best-free-gis-data-sources-raster-vector/

Google Maps API: https://developers.google.com/maps/documentation/directions/


Reference: https://www.quora.com/What-is-the-best-financial-data-source-in-CSV-file-format