kpitsimpl: December 2020

EDI, RPC, SOAP, MQ, REST and Interoperability

All of these concepts help to address the same concern: how do we move data from System A to System B when these systems have no direct linkage (no common data store)? The following are a few of the technologies that have served as answers to this question.

There was a different kind of web back in the day

EDI (Electronic Data Interchange): An exchange of data that is usually large in volume compared to other remote data transfer methods (batched records in the thousands vs. one JSON record or one row of an RDBMS table), and usually done in conjunction with some kind of ETL and/or data warehousing process. EDI is typically used for large, domain-specific transactions. The data transfer itself is commonly performed over SFTP or another secure file transfer protocol, with a translation step (XSLT, for example, where XML is involved) mapping the data into the required format. EDI files must adhere to strict formatting standards such as UN/EDIFACT (published as ISO 9735) or ANSI X12. This is helpful (and coincidentally adds a layer of complexity for hackers) when a large number of disparate parties are reporting data and all must send it in the right format: if an EDI file's data format is wrong in any way, it won't be accepted at the destination.

This is an example of an EDI "EDIFACT" formatted file
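To make the "wrong format gets rejected" point concrete, here is a minimal Python sketch of reading an EDIFACT-style message. The sample message is hypothetical and heavily shortened, and the parser assumes the default EDIFACT separators (`'` ends a segment, `+` separates data elements, `:` separates components); it ignores release characters and the optional UNA header, so treat it as an illustration rather than a real EDI translator.

```python
# Hypothetical, shortened EDIFACT-style message (illustrative data only).
SAMPLE = (
    "UNH+1+ORDERS:D:96A:UN'"
    "BGM+220+PO12345+9'"
    "DTM+137:20201201:102'"
    "UNT+4+1'"
)


def parse_segments(raw):
    """Split an EDIFACT-style string into (tag, elements) pairs.

    Assumes default separators; release characters (?) and the UNA
    header are not handled in this sketch.
    """
    segments = []
    for chunk in raw.split("'"):
        chunk = chunk.strip()
        if not chunk:
            continue
        tag, *elements = chunk.split("+")
        # Each data element may itself hold several components.
        segments.append((tag, [e.split(":") for e in elements]))
    return segments


def validate(segments):
    """Reject the message if the UNT trailer's segment count is wrong."""
    tag, elements = segments[-1]
    if tag != "UNT":
        raise ValueError("message must end with a UNT trailer")
    declared = int(elements[0][0])
    if declared != len(segments):
        raise ValueError(
            f"UNT declares {declared} segments, found {len(segments)}"
        )
    return True
```

Calling `validate(parse_segments(SAMPLE))` passes because the trailer declares four segments and four are present; change `UNT+4` to `UNT+5` and the message is refused, which is the kind of strictness a receiving system applies.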

RPC (Remote Procedure Call): A highly coupled abstraction (if you can call it abstraction; it's really more of a video game accessory that only works on certain consoles) that essentially requires the client and server to share code, at minimum the same procedure definitions or stubs. While that was once feasible everywhere (and in some cases may still be desirable for channel security), it is not typically the ideal way to communicate openly. However, for closed, secure communications, RPC is still very much a part of the many technologies that facilitate secure messaging in applications like Telegram, Signal and the like.

As stated, RPC implies client-server sharing code (see "RPC thread" spanning above)
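The coupling can be seen in a small Python sketch built on the standard library's `xmlrpc.client` marshalling helpers. To keep it runnable without a network, the "server" is just a function called in-process; the procedure name `add` and the `PROCEDURES` table are hypothetical. The point is that the client must know the procedure name and argument shape the server exposes.

```python
import xmlrpc.client


def add(a, b):
    return a + b


# The server's published procedures (hypothetical example).
PROCEDURES = {"add": add}


def client_stub(method, *args):
    """Marshal a procedure call into an XML-RPC request document."""
    return xmlrpc.client.dumps(args, methodname=method)


def server_dispatch(request_xml):
    """Unmarshal the request, run the procedure, marshal the reply."""
    params, method = xmlrpc.client.loads(request_xml)
    if method not in PROCEDURES:
        fault = xmlrpc.client.Fault(1, f"unknown procedure: {method}")
        return xmlrpc.client.dumps(fault, methodresponse=True)
    result = PROCEDURES[method](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)


def call(method, *args):
    """Looks like a local call; the dispatch stands in for a network hop."""
    response_xml = server_dispatch(client_stub(method, *args))
    # loads() raises xmlrpc.client.Fault if the reply is a fault.
    (result,), _ = xmlrpc.client.loads(response_xml)
    return result
```

`call("add", 2, 3)` returns `5`, while calling a procedure the server does not know raises `xmlrpc.client.Fault`. In a real deployment the request would travel over a socket, but the shared contract is the same.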

SOAP (Simple Object Access Protocol): This had been the standard for web services (indeed it is why Microsoft created WCF) until HTTP-based/RESTful APIs replaced it as the standard choice among developers of newer projects around the early 2010s. It is self-describing (via WSDL) and allows for communication over virtually any transport protocol. SOAP is, however, quite prescriptive in the way it dictates how SOAP message "objects" are defined, leading to a lot of (interface) metadata inside the envelope that may have little to do with the task at hand, but which is needed so that the client can understand the message and deserialize the object if necessary.

An example of a faulting SOAP call's SOAP response
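For readers who have not seen one, a faulting response looks roughly like the envelope in the Python sketch below, which also shows how a client might pull the fault out with the standard library. The envelope uses the SOAP 1.1 namespace; the fault text is hypothetical.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical fault response, trimmed to the essentials.
FAULT_RESPONSE = f"""
<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>Hypothetical error: account id is missing</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
""".strip()


def parse_fault(xml_text):
    """Return the fault code and message, or None if there is no fault."""
    root = ET.fromstring(xml_text)
    fault = root.find("soap:Body/soap:Fault", {"soap": SOAP_NS})
    if fault is None:
        return None
    # In SOAP 1.1 the fault's child elements are not namespace-qualified.
    return {
        "code": fault.findtext("faultcode"),
        "message": fault.findtext("faultstring"),
    }
```

Even this tiny message carries an envelope, a body and a namespace declaration before it says anything about the error, which is the metadata overhead described above.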

MQ (Message Queuing): The primary appeal of message queues and exchanges is the asynchronous way in which messages are pushed and pulled, in contrast to a REST or SOAP service call, which is synchronous request/response by design.

This architectural model also supports highly decoupled design, whereby many applications, all written in different languages and under disparate frameworks, can utilize the same MQ exchange and share communication across queues.

Brokers like RabbitMQ facilitate event sourcing design with queues; an app is often both a Producer and a Consumer
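The asynchronous producer/consumer idea can be sketched in Python using only the standard library. Here `queue.Queue` stands in for a real broker queue (it is not RabbitMQ and has no exchanges or persistence), and the `run_demo` helper is a hypothetical name for illustration.

```python
import queue
import threading


def run_demo(messages):
    """Push messages through an in-process queue to a consumer thread."""
    broker = queue.Queue()  # stand-in for a broker-managed queue
    received = []
    stop = object()  # sentinel telling the consumer to finish

    def consumer():
        while True:
            msg = broker.get()  # blocks until a message is available
            if msg is stop:
                break
            received.append(msg.upper())  # pretend "processing"

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer enqueues and moves on; it never waits for a reply.
    for msg in messages:
        broker.put(msg)
    broker.put(stop)

    worker.join()
    return received
```

`run_demo(["order created", "order paid"])` returns the processed messages in order. The producer and consumer share only the queue and the message format, which is why they could just as well be separate programs in different languages talking through a real broker.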

REST (REpresentational State Transfer) APIs: Operating (in practice) entirely over HTTP(S) and via the standard HTTP methods (primarily GET and POST, alongside PUT, PATCH and DELETE), which have already undergone some 30 years of incremental improvement, REST APIs will be at the foundation of the web for as long as it lives on. They aren't self-describing, though descriptive metadata can be embedded in the naming of the API resources to achieve a similar effect. Additionally, there are usually descriptive, interactive specifications for large publicly hosted APIs like the ones from Google Maps and Twitter. RESTful APIs are not highly prescriptive about API structure or operations. An endpoint just has to be an HTTP action that any HTTP client would understand. Most APIs default to passing JSON around when objects are involved in POST arguments or GET return values, but there is no reason you cannot return XML. Or a file. Or a streaming video. Or whatever floats your software ship. People create RESTful API wrappers for SOAP services all the time.

Just leaving the REST for last.. 😉
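As a rough illustration of "JSON by default, but XML if you like", here is a minimal Python sketch of a single REST-style resource written as a WSGI application. The `/orders/<id>` route and the `ORDERS` data are hypothetical, and the `request` helper calls the app directly (no socket) so the example runs anywhere.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical data store for the example.
ORDERS = {"42": {"id": 42, "status": "shipped"}}


def app(environ, start_response):
    """Serve GET /orders/<id> as JSON, or XML if the client asks for it."""
    if environ["REQUEST_METHOD"] != "GET":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"method not allowed"]

    path = environ["PATH_INFO"]
    prefix = "/orders/"
    order = ORDERS.get(path[len(prefix):]) if path.startswith(prefix) else None
    if order is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    # Very simple content negotiation on the Accept header.
    if "application/xml" in environ.get("HTTP_ACCEPT", ""):
        body = f"<order><id>{order['id']}</id><status>{order['status']}</status></order>"
        content_type = "application/xml"
    else:
        body = json.dumps(order)
        content_type = "application/json"

    start_response("200 OK", [("Content-Type", content_type)])
    return [body.encode("utf-8")]


def request(path, accept="application/json"):
    """Call the WSGI app in-process and return (status, type, body)."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path, "HTTP_ACCEPT": accept}
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"]["Content-Type"], body.decode("utf-8")
```

`request("/orders/42")` yields a JSON body, and `request("/orders/42", accept="application/xml")` yields the same resource as XML. Nothing about the resource itself changes; only the representation does.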

Although many in the software development community prefer the use of RESTful APIs, Message Queuing or some combination of the two for new projects, we must be mindful of the fairly recent past, which has littered the landscape with SOAP, EDI, ETL and an assortment of proprietary and highly customized RPC communication channels that are still active (for example, SOAP streaming over UDP).

There was a time before the web as we know it today when machines like ATMs and TicketMaster terminals were interconnected just as they are now. However, these connections were not regular TCP HTTP packets traveling to and from port 80 or 443 but rather fixed-length TCP frames of an earlier file transfer protocol. And many of those ATM and TicketMaster connections still exist, even if upgraded for modern times via something like WCF (.NET) or JAX-WS (Java).

There are certain things only SOAP can do. There are certain legacy systems which will not be updated any time soon (because "if it ain't broke") that still need to interface with SOAP clients. As technologists we have to deal with this and understand the tradeoffs of using different frameworks for different jobs.

In the same way that there is no perfect language for every scenario, no one way of electronically transferring data and interacting with remote systems is always the "best way" (although REST APIs come pretty close, as so much of our connected world is now HTTP-based).

The best choice for sending remote communications, just like any choice of framework, language or design paradigm, is never fixed. The answer requires careful, domain-centric, thorough analysis of the problem and the resources available to resolve that problem. In software development, the answer to "which way is the best way?" is invariably "it depends".

Reference: https://researchportal.port.ac.uk/portal/files/681058/ITIT_13_1035_1.pdf

"Next Big" Software Religiosity and The Go-nowhere Rush

There is far too much religious extremism in information technology these days. There have always been camps (extreme anti-Microsoft sentiment or its sad corporate counterpart: disdain, fear and suspicion of all things open-source), but these days it has gotten to the point where sensible, cheap, reliable, proven solutions that everyone on the team understands are thrown out in favor of chasing whatever some bigshot at some big conference declared was going to be the next, next, next "big thing".

This image does have its merits..

Amid all the continuous rush to be cutting edge, if you don't understand what that edge can do for you and don't have a strong data foundation to build upon with that new cutting-edge thing, it doesn't matter what tools are out. You are still stuck with ideas and not programs.

Design and develop with what works for your particular team and project and within the context of the environments of your stakeholders (if all but 2% of your customers use Android then the iPhone version of your app may not be as important as you think). Above all else, make sure you understand the domain knowledge behind the data your application will be persisting and passing around. That (the data understanding) is the heart of every program that stores, processes, transmits or even simply reads, prints or paints any kind of communication.

Data sense-making and software development are hard work. And they are not done in a void. I suggest reading Stephen Few's "Big Data, Big Dupe", a little paperback containing 90-some pages of important wisdom for this modern rapid-fire information age that pre-empts knowledge of data in favor of slogans and metrics about data.

In short, the essence of this book is that if you have, say, 10TB of crap data that is always causing ETL failures that your personnel spend countless hours trying to correct... you may indeed have "big data" per some misguided tech journalist's definition... but you still have crap data. Understand your data before you try understanding how best to fit it inside of the newest shiny box.

Take also, for example, message queues and their usage in modern web application development. There seems to be a lot of misunderstanding about what MQ is, and some even claim it is a new technology (MSMQ has been around since the Windows 95 era; IBM MQ has been in use since 1993). Basic email has, since the early 1970s, operated on a publisher/subscriber (IMAP or SMTP) messaging queue paradigm that works in much the same way as modern MQ implementations (minus some bells and whistles).

These things aren't as complicated as they seem but they are complicated. And it's perilous to keep jumping from new trick to new trick whilst ignoring foundational, timeless software principles.

I would go so far as to say it is injurious to current and future generations of software developers to keep focusing on buzzwords, zooming out and away from the hard-but-necessary work of understanding the data, and then wondering why the tool or framework flavor of the year did not save the day.