ETL and the Case Against a Single System for All Business Processes

Nearly every business wants to centralize their business data so that they can get an at-a-glance overview of what is going on at any given point in the day. Suppose, for instance, a company has its business data on 5 separate database servers, in various databases therein, across several different tables. And much of the data will need to be slightly modified (transformed) in order to make data types consistent and optimize it for quick and meaningful data analysis.

Assume the database servers are:
  • CMS, Orders (Dynamics 365 data)
  • Accounting (Sage 50 data)
  • Warehouse, Logistics (SAP data)
  • Field Service, Product Support (SalesForce data)
  • Call Center (TantaComm data)
Furthermore, the business wants to apply some custom logic to various fields so that certain calculations don't have to be made by the OLAP end users and applications.

ETL is just moving data from a Source to a Destination, often making some changes (transformations) along the way

Well, it all seems so complicated, right? How can all that disparate data be combined and standardized? It's not as complicated as allowing an ERP solution define how your various departments do their job. Some ERP products truly do have the functionality to satisfy 80-85% of the company business requirements. And when that is the case, by all means, companies should go that route.

But for instances where ERP of every business function is not working, you can use products like SSIS, Informatica, and Oracle Data Integrator to easily map ETL data visually and apply code operations to the data using a variety of programming languages.

But of course, there is no "one-size-fits-all 100% of our business needs" ERP solution; different business productivity software work better or worse for different industries, different departmental needs, for all manner of different reasons. Why care about the choice of business tool that your business' management and best minds know and have proven is most effective for their- often very specific- job?

Allowing a business to integrate the "best apps for each particular type of job" is nothing more than a sort of microservices architecture. But if we are working with custom code in these various LOB applications (at least some of the LOB apps will need custom API integrations so that they can talk to each other), with each new block of custom code, comes more decentralized code bases and decentralized test scripts...

All considered, I would argue that trying to put square pegs into round holes a la "ERP-everything" is a much, much, MUCH bigger effort than a well-developed and tested scheduled ETL Data Warehouse refresh process to centralize your company's data.

It's always a balance with software. Use the most effective apps for your organization and departmental requirements, but try to avoid the dependency hell that can come along with a poorly managed distributed application environment.

Here are the ETL steps in order, from the resource link below:

Extract the data from the external data sources such as line-of-business systems, CRM systems, relational databases, web services, and anywhere you have business data you want to summarize.

Transform the data. This includes cleansing the data and converting it to an OLAP-friendly data model. The OLAP-friendly data model traditionally consists of dimension and fact tables in a star or snowflake schema.

Load the data so that it can be quickly accessed by querying tools such as reports, dashboards, etc.

"90% of the requirements now is better than 100% of the requirements never" -Unknown

Reference: https://blogs.msdn.microsoft.com/andreasderuiter/2012/12/05/designing-an-etl-process-with-ssis-two-approaches-to-extracting-and-transforming-data/

No comments: