
extRS Portal: a modern SSRS client



ExtRS Portal provides a blueprint for extending the functionality of Reporting Services 


 
extRS (pronounced "extras") is a modern SSRS client for distributing and reading reports, with some extras. A demo of the app is available here: https://extrs.net
   
The audience is SSRS report users (you know, the people you need to justify having enterprise reporting in the first place). So things like applying item-level RS security, managing users, and adding, editing, and deleting SSRS catalog items and other system-level properties are not part of this client, at least not yet.

The aim here is to make SSRS at least slightly more interesting, accessible, and useable for information consumers. This particular deployment of the extRS.Portal web client is connected to a report server with custom authentication (extRSAuth), which gets past the normally required "Windows authentication" hamstring of the default SSRS installation.

This wrapper and extension UI not only improve the user authentication experience and the dynamism of SSRS parameter behaviors in the UI, but also provide SSRS admins and other users with rich enterprise reporting usage and delivery data.

Most of the features contained in Reporting Services' built-in Report Portal at /reports are enabled.

I have disabled some things like deleting and uploading items for the sake of keeping my demo of the app small and simple.

The source code can be found here: https://github.com/sonrai-LLC/extRS



An SSRS IFrame/CORS infinite redirect loop error and a quick and easy solution

The redirect loop looks like this and, in Edge, will display the error message: "[domain] redirected you too many times"

If you are trying to render the SSRS ReportViewer control within an <iframe>, you may run into a CORS issue that manifests as a series of 302 (Found) responses and an infinite redirect loop between the ReportViewer control (ReportViewer.aspx) and Logon.aspx.

As of SSRS 2022, without an explicit instruction to allow CORS, ReportViewer cannot be rendered within an <iframe> on an origin different from the origin of the report server.

If you are using custom authentication, the solution is easy enough. Just add cookieSameSite="None" and enableCrossAppRedirects="true" to the authentication <forms> tag in the report server's web.config.

 <authentication mode="Forms">  
   <forms loginUrl="logon.aspx" name="sqlAuthCookie" cookieSameSite="None" timeout="60" path="/" enableCrossAppRedirects="true" requireSSL="true">  
   </forms>   
 </authentication>  

You may also need to enable CORS in your client app. In ASP.NET Core 8, this can be achieved through the following in your application startup code:
 app.UseCors(builder => builder
     .WithOrigins("https://localhost", "https://[domain]")
     .AllowAnyMethod()
     .AllowAnyHeader());

Power BI and SSRS - A complementary symbiosis

 

"Generally, Power BI paginated reports (SSRS reports) are optimized for printing, or PDF generation. Power BI reports are optimized for exploration and interactivity."

 

So, what really IS up with MSBI these days? Is SSRS getting shuttered? PBI has paginated, RS-like reports, but not a lot of the other features SSRS provides. Microsoft marketing will continue to hate people like me who would go to great lengths to help keep an aging reporting technology alive. But the thing is, SSRS simply does the job for the vast majority of reporting use cases. In the last 18-20 years, there have been no major advancements in scheduling, snapshotting, or otherwise caching, managing, and distributing electronic information. And SSRS has all of that built in.


Similar BI products- but they'll both be around for a while, with PBI eventually subsuming all the most useful SSRS tech


Nearly all of the advancements in reporting technology have come on the presentation and client side. We can now create beautiful ad-hoc analyses and brilliantly composed interactive charts and other data presentations. But this all comes with a not-insignificant price ($10 per user/month). And beyond the price, much like Azure SQL (vs. a genuine "Microsoft SQL Server" VM) and the extremely limited Azure SQL Workbench (vs. SSMS), there is a lot that Power BI cannot do well, or at all.

You may have noticed the built-in SSRS reports in SSMS 19's Query Store feature. These are very useful reports that can give DBAs an idea of how queries are being processed and which processes are consuming the most CPU. And it is a good example of a company "eating its own dog food".

I've seen SSRS installations that contained thousands of reports representing trillions of dollars of value, categorized and summarized with real-time security ownership, counterparty, price, and other core trade data. Several of these business-critical reports had scheduled delivery and were cached and snapshotted programmatically.

Little old SSRS is a quiet but reliable business data spartan. To my surprise, it is actually quite popular in the investment banking industry, where stock valuations and company summaries on reports are a big part of the lifeblood that drives investment banking decision making.


A professionally developed PBI report looks and behaves more like an interactive BI dashboard- this is a good example of a good PBI report


And with a little bit of customization magic via things like ExtRSAuth, ExtRSNET48, ExtRS, and other RS extension tools, SSRS and Power BI can be tailor-made to serve as a uniquely effective symbiosis of print-formatted, scheduled, and data-driven management reports (SSRS) and ad-hoc or OLAP-based interactive data analysis charts with data visualizations (Power BI).

To answer the question of "when will SSRS be end of life?", I would say that SSRS isn't going away anytime soon. Microsoft has decided to combine SSRS and PBI (RS .rdl reports are the "Paginated Reports" in PBI) in a way that serves both platforms. The Power BI 3.0 REST API indicates as much: the combined SSRS/PBI API offers a plethora of functionality that .NET developers can use to get the best of both worlds (SSRS and Power BI) and customize RS and/or PBI dashboards to support unique business processes.


A print-formatted SSRS report- great for standardized, templated data reporting


The choice of which tool or tools you will enrich your printed reports and data visualizations with is yours. Keep in mind that many organizations make use of both, with SSRS getting equal or greater attention than PBI even to this day: not only because of the huge global SSRS install base and all of the currently running SSRS reports, many of which support critical business and governmental processes across the globe, but also because Power BI requires a monthly subscription fee :( . Freeware seems to be slowly dying. Let's hope things change with the next version of SQL Server and maybe we'll get free* PBI.

SQL Server 2025? Happy reporting and data analyzing. Always remember that PBI and SSRS serve different organizational needs: ad-hoc data analysis and pixel-perfect, professional, print-ready reports, respectively.


*(At least a free "tier"? I mean c'mon MSFT... developers want to CREATE, and MSBI data visualization creativity is dying behind that paywall.) SSRS and PBI should be free and work hand in glove. Anything less is a mistake and a gigantic missed opportunity, imho.


Reference: https://learn.microsoft.com/en-us/power-bi/guidance/migrate-ssrs-reports-to-power-bi

Why NoSQL is Never Going to Replace SQL (apple:orange)

Do you need a jet engine to get your results from point A to B or will a modest GM sedan suffice?

I'll take the above bad analogy further and posit that while sedans and cars on the ground require stringent rules and have to navigate much more rigid structures, a jet engine simply powers the jet ahead through constraint-less skies; its purpose is to power something big, not to be concerned with the other machinery of the craft (i.e., RDBMS features eschewed by NoSQL solutions).

Do you need massive global data sync scale so that millions can connect, make changes, and have the results all (appear) real-time? If not, NoSQL is not always the right choice, and neglecting to have any kind of schema for stored application data structures can present its own host of challenges in the future if (when) those structures change. But alas, you can use NoSQL for some things (Redis, image/BLOB storage) and an RDBMS for others (more structured records and things you want to restore to a point in time in the event of a server failure).

Relational databases tend to be scaled up; NoSQL solutions are scaled out

The SQL vs. NoSQL (structured and transactional vs. semi-structured and "eventually consistent") debate is not a matter of choosing one or the other and that's that. These are complementary technologies and should both be used, wherever you find app requirements that suggest one or the other makes the most sense.

When an application is meant to scale immensely and there are not a lot of data integrity, consistency, transaction, or complex data structuring and transformation needs, NoSQL is your best bet and will far outscale even the most robust RDBMS server farm, and at a much lower cost (the cost being the sacrifice of RDBMS features which may not be needed).

I have personally worked on several projects that utilize relational and unstructured approaches to reading and persisting application data. If you have ever used an application's config file to change a setting in JSON or XML or a simple line entry- you are seeing a small and very basic NoSQL example of storing app data.

Hadoop and other distributed NoSQL db servers are built for scalability


Using NoSQL in software development can make data structures and objects, passed to and from APIs and within the application itself, much more flexible to work with. One line of code to serialize an object to a JSON chunk, save it to BLOB storage, and forget about it.

When dealing with relational data, you really have to understand the data to write good data access code and the underlying SQL that supports well-defined structuring of complex objects.

Well-defined structuring of the persistence of complex application objects avoids data duplication and corruption, prevents breaking reference constraints and losing any sense of hierarchical data relationships, and more generally lets you know very quickly when you have a problem within your data storage structures and the objects that initialize themselves from that data.

NoSQL ditches virtually all relational database data normalization rules in favor of a loosely-schema'd, unstructured (document, BLOB, key/value store, etc.) data store that relies solely on keys, values, and filtering of unstructured metadata to get the same SELECT ... WHERE functionality found in an RDBMS. Many of its implementations are Java or Java-derived (Hadoop, HBase, Cassandra), and their query APIs offer rough equivalents of common SQL statements.


"Apache Hadoop is an open source platform built on two technologies: Linux operating system and Java programming language."


For many application data requirements, however, relational data can be overkill and totally unnecessary (e.g., using Redis to store key/values vs. designing some elaborate key/value store in a SQL Server table).


Implementing something like Splunk or Kibana to continuously index app logs and configuration files can help you dip your toes into the Lake of Dark Data


The best distributed NoSQL solutions like Hadoop really shine in their inherent ability to dynamically scale across as many server machines as the operators can make ready to serve as "Hadoop processor nodes on standby".

SQL Server scaling is based more on server augmentation or "scaling up" (adding RAM, faster SSDs, RAID arrays, etc.) rather than distributing workloads across dynamic nodes. SQL Server AlwaysOn Availability Groups and the Mirroring and Replication features are for recoverability and data sharing, not dynamic scaling to handle bigger and bigger workloads.

SQL has been around forever. The fundamental concept behind NoSQL (semi-structured or loosely structured data) has been around since long before SQL relational database technology. Both (along with the NoSQL-related graph database paradigm) will continue to serve as viable data storage alternatives for many more years into the 21st century.


Relational systems like SQL Server, MySQL, and Oracle usually handle the structured side; NoSQL vendors are after the other 90%


In fact, SQL Server 2019's PolyBase extension supports Hadoop cluster, MongoDB, and Teradata T-SQL query integration. A new feature called SQL Server Big Data Clusters helps make distributed NoSQL nodes manageable within the SSMS environment.

Mongo, Hadoop, and other NoSQL database servers have SQL Server integration to support relational data sources.

SQL Server 2019 Polybase integrates Hadoop, MongoDB and many other sources with relational data and T-SQL queries
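
To make that concrete, here is a minimal PolyBase sketch of querying Hadoop-resident data alongside relational tables with plain T-SQL (the data source location, file layout, and all object names below are hypothetical):

 -- assumes the PolyBase feature is installed and Hadoop connectivity is configured
 CREATE EXTERNAL DATA SOURCE HadoopCluster
 WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.0.0.1:8020');

 CREATE EXTERNAL FILE FORMAT CsvFormat
 WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

 -- external table over an HDFS folder; no data is copied into SQL Server
 CREATE EXTERNAL TABLE dbo.WebClickStream
 (
   UserId    int,
   ClickPath nvarchar(200)
 )
 WITH (LOCATION = '/clickstream/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = CsvFormat);

 -- Hadoop-resident data joined to a relational table with ordinary T-SQL
 SELECT TOP (10) u.UserName, c.ClickPath
 FROM dbo.WebClickStream AS c
 JOIN dbo.Users AS u ON u.UserId = c.UserId;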


CAP Theorem: a distributed data system, like most NoSQL solutions, can only guarantee 2 of these 3 properties: "Consistency", "Availability", and "Partition Tolerance"

ACID vs. BASE: the relational axiom of "Atomic, Consistent, Isolated, Durable" contrasted against NoSQL's vaguer promise of "Basically Available, Soft State, Eventually Consistent" (dirty reads common)

This ol' tried-and-true database server software ain't going away in the foreseeable future


Hundreds of millions of corporate, mid and small business applications are running along just fine in 2019 using various RDBMS platforms (SQL Server, Oracle, DB2, PostgreSQL, MySQL, etc.) for at least one of their data stores.

Many more millions of applications have been using one riff or another of NoSQL (semi-structured data) before, during and after the mythical "Relational Movement" as described by software veteran Robin Bloor:

"The Relational Model of Data Never Dominated Anyway. Estimates vary, but it is generally agreed that somewhere between 70% and 95% of the world’s data is stored only in poorly structured or unstructured formats such as: word processing documents, spreadsheets, HTML files and e-mail. The truth is that Relational database never did really dominate. It was rejected out of hand, year after year, as an effective store for many types of data." -Robin Bloor on insideanalysis.com


Google search trends over the last 5yrs certainly suggest relational SQL is not going anywhere anytime soon...


Considerations when evaluating whether to use NoSQL:
  • NoSQL is a precise tool for precise data needs; if relational SQL is too much for your group, NoSQL will likely be too steep a learning curve
  • Data Integrity: when billions of NoSQL records are affected by a small change in schema that cannot propagate correctly, or runs into constraint issues, or hierarchy and relations are impossible to infer... maybe relational SQL would be a better approach
  • NoSQL touts loose schema structure as a benefit, but this simply means schema and data structure enforcement has been shifted from the database layer to the application layer. Data cannot "self-manage".
  • Some apps are prime candidates for NoSQL's document-centric and resource-centric distributed storage architecture


Also, there is this to consider (re: the longevity and the simple-yet-powerful abstractions of SQL):


If NoSQL solutions are eventually able to achieve the same transactional consistency and complex schema structures that some applications require, and then ultimately subsume RDBMS completely, it will still require a lot of SQL gurus to convert and integrate all the legacy relational database apps for a long, long time to come...

Bring on MongoDB, CouchDB, Dynamo, MapReduce, HBase, BigTable, Cassandra.


As data professionals we will have an increasingly complex array of tools to understand; what we do with them will drive the future



Long live SQL Server 2030. 😉




References:

https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/

https://www.wired.com/2012/12/couchdb/

https://pdfs.semanticscholar.org/a6f0/1c9103d3bafb8ce92641c9f2a4deaccd12f9.pdf

https://www.memsql.com/blog/why-nosql-databases-wrong-tool-for-modern-application/

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1589423200346982646

https://news.ycombinator.com/item?id=479165

https://insideanalysis.com/is-the-relational-database-doomed/

ETL and EDI Using SSIS

ETL is the process by which you can take (Extract) data from various (usually related) data sources, Transform that data to meet your destination system's needs, and finally Load that transformed data into the destination system data store.

Your table structure will be something along the lines of this basic template:

In a real-world database environment, the Staging, OLAP, OLTP, and other data repositories may be on different database servers; the same database server is used here for demonstration

We will use SQL Server Integration Services (SSIS) and develop the SSIS package within the Visual Studio 2017 IDE.


The first step of the SSIS package loads (INSERTs) the data into a STAGING area database. This allows us to:
  • Store off the intermediate data from all sources into analysis-friendly OLAP datastores
  • Perform data integrity checks
  • Keep extraction and transformation as two strictly separated steps

We load the data from the various source files (.csv, .xls, .xlsx) into SQL Server database Staging table(s) using SSIS Source and Destination Data Flow Tasks connected with the movable data flow arrows. Once you have connected a source and destination you can go into the Destination Data Flow Task and edit the mappings of which source columns should be written to which destination columns.

Next we perform some transformations. This can be anything from a simple ranking or status/flagging/business prioritization algorithm to data cleansing to data partitioning based on certain criteria; the key is that this Transform step is where we apply T-SQL UPDATEs to transform the data once it has all been aggregated in Staging.
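
As a rough sketch, a transform pass over a hypothetical Staging.[Order] table (the table, columns, and business rule here are made up purely for illustration) might look like:

 -- cleansing and flagging applied while the data sits in Staging
 UPDATE Staging.[Order]
 SET CustomerName = LTRIM(RTRIM(CustomerName)),          -- trim stray whitespace from the source files
     Region       = UPPER(ISNULL(Region, 'UNKNOWN')),    -- standardize casing and missing values
     PriorityFlag = CASE WHEN OrderTotal >= 10000
                         THEN 1 ELSE 0 END;              -- simple business prioritization rule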

Then we refresh the OLAP destination tables using the same kind of Source and Destination Data Flow Tasks and mappings as used for Staging. The OLAP data is used for data analysis.

Finally, we load the cleansed Staging data into our destination system's OLTP database and email or text message the system owner upon successful completion of the SSIS ETL job (or deliver an error if anything fails). The OLTP data stores live transactions.

Bear in mind that most ETL data-flow step mappings are not a 1:1 match; this is just an end-to-end demo of SSIS ETL in its most basic form


Happy ETL'ing, and be sure to watch out for mysterious symbols/characters in data copied from other programs or system environments that were using a language setting (codepage) incompatible with your ETL software. Bad data happens more than you think, and as we say, GIGO.

Your end result looks like this (all green checkmarks indicate everything succeeded). I recommend using PaperCut for SMTP testing; it is a super cool and useful product.

I would attach or GitHub the source code (and will do so upon request), but SSIS project code has a lot of dependencies and can get quite messy for someone else to reuse on even a 'slightly different' machine.

Having used SSIS and its now-deprecated predecessor DTS (SQL Server Data Transformation Services) for many years, I can attest that the best way to learn this product is by diving right in on your own and creating sources, destinations, source/destination connection managers, control flow events, .NET integrations, exception event handlers, etc.

You will likely run into some ambiguous and poorly documented errors when developing in SSIS; but persist in your efforts and you can create a very powerful EDI system with the many capabilities of SSIS and the robust scheduled ETL jobs that it can create.

SQL CLR for .NET in SQL Server

You may find yourself with the need to integrate a .NET method within SQL Server to be called as a function. This usually happens when some relatively complex looping and modifying logic is a requirement of a SQL operation.

SQL is a great data language, but it is not the right language for some tasks. Creating a SQL CLR object from a .NET assembly may be the best approach in some unique situations (and there is a bonus in that, in many cases, you can reuse existing .NET code).

Before creating the CLR object we need a .NET .dll, so first we create a basic .NET assembly, compile it in Release, and copy the path to the compiled .dll:

This is the simple .NET CLR method that we want to run within the SQL Server query execution engine


SQL CLR provides a way for you to integrate complex .NET methods within SQL Server


Import into SQL Server instance via SSMS*: 


Select New Assembly... 




...and then enter the path to your Release .dll


Create a T-SQL function or stored procedure to serve as the caller for the .NET method, and run it:

From here we can see all of the T-SQL code involved; the 3 SQL Server configuration conditions (shown in the 3 EXEC statements) are required
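
For reference, the T-SQL side of that setup generally looks something like the sketch below; the assembly, class, and method names are placeholders for whatever your Release .dll actually contains, and the exact configuration EXECs needed can vary by SQL Server version (e.g. 'clr strict security' on 2017+).

 -- enable CLR integration (one of the configuration steps referenced above)
 EXEC sp_configure 'clr enabled', 1;
 RECONFIGURE;
 GO
 -- register the Release .dll built earlier (path and name are placeholders)
 CREATE ASSEMBLY SimplClr
 FROM 'C:\Builds\SimplClr\bin\Release\SimplClr.dll'
 WITH PERMISSION_SET = SAFE;
 GO
 -- T-SQL wrapper mapped as Assembly.[Namespace.Class].Method
 CREATE FUNCTION dbo.fn_SimplClr (@input nvarchar(4000))
 RETURNS nvarchar(4000)
 AS EXTERNAL NAME SimplClr.[SimplClr.Functions].Process;
 GO
 SELECT dbo.fn_SimplClr(N'some value');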

And that is all there is to it. Only use CLR functions when absolutely necessary, as RDBMSs like SQL Server are designed to process relational data in sets, not to apply complex business logic to individual rows.

But if there is no other way, SQL CLRs could provide a solution to your code/logic integration problems.


*Warning and Reference: https://blog.netspi.com/attacking-sql-server-clr-assemblies/

Neo4j Graphs: Creating and Querying Edges and Nodes

First it is helpful to understand the basic premise behind Graph Theory, which is the foundation of these "Pairwise relations between objects" that we can store and explore in graph databases. The mathematical representation is the formula:

A graph G = (V, E) is the set of its entity nodes, aka "vertices" (V), connected by its relationship edges (E)


Swiss mathematician Leonhard Euler was looking for a solution to a puzzle about the relationships between land masses and bridges in the Prussian city of Königsberg. This was the genesis of Graph Theory:

"The geometry of position, now known as Graph theory"

Graph database technology is useful for analyzing complex relationships and relationship behaviors in myriad scenarios, including but not limited to:
  • Computer Networking
  • Spread of Gossip
  • Fraud Detection
  • Forensic Investigations
  • Flight Mapping/GPS
  • Population Growth
  • Spread of Infectious Disease
  • Hierarchy Visualization

It is also a vital component for the analysis of (increasingly) unstructured data. By unstructured we typically mean data that is not easily modeled in a traditional relational hierarchy but may still have common properties, and so still has relationship value. It is estimated that around 80-90% of an organization's data is unstructured.

Example of a couple simple graph relationships

Relationships in graph data are formed by edge tables, which signify a relationship between two different entities, or "nodes", which are stored in node tables. Instead of RDBMS FK/PK and other check constraints, the relationships are defined in the edge tables; the properties that one would want to query via CQL to find things are stored as JSON metadata inside the node tables, instead of in the column-for-each-property layout that defines relational data architecture.

Node Tables: Represent a data entity

Edge Tables: Store relationships between nodes
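
As an aside, SQL Server 2017 and later borrow this exact node/edge table vocabulary for their own graph support (see the mssqltips reference at the end of this post); a minimal T-SQL sketch with hypothetical Person/Knows tables:

 -- node and edge tables in SQL Server's graph syntax (names are illustrative)
 CREATE TABLE dbo.Person (PersonId int PRIMARY KEY, PersonName nvarchar(100)) AS NODE;
 CREATE TABLE dbo.Knows AS EDGE;

 INSERT INTO dbo.Person VALUES (1, N'Ada'), (2, N'Grace');
 INSERT INTO dbo.Knows ($from_id, $to_id)
 SELECT p1.$node_id, p2.$node_id
 FROM dbo.Person p1, dbo.Person p2
 WHERE p1.PersonId = 1 AND p2.PersonId = 2;

 -- relationships are traversed with MATCH rather than FK/PK joins
 SELECT p1.PersonName AS Person, p2.PersonName AS KnownPerson
 FROM dbo.Person p1, dbo.Knows k, dbo.Person p2
 WHERE MATCH(p1-(k)->p2);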

For the RDBMS purists and skeptics out there, I recommend this quote about Node, Graph and other alternative data processing paradigms vs. the traditional RDBMS OLTP and OLAP models:
"The NOSQL acronym is: Not Only S Q L. NOSQL solutions are not a replacement or successor for RDMBS systems, nor were they ever intended to be. They are useful tools to be used for specific purposes. They are to be used as part of an organisation’s data management solution and not as a total replacement for the existing solution". -Simon Munro
It is important to note that graph data can be queried in much the same way as SQL, through a graph-specific query language called CQL (Cypher Query Language), as you will see in the following demonstration.

For a quick demonstration on how easy it is to get up and running with this technology we will be using Neo4j which you can download here: https://neo4j.com/download/


Walkthrough of Neo4j:
Fire up the Neo4j Desktop client, click "New" and then click the "Add Graph" button to create your first graph database. Once the db has been created, click the play icon to start it up (it must be running for Neo4j browser to connect).

This is the Neo4j main home screen from which you can create and connect to graph databases and projects

Then click the "Neo4j Browser" button to launch the graph data browser for creation and visual exploration of graph data relationships.

Next, in the Neo4j browser, click "Jump into Code" and simply follow the prompts to begin creation and querying of graph data.

Once in the Neo4j tutorial, you can simply follow the prompted instruction widgets.

I would continue with instructions but the rest is refreshingly self-explanatory. You first create a movie database that contains actors, directors and movies stored in Node (entity) tables and the types of relationships between these Nodes stored in Edge (relationship type) tables.

One thing to note is that to execute a CQL command in the browser, you must press CTRL+Enter. Alternatively, you can click the play button to execute CQL.

As you continue with the tutorial you will wind up with something like this when you get to the step that has you execute CQL to find all Node entities that are within 4 degrees (or node "hops") of the actor Kevin Bacon:

4 degrees of separation from Kevin Bacon...

Once you get comfortable with CQL syntax it is relatively easy to start modeling and creating your own graph database structures which can help you and/or your company to analyze some of the unstructured and semi-structured data that is hard to extract value from with traditional RDBMS.

Bigtime kudos to the Neo4j team for making this so straightforward and simple to learn, and for getting users up and running with a new technology so fast. I've never seen a technology tutorial like it.

As you can see, there is tremendous potential value in exploring data relationships that don't necessarily fit neatly into traditional RDBMS/hierarchical databases; graph databases are no less useful a tool to have in an organization's data analysis arsenal.


References:

https://www.mssqltips.com/sqlservertip/5007/sql-server-2017-graph-database-query-examples/

https://www.youtube.com/watch?v=gXgEDyodOJU

https://www.red-gate.com/simple-talk/sql/t-sql-programming/experiments-with-neo4j-using-a-graph-database-as-a-sql-server-metadata-hub/

https://www.youtube.com/watch?v=mVWn8k49mAQ

Accessing SQL Server Data in R

Importing SQL Server data into R for analysis is pretty straightforward and simple. You will obviously need R installed on your machine. The following R code will connect to your SQL Server database (using R Studio):

 library(RODBC)
 # connect to a local SQL Server instance using Windows (trusted) authentication
 dbconnection <- odbcDriverConnect('driver={SQL Server};server=.;database=CLARO;trusted_connection=true')
 initdata <- sqlQuery(dbconnection, 'SELECT * FROM [CLARO].[dbo].[Fielding];')
 # close the connection when finished
 odbcClose(dbconnection)


SELECT data from a SQL Server database output in R Studio


Accessing SQL Server Data in Python

So you want to access Microsoft SQL Server from your Python script(s)?

After weeding out some long-abandoned and/or nonworking solutions, I discovered a very simple Python ODBC module that works with virtually all SQL Server versions since MSSQL 2005, called "pyodbc".

First, you will need to install the MSSQL ODBC driver component (13.1 or 17 should work) on your machine in addition to installing the pyodbc module.

Next, get the pyodbc module for Python by running this from Windows command prompt:

pip install pyodbc

Then open up a python shell using 'py' or 'python' and enter the following after editing configuration values to match your development environment:

 import pyodbc  
 # connection values (driver version, server, database, credentials) should match your environment  
 cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=WideWorldImporters;UID=DemoUser;PWD=123Password')  
 cursor = cnxn.cursor()  
 # Sample of a simple SELECT  
 cursor.execute("SELECT TOP (100) Comments, count(*) FROM WideWorldImporters.Sales.Orders GROUP BY Comments")  
 row = cursor.fetchone()  
 while row:  
   # str() guards against NULL (None) values in the Comments column  
   print(str(row[0]) + ': ' + str(row[1]))  
   row = cursor.fetchone()  

Running this code will result in the below if you have configured everything correctly (note this example makes use of the Microsoft SQL Server demo WideWorldImporters database):


Reference: https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sql-using-pyodbc?view=sql-server-2017

OLAP: Facts and Dimensions

OLAP can adequately be described as the storage of redundant copies of transactional records from OLTP and other database sources. This redundancy facilitates quick lookups for complex data analysis, because data can be found via more (and quicker) SQL execution paths than in normalized OLTP. And OLTP, after all, should stick to its namesake and what it does best: processing, auditing, and backing up online transactions, no? Leave data analysis to separate OLAP warehouses.

OLAP data is typically stored in large blocks of redundant data (the data is organized in ways to optimize and accelerate your data mining computations). Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.


Facts are "measurements, metrics or facts of a particular process" -Wikipedia. Facts are the measurement of the record value: "$34.42 for today's closing stock price", "5,976 oranges sold at the market", "package delivered at 4.57pm", etc.

Dimensions are "lists of related names–known as Members–all of which belong to a similar category in the user’s perception of a data". For example, months and quarters; cities, regions, countries, product line, domain business process metrics, etc.

Dimensions give you a set of things that you can measure and visualize in order to get a better pulse on, and a better overall understanding of, the current, past, and even potential future (via regression models) shape of your data. This can often alert you to things you might not be able to see (certainly not as quickly or easily) in an OLTP-based data analysis model, which is often tied to production tables out of necessity or because "we don't have time or money to implement a Data Warehouse".


Yes, you definitely do need OLAP/DW capabilities if you think long-term about it. Having intimate knowledge of your operations and how you can change variables to better run your business? Why would any business person (excepting those committing fraud) not want that?

I'd say that implementing a true and effective OLAP environment is worth the project investment and would pay for itself over and over again in the way of better and more specific, actionable metrics that help administrators of operations make the best, data-backed decisions; some very critical decisions involve millions of dollars and sometimes lives. I'd like a better look at the data before making a multi-million-dollar or life-and-death decision.


SAS, Hadoop, SSAS, with healthy doses of R and/or Python customization? Whatever data solution you choose, my advice is to go with something that has a great track record of providing the kind of solutions that your business and/or your industry require. Use the tool that your ETL developers can utilize effectively and that helps you best meet the needs of the constituents with whom you will exchange data (the company, the customer, industry and/or regional regulation compliance).

ETL and the Case Against a Single System for All Business Processes

Nearly every business wants to centralize their business data so that they can get an at-a-glance overview of what is going on at any given point in the day. Suppose, for instance, a company has its business data on 5 separate database servers, in various databases therein, across several different tables. And much of the data will need to be slightly modified (transformed) in order to make data types consistent and optimize it for quick and meaningful data analysis.

Assume the database servers are:
  • CMS, Orders (Dynamics 365 data)
  • Accounting (Sage 50 data)
  • Warehouse, Logistics (SAP data)
  • Field Service, Product Support (SalesForce data)
  • Call Center (TantaComm data)
Furthermore, the business wants to apply some custom logic to various fields so that certain calculations don't have to be made by the OLAP end users and applications.

ETL is just moving data from a Source to a Destination, often making some changes (transformations) along the way

Well, it all seems so complicated, right? How can all that disparate data be combined and standardized? It's not as complicated as letting an ERP solution define how your various departments do their jobs. Some ERP products truly do have the functionality to satisfy 80-85% of a company's business requirements. And when that is the case, by all means, companies should go that route.

But for instances where ERP of every business function is not working, you can use products like SSIS, Informatica, and Oracle Data Integrator to easily map ETL data visually and apply code operations to the data using a variety of programming languages.

But of course, there is no "one-size-fits-all 100% of our business needs" ERP solution; different business productivity software works better or worse for different industries and different departmental needs, for all manner of different reasons. Why second-guess the choice of business tool that your business's management and best minds know, and have proven, is most effective for their (often very specific) jobs?

Allowing a business to integrate the "best apps for each particular type of job" is nothing more than a sort of microservices architecture. But if we are working with custom code in these various LOB applications (at least some of the LOB apps will need custom API integrations so that they can talk to each other), then with each new block of custom code comes more decentralized code bases and decentralized test scripts...

All considered, I would argue that trying to put square pegs into round holes a la "ERP-everything" is a much, much, MUCH bigger effort than a well-developed and tested scheduled ETL Data Warehouse refresh process to centralize your company's data.

It's always a balance with software. Use the most effective apps for your organization and departmental requirements, but try to avoid the dependency hell that can come along with a poorly managed distributed application environment.

Here are the ETL steps in order, from the resource link below:

Extract the data from the external data sources such as line-of-business systems, CRM systems, relational databases, web services, and anywhere you have business data you want to summarize.

Transform the data. This includes cleansing the data and converting it to an OLAP-friendly data model. The OLAP-friendly data model traditionally consists of dimension and fact tables in a star or snowflake schema.

Load the data so that it can be quickly accessed by querying tools such as reports, dashboards, etc.

"90% of the requirements now is better than 100% of the requirements never" -Unknown

Reference: https://blogs.msdn.microsoft.com/andreasderuiter/2012/12/05/designing-an-etl-process-with-ssis-two-approaches-to-extracting-and-transforming-data/

Log Shipping and Log (and/or Differential Backup) Restoration

Most developers know what a full database backup is. Did you know a differential backup is just a subset of the full backup? But what is the point of transaction log shipping in SQL Server?

A "transaction log" is just a subset of a subset of a full backup containing a set of all T-SQL transactions for a given (usually short span) amount of time; they are typically set at intervals of every few minutes but this varies according to application and data compliance needs.

A commenter in the referenced web link below aptly notes:

"the point of log shipping is to keep the secondary server closer to 100% synced than the full backups (which are much larger and produced much less frequently than log backups)."

Log shipping can be used to serve as a constant "warm backup" (similar to SQL Server database mirroring) of all production transactions for a standby/failover SQL Server* as well as a dedicated Backup SQL Server as depicted here:

 Setup if you have a fully-functioning Secondary Production SQL Server

Or, more commonly, the shipped backup logs are sent to a single Backup Server:

Typical production database server scenario, though the steps are a bit out of sequential order, if not (or due to) visual order

In short, a full backup contains the entire database you want to recover, up to a certain point in time.

A differential backup is a subset of a full backup (generally, full backups contain days' worth of data for bigger organizations); it is only prudent to include one or even several differential backups to ensure quick disaster recovery (and full recovery of vital customer data).

Log backups are the chunks of SQL transactions that eventually compose the differential backup(s) and, finally, the entire full .bak file.

These three disaster recovery tools that SQL Server affords DBAs and developers allow you to restore your production data back to the point in time when your server failed (a minimal T-SQL sketch of the restore sequence follows the list):

  • Apply the most recent full backup of the prod server that went down*
  • Apply any differential backups since the most recent full backup
  • Apply all transaction logs up to the point in time of the failure**
  • Apply all StandBy server transaction logs that may have occurred (if the standby is not read-only)
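
A minimal sketch of that sequence in T-SQL (file names and the STOPAT time are hypothetical; NORECOVERY keeps the database in a restoring state until the final step):

 -- 1) most recent full backup
 RESTORE DATABASE SalesDb FROM DISK = 'D:\Backups\SalesDb_Full.bak' WITH NORECOVERY;

 -- 2) most recent differential taken since that full backup
 RESTORE DATABASE SalesDb FROM DISK = 'D:\Backups\SalesDb_Diff.bak' WITH NORECOVERY;

 -- 3) every log backup since the differential, in order, stopping at the moment of failure
 RESTORE LOG SalesDb FROM DISK = 'D:\Backups\SalesDb_1200.trn' WITH NORECOVERY;
 RESTORE LOG SalesDb FROM DISK = 'D:\Backups\SalesDb_1215.trn'
     WITH STOPAT = '2024-01-01 12:10:00', RECOVERY;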

With all of this said, do be aware that different scenarios and requirements call for different kind of disaster recovery solutions. Consult with the experts and never neglect to TEST YOUR RESTORE PROCEDURE WITH ACTUAL .bak and .trn files- and then test the applications and services connected to those database servers against their restored/Secondary version to ensure you suffer as little downtime as possible in the event of a database emergency.

What a restoration timeline might look like

*Note(!) to the questions on Microsoft Licensing and your StandBy server, should it need to be used for any production activity during your disaster recovery:

"I am not the licensing police, and I am not Microsoft – check with your licensing representative to clarify your exact situation. Generally, you can have one warm standby server. However, the second someone starts using it for reporting, testing, or anything else, you need to license it like any other server."-Brent Ozar

**"If the database is in Standby mode, users can query it – except when a log backup is being restored. You need to decide if a restore job will disconnect users, or if the restore is delayed until after the users are disconnected." -Brent Ozar

References:

https://www.sqlshack.com/sql-server-disaster-recovery/ (SQL Server Replication is another compliment or alternative to the log shipping and backup/restore)

 https://www.brentozar.com/archive/2013/03/log-shipping-faq/

Temporal Tables in SQL Server

Definition from Microsoft: SQL temporal tables, also known as "system-versioned tables", are a database feature that brings built-in support for providing information about data stored in the table at any point in time, rather than only the data that is correct at the current moment in time. Temporal tables were introduced in the ANSI SQL:2011 standard.

What is it used for?
1). Auditing data history: identifying who did what and when, with row-level restoration of the temporal table data to any point in time (time travel).

2). Monitoring data deltas (i.e. maintaining a company sales discounting policy by ensuring no product has had a price change that exceeds some company threshold (200% increase or decrease, etc.)).

3). Slowly Changing Dimension management (eliminating the need for complicated ETL logic for tracking SCDs; SQL Temporal Tables do this audit tracking automatically).




In Practice
Once upon a time, I worked on a project that had a "unique" (kind euphemism) legacy database schema design. This company had a custom "system" for tracking changes in data. There was no referential integrity so everything relied on application code not having any data access code bugs, which as we all know is not a particularly reliable setup.

Temporal Tables would have done the trick without all the complexity of having to manage a proprietary data versioning system that sometimes worked, but oftentimes caused headaches and mass confusion over the root cause of production issues related to versioning of certain data entities.

Temporal Tables have the current or base table, as well as a history table that logs changes over time

Unless there is a specific reason that SQL Temporal Tables won't work for your scenario* you should take advantage of existing functionality like this so that you can focus on the unique functionality of your app and not waste precious time reinventing wheels.

Creating a temporal table is relatively straightforward. In SSMS (connected to an instance of SQL Server 2016 or later), select the option as seen below.

Menu option in SSMS


Here is the T-SQL for basic Temporal Table DDL:

 USE [KpItSimpl]
 GO
 --Create system-versioned temporal table. It must have primary key and two datetime2 columns that are part of SYSTEM_TIME period definition
 CREATE TABLE dbo.SimplTemporalTable
 (
   --Table PK and columns definition
   ProductID int NOT NULL PRIMARY KEY CLUSTERED,
   ProductDescription varchar(max) NULL,
   ProductPrice decimal,
   --Period columns and PERIOD FOR SYSTEM_TIME definition
   ValidFrom datetime2 (2) GENERATED ALWAYS AS ROW START,
   ValidTo datetime2 (2) GENERATED ALWAYS AS ROW END,
   PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
 )
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SimplTemporalTableHistory));
 GO
T-SQL DDL for creating a new Temporal Table


For a practical example, I inserted some data into this temporal table, and then committed some UPDATE statements on that data, changing the ProductPrice value of some of the product records. The query below demonstrates how we can look for records (we are likely most interested in the exact change time) that had a price change exceeding +/- 200%:

Example of a query utilizing the Temporal Table feature to see an excessive product price change
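
A sketch of what such a query might look like against the table created above (the 200% threshold logic and join shape are illustrative, not the exact query from the screenshot):

 -- compare each current price against its prior versions in the history table
 SELECT cur.ProductID,
        hist.ProductPrice AS OldPrice,
        cur.ProductPrice  AS NewPrice,
        hist.ValidTo      AS ChangedAt
 FROM dbo.SimplTemporalTable AS cur
 JOIN dbo.SimplTemporalTable FOR SYSTEM_TIME ALL AS hist
      ON hist.ProductID = cur.ProductID
 WHERE hist.ValidTo < '9999-12-31'                        -- historical rows only
   AND (cur.ProductPrice >= hist.ProductPrice * 3.0       -- price tripled (> +200%)
        OR cur.ProductPrice <= hist.ProductPrice / 3.0);  -- or fell by the same factor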

Note(!) one limitation that stands out right away: you cannot drop a temporal table like a standard SQL table. You must turn versioning off on the table, drop the table, and drop the history table. So instead of selecting "Delete" from the table menu options or executing "DROP TABLE", you will want to select the "SCRIPT AS >> DROP To" option as seen below:

You must script the deletion/drop of a SQL temporal table
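
The scripted drop boils down to three statements (using the table names from the DDL above):

 -- versioning must be switched off before the temporal table and its history can be dropped
 ALTER TABLE dbo.SimplTemporalTable SET (SYSTEM_VERSIONING = OFF);
 DROP TABLE dbo.SimplTemporalTable;
 DROP TABLE dbo.SimplTemporalTableHistory;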

As you can see, SQL Temporal Tables can make managing data changes a lot easier. Certainly much easier than a proprietary change versioning system and much faster than restoring the entire database to inspect the state of data at a particular point in time.


*Temporal Table Considerations and Limitations: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-considerations-and-limitations?view=sql-server-2017

Reference: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-2017

T-SQL STUFF, STRING_SPLIT

In cases where you must extract a series of values from a delimited string, you can utilize SQL Server's built-in string_split() function. You need only specify the delimited string and the delimiting character (commas in this case ",") as the arguments.

When a requirement calls for you to roll up data into a single row, you can use STUFF() (with or without FOR XML, depending on your data). Just specify an inner query as the argument for STUFF().

 --Get ids from a csv of ids
 SELECT * FROM string_split('100,101,201,301,411,414', ',')

 --Push a series of values into one row
 SELECT playerID,
   STUFF((SELECT '; ' + teamID
          FROM Managers
          WHERE teamID LIKE '%P%'
          GROUP BY teamID
          FOR XML PATH('')), 1, 2, '') AS allTeams  -- 1, 2 strips the leading '; ' delimiter
 FROM Managers

Result of these functions

T-SQL and .NET for XSLT

Steps to perform transformation from SQL to XML to XSLT Output data (csv, htm, tab-delim, fixed-width, etc.):

1). Get your SQL data (assuming you are using SQL Server) into XML format via:

 SELECT [OrderId], [PersonId], [OrderDateTime] FROM [Order] FOR XML AUTO, ELEMENTS, ROOT ('Orders');  


XML result of "FOR XML AUTO, ELEMENTS, ROOT"

2). Load XSL transform file (OrderXform.xsl) into a new XslCompiledTransform object (transform):

       XslCompiledTransform transform = new XslCompiledTransform();  
       transform.Load("C:\\Xform\\OrderXform.xsl");  


XSL file used to transform the XML to the desired file output (EDI 834 standard, etc)


3). Using a StringWriter (writer), StringReader (inputReader) and XmlReader (inputXmlReader), read the inputReader XML string into inputXmlReader and apply the XSL transformation to the transform object via: transform.Transform(inputXmlReader, args, writer); //(or another override of Transform())

  using (StringWriter writer = new StringWriter())   
     {   
      // Make an XML reader (inputReader) out of the string (input)  
      XmlReader inputXmlReader;   
      using (var inputReader = new StringReader(input))   
      {   
       inputXmlReader = XmlReader.Create(inputReader);   
       // Apply the transformation to XmlReader (inputXmlReader) and write it to StringWriter (writer)   
       transform.Transform(inputXmlReader, null, writer);   
      }      
      // Retrieve the output as string from writer object  
      return writer.GetStringBuilder().ToString();   
     }   

4). Return the result and write the bits (to memory as Label Text for a demo like this, to file, to domain network location, to non-domain customer ftp, etc.).

The result normally gets written to remote SFTP location or posted to API endpoint for processing

Nothing incredibly complex about this, but it seems to frustrate a lot of people in .NET land, so I have documented a quick and easy example.

GitHub Source: https://github.com/Radagast27/Simpl_XSLT_dot_net

Tool for XSLT validation: https://www.freeformatter.com/xsl-transformer.html

References:

https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-the-for-xml-clause-to-return-query-results-as-xml/

https://www.w3schools.com/xml/xsl_examples.asp

https://www.c-sharpcorner.com/article/transform-xml-output-of-sql-query-using-for-xml-auto-statement-to-html-using-xsl/