
extRS for useful common logic, reference data and extending SSRS

extRS is a .NET assembly, available as a NuGet package here, that provides utilities for getting data out of SSRS, as well as FedEx, UPS, USPS and financial market data, among several other useful functions.
  
            

extRS was initially designed as a set of helper functions for accessing things in RS, but it has evolved to contain a lot more...



Here are a couple of examples of the useful things you can do with extRS:

Create, Get and Delete an SSRS Report Subscription
     [TestMethod]  
     public async Task CreateGetDeleteSubscriptionSucceeds()  
     {  
       string json = @"{  
       ""@odata.context"": ""https://localhost/reports/api/v2.0/$metadata#Subscriptions/$entity"",  
         ""Owner"": """ + Resources.User + @""",  
         ""IsDataDriven"": false,  
         ""Description"": ""string..."",  
         ""Report"": ""/Reports/USPolls"",  
         ""IsActive"": true,  
         ""EventType"": ""TimedSubscription"",  
         ""ScheduleDescription"": ""string..."",  
         ""LastRunTime"": ""2023-04-13T15:51:04Z"",  
         ""LastStatus"": ""string..."",  
         ""DeliveryExtension"": ""Report Server Email"",  
         ""LocalizedDeliveryExtensionName"": ""Email"",  
         ""ModifiedBy"": """ + Resources.User + @""",  
         ""ModifiedDate"": ""2023-04-13T15:51:04Z"",  
         ""Schedule"": {  
           ""ScheduleID"": null,  
           ""Definition"": {  
             ""StartDateTime"": ""2021-01-01T02:00:00-07:00"",  
             ""EndDate"": ""0001-01-01T00:00:00Z"",  
             ""EndDateSpecified"": false,  
             ""Recurrence"": {  
               ""MinuteRecurrence"": null,  
               ""DailyRecurrence"": null,  
               ""WeeklyRecurrence"": null,  
               ""MonthlyRecurrence"": null,  
               ""MonthlyDOWRecurrence"": null  
             }  
           }  
         },  
         ""DataQuery"": null,  
         ""ExtensionSettings"": {  
           ""Extension"": ""DeliveryExtension"",  
           ""ParameterValues"": [  
             {  
               ""Name"": ""TO"",  
               ""Value"": ""colin@sonrai.io"",  
               ""IsValueFieldReference"": false  
             },  
             {  
               ""Name"": ""IncludeReport"",  
               ""Value"": ""true"",  
               ""IsValueFieldReference"": false  
             },  
             {  
               ""Name"": ""Subject"",  
               ""Value"": ""true"",  
               ""IsValueFieldReference"": false  
             },  
             {  
               ""Name"": ""RenderFormat"",  
               ""Value"": ""PDF"",  
               ""IsValueFieldReference"": false  
             }  
           ]  
         },  
         ""ParameterValues"": []  
       }";  
       Subscription subscription = await ssrs.SaveSubscription(JsonConvert.DeserializeObject<Subscription>(json)!);  
       Assert.IsTrue(subscription.DeliveryExtension != null);  
       var getResponse = await ssrs.GetSubscription(subscription.Id.ToString()!);  
       Assert.IsTrue(getResponse.Id != null);  
       var delResp = await ssrs.DeleteSubscription(subscription.Id.ToString()!);  
       Assert.IsTrue(delResp);  
     }  


Getting FedEx/USPS/UPS Shipping Rates






"A meaningful and useful abstraction that is able to withstand the test of time is very difficult to design. The main difficulty is getting the right set of members—no more and no fewer. If an abstraction has too many members, it becomes difficult or even impossible to implement. If it has too few members for the promised functionality, it becomes useless in many interesting scenarios." -https://www.informit.com/articles/article.aspx?p=3089304


An SSRS IFrame/CORS infinite redirect loop error and a quick and easy solution

The redirect loop looks like this and, in Edge, will display the error message: "[domain] redirected you too many times"

If you are trying to render the SSRS ReportViewer control within an <iframe>, you may run into a CORS issue that manifests as a series of 302 (Found) responses and an infinite redirect loop between the ReportViewer control (ReportViewer.aspx) and Logon.aspx.

As of SSRS 2022, without an explicit instruction to allow CORS, ReportViewer cannot be rendered within an <iframe> on an origin different from the origin of the report server.

If you are using custom authentication, the solution is easy enough. Just add cookieSameSite="None" and enableCrossAppRedirects="true" to the authentication <forms> tag in the report server's web.config.

 <authentication mode="Forms">  
   <forms loginUrl="logon.aspx" name="sqlAuthCookie" cookieSameSite="None" timeout="60" path="/" enableCrossAppRedirects="true" requireSSL="true">  
   </forms>   
 </authentication>  

You may also need to enable CORS in your client app. In ASP.NET Core 8, this can be achieved through the following in your application startup code:
 app.UseCors(builder => builder  
 .WithOrigins("https://localhost", "https://[domain]")  
 .AllowAnyMethod()  
 .AllowAnyHeader());  

Power BI and SSRS - A complementary symbiosis

 

"Generally, Power BI paginated reports (SSRS reports) are optimized for printing, or PDF generation. Power BI reports are optimized for exploration and interactivity."

 

So, what really IS up with MSBI these days? Is SSRS getting shuttered? PBI has paginated, RS-like reports, but not a lot of the other features SSRS provides. Microsoft marketing will continue to hate people like me who would go to great lengths to help keep an aging reporting technology alive. But the thing is- SSRS simply does the job for the vast majority of reporting use cases. In the last 18-20 years, there have been no major advancements in scheduling, snapshotting, caching, managing and distributing electronic information- and SSRS has all of that built in.


Similar BI products- but they'll both be around for a while, with PBI eventually subsuming all the most useful SSRS tech


Nearly all of the advancements in reporting technology have come on the presentation and client-side. We can now create beautiful ad-hoc analysis and brilliantly composed interactive charts and other data presentations. But this all comes with a not insignificant price ($10 per user/month). And beyond the price, much like Azure SQL (vs. a genuine "Microsoft SQL Server" VM) and the extremely limited Azure SQL Workbench (vs. SSMS), there is a lot that Power BI cannot do well or at all.

You may have noticed the built-in SSRS reports in SSMS 19's Query Store feature. These are very useful reports that can give DBAs an idea of how queries are being processed and which processes are consuming the most CPU. And it is a good example of a company "eating its own dog food".

I've seen SSRS installations that contained thousands of reports representing trillions of dollars of value, categorized and summarized with real-time security ownership, counterparty, price and other core trade data. Several of these business-critical reports had scheduled delivery and were cached and snapshotted programmatically.

Little old SSRS is a quiet but reliable business data spartan. To my surprise, it is actually quite popular in the investment banking industry, where stock valuations and company summaries on reports are a big part of the lifeblood that drives decision making.


A professionally developed PBI report looks and behaves more like an interactive BI dashboard- this is a good example of a good PBI report


And with a little bit of customization magic via things like ExtRSAuth, ExtRSNET48, ExtRS and other RS extension tools, SSRS and Power BI can be tailor-made to serve as a uniquely effective symbiosis of print-formatted, scheduled and data-driven management reports (SSRS) and ad-hoc or OLAP-based interactive data analysis charts and visualizations (Power BI).

To answer the question of "when will SSRS be end of life?", I would say that SSRS isn't going away anytime soon. Microsoft has decided to combine SSRS and PBI (RS .rdl reports are the "Paginated Reports" in PBI) in a way that serves both platforms. The PBI 3.0 REST API indicates as much: the combined SSRS/PBI API offers a plethora of functionality that .NET developers can use to get the best of both worlds (SSRS and Power BI) and customize RS and/or PBI dashboards to support unique business processes.
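As one small illustration of that API surface, here is a minimal .NET sketch that lists the reports visible to the caller via the Power BI REST API (Azure AD token acquisition via MSAL or similar is assumed and omitted; names are illustrative):

 // Minimal sketch: list Power BI reports with a plain HttpClient.
 // Assumes an Azure AD access token for the Power BI service has already been acquired.
 using System.Net.Http;
 using System.Net.Http.Headers;
 using System.Threading.Tasks;

 public static class PbiReportLister
 {
     private static readonly HttpClient client = new HttpClient();

     public static async Task<string> ListReportsAsync(string accessToken)
     {
         client.DefaultRequestHeaders.Authorization =
             new AuthenticationHeaderValue("Bearer", accessToken);

         // "My workspace" reports; use /groups/{groupId}/reports for a specific workspace
         var response = await client.GetAsync("https://api.powerbi.com/v1.0/myorg/reports");
         response.EnsureSuccessStatusCode();

         return await response.Content.ReadAsStringAsync(); // JSON collection of reports
     }
 }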


A print-formatted SSRS report- great for standardized, templated data reporting


The choice of which tool or tools you will enrich your printed reports and data visualizations with is yours. Keep in mind that many organizations make use of both- with SSRS getting equal or greater attention than PBI even to this day- not only because of the huge global SSRS install base and all of its currently running reports (many of which support critical business and governmental processes across the globe), but also because Power BI requires a monthly subscription fee :( . Freeware seems to be slowly dying. Let's hope things change with the next version of SQL Server and maybe we'll get free* PBI.

SQL Server 2025? Happy reporting and data analyzing. Always remember that PBI and SSRS serve different organizational needs- ad-hoc data analysis and pixel-perfect, professional, print-ready reports, respectively.


*(at least a free "tier"? I mean c'mon MSFT..... developers want to CREATE, and MSBI data visualization creativity is dying behind that paywall)- SSRS and PBI should be free and work hand in glove. Anything less is a mistake and a gigantic missed opportunity, imho.


Reference: https://learn.microsoft.com/en-us/power-bi/guidance/migrate-ssrs-reports-to-power-bi

ExtRSAuth for Custom SSRS Authentication (works w/newest SSRS version 16.0.8)



Fortunately, ExtRSAuth code doesn't require any updates to work with MSSQL SSRS 2022


ExtRSAuth for custom SSRS security 

This assembly, forked from the Microsoft Custom Security Sample, extends and improves custom authentication to allow for mechanisms other than a user/password credential check and to offer a seamless pass-through of the Login page if something present in the HttpRequest verifies that the user is already authenticated. For instance, the user already has an app token from an app that communicates with the report server, and you require the communications with the report server to not involve any intermediate screen or login UI. The user just wants to auth as fast as possible and get to their report, right?


What does ExtRSAuth do to authenticate SSRS user connections?

For direct URL report server access, the default here is to allow local connections, which grants Admin rights for any local requests. If the SSRS request is external, a fallback option accepts an AES 128-bit encrypted querystring from the calling app, and the application, if decryption works, is authenticated and allowed to communicate using a read-only SSRS user; any exception thrown indicates the request is neither from a local connection nor a secure request from the external app.


ExtRSAuth gives SSRS environments the freedom from MS Active Directory that they deserve



To secure the built-in SSRS REST API v2.0 access, you can simply customize the LogonUser() and VerifyPassword() methods in AuthenticationExtension.cs and AuthenticationUtilities.cs, respectively:


If you don't perform a security check here, the SSRS REST API will be open to anyone who knows the SSRS admin username
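A minimal sketch of the shape of that check (hypothetical helper names; the actual methods in the Custom Security Sample and in ExtRSAuth contain more than this) might look something like the following:

 // Hypothetical sketch only: the real AuthenticationExtension.cs / AuthenticationUtilities.cs differ.
 public bool LogonUser(string userName, string password, string authority)
 {
     // Reject empty credentials outright before doing any real work
     if (string.IsNullOrWhiteSpace(userName) || string.IsNullOrWhiteSpace(password))
         return false;

     return AuthenticationUtilities.VerifyPassword(userName, password);
 }

 public static bool VerifyPassword(string suppliedUserName, string suppliedPassword)
 {
     // Replace with your own check: decrypt an app token, validate against your
     // membership store, call an identity provider, etc.
     return CredentialStore.IsValid(suppliedUserName, suppliedPassword); // CredentialStore is a hypothetical helper
 }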


As you can see, this is but one of many approaches we can take with ExtRSAuth in an SSRS-connected application or business environment. Any type and granularity of custom authentication and level of authorization is possible. The only ingredient needed is a .NET developer or developers willing to customize a pretty basic .NET security model.

Real-world applications

This Custom Auth assembly has been tested with (1) several .NET Framework 4.8 and .NET 5, 6, 7 and 8 web and mobile applications, (2) with the SSRS API and all its operations, (3) with the SSRS /ReportServer and the /Reports management web interface as well as (4) Visual Studio 2022 Reporting Services projects (report designers can seamlessly deploy Report Server projects from VS to the Report Server with ExtRSAuth).


After running InitalizeExtRSAuth.ps1, a successful installation will output the above


Demonstration

This YouTube explainer video describes the SSRS external user authentication problem that ExtRSAuth addresses.

Requirements
This plug-in relies on SSRS (2016 or later) and a report server configuration as described in Microsoft's Reporting Services Custom Security Sample.

-Replace [your_sym_encr_key] with your symmetric encryption key. Clients can encrypt the SSRS URL access querystring with Sonrai.ExtRSAuth.Excryption.Encrypt() or a similar 128-bit AES encryption implementation, or modify Encrypt() to use any encryption algorithm and key and block sizes.
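To show the general shape of such an Encrypt() call, here is a minimal 128-bit AES sketch for encrypting a querystring value (illustrative only- it is not the actual Sonrai.ExtRSAuth implementation, and the raw byte[] key shown should come from real key management):

 using System;
 using System.Security.Cryptography;
 using System.Text;

 public static class QueryStringCrypto
 {
     // Illustrative 128-bit AES encryption with a random IV prepended to the ciphertext.
     // URL-encode the Base64 result before appending it to the SSRS URL querystring.
     public static string Encrypt(string plainText, byte[] key)
     {
         using (var aes = Aes.Create())
         {
             aes.Key = key;          // 16-byte key for AES-128
             aes.GenerateIV();

             using (var encryptor = aes.CreateEncryptor())
             {
                 byte[] plainBytes = Encoding.UTF8.GetBytes(plainText);
                 byte[] cipherBytes = encryptor.TransformFinalBlock(plainBytes, 0, plainBytes.Length);

                 // Prepend the IV so the report server can decrypt with the shared key
                 byte[] payload = new byte[aes.IV.Length + cipherBytes.Length];
                 Buffer.BlockCopy(aes.IV, 0, payload, 0, aes.IV.Length);
                 Buffer.BlockCopy(cipherBytes, 0, payload, aes.IV.Length, cipherBytes.Length);

                 return Convert.ToBase64String(payload);
             }
         }
     }
 }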

Clone it, customize it further (or not) and get started today: https://github.com/sonrai-LLC/ExtRSAuth

Custom SSRS Authentication Extended

One of the limitations of SSRS is that it cannot be used outside of a Windows environment (security is entirely dependent on Active Directory and Windows user accounts). Unfortunately, app impersonation does not work cleanly enough, as it will prompt users for credentials when they first authenticate to the Report Server. That doesn't cut the mustard for user expectations of public-facing apps.



Authentication only works as far as its interoperability can reach (it needs to reach beyond Windows Auth)



So, to get around this, Microsoft has provided a workaround in the form of the CustomAuthentication example. It provides a basic way to authenticate using forms-based authentication with a login page. This does work, "technically". But it will not work if we want our report authentication to be invisible/seamless to a user who is already authenticated to the main app (the one that provides the SSRS-based reporting features).

Why make a user auth 2x? 

So.... (and this isn't incredibly clever, but it's useful): enter this extension of the CustomAuthentication example, which allows local-only access by default but suggests several overrides for authentication.

Here is the crux of how the CustomAuth can work in many different ways simply via modifying Login.aspx.cs of the Microsoft example:


        private void Page_Load(object sender, EventArgs e)
        {
            //Your secret authentication sauce goes here..
            //appHash should get dynamically generated from the app calling SSRS (ideally for each request if performant enough)
            //ie. 
            //if (CheckAuth(System.Web.HttpContext.Current.Request.Cookies["origAppHash"].ToString()))
            //if (CheckAuth(System.Web.HttpContext.Current.Session["otroAppHash"].ToString()))
            if (System.Web.HttpContext.Current.Request.IsLocal)
                FormsAuthentication.RedirectFromLoginPage("daylite", true);
        }

        private bool CheckAuth(string appHash)
        {
            //DecodeAndCryptoChecks on appHash
            return true;
        }

 MS' Example uses Page_Load(); presumably Page_PreLoad() or Page_Init() would also work here- 'just an HttpRequest eval



The idea behind this branch (closed as soon as it was PR'd, as I don't expect MS to integrate this, but I did want to not-so-subtly nudge them to explain SSRS' custom extensibility better) is not a defined solution- it is to demonstrate how you can interface with the authentication and authorization operations of the SSRS service to achieve virtually any kind of custom security behavior or compatibility that you require.




GitHub (Sonrai LLC is my contractor/consultant LLC): https://github.com/sonrai-LLC/ExtRSAuth

Locations - Google Maps API, ASP.NET Core and SQL Server

This app's purpose is to use the Google Maps API to get geographic data and render locations on maps with editable pins (much like... many apps these days- it is becoming an expectation for any application/service involving a street address).

In this way you can record or plan the state(s) of an event or location at a particular street address, or just keep a geographic representation of some important locations that you can then print as a custom map.


This is a proof-of-concept app illustrating what you can do with a little JavaScript, a web app and the Google Maps API



The code below takes location records (containing the lat/long of the geographic coordinate) from a database and then initializes the Google Map with some options (I omitted many for brevity). The main interesting thing the code below does is that when it renders the pins (the addMarker() function), it adds an event listener to delegate the task of popping up an ASP.NET Core-bound edit modal when a user clicks the pin.

On the Add and Update side, mapping Lat/Long from Street, City and State is all handled by the incredibly useful GoogleLocationService, provided as a NuGet package for .NET Core apps.

Other than that it is just standard JavaScript- Google Maps API does virtually all of the geocoding and map visualization heavy lifting.


The crux of the utilization of the API code (callback and map rendering) is this:
 <script>  
     function initMap() {  
       var map = new google.maps.Map(  
         document.getElementById('map'),  
         {  
           center: new google.maps.LatLng(@Model.CenterLat, @Model.CenterLong),  
           zoom: 8  
         }  
       );  
       var pins = @Html.Raw(Json.Serialize(@Model.Locations));  
       for (var i = 0; i < pins.length; i++) {  
         var myLatLng = {  
           lat: pins[i].lat,  
           lng: pins[i].long  
         };  
         addMarker(myLatLng, map, pins[i]);  
       }  
     }  
     function addMarkerAsync(location, map) {  
       var marker = new google.maps.Marker({  
         position: location,  
         title: 'Home Center',  
       });  
       marker.setMap(map);  
     }  
     function addMarker(location, map, pin) {  
       var marker = new google.maps.Marker({  
         position: location,  
          title: '...something dynamic...',  
       });  
       var infowindow = new google.maps.InfoWindow({  
         content: ''  
       });  
       function AsyncDisplayString() {  
         $.ajax({  
           type: 'GET',  
           url: '/Home/GetLocationModalInfo',  
           dataType: "HTML",  
           contentType: 'application/json',  
           traditional: true,  
           data: pin,  
           success: function (result) {  
             infowindow.setContent('<div style="background-color:#000000;">' + result + '</div>');  
             infowindow.open(map, marker);  
           },  
           error: function (arg) {  
             alert('Error');  
           }  
         });  
       }  
       google.maps.event.addListener(marker, 'click', function () {  
         AsyncDisplayString(map, marker)  
       });  
       marker.setMap(map);  
     }  
   </script>  


And then this Controller Action that uses GoogleLocationService to get coordinates by address:
 [HttpPost]  
     public IActionResult AddLocation(LocationModel location)  
     {  
       string address = location.StreetAddress1.Replace(" ", "+") + "," + location.City.Replace(" ", "+") + "," + location.State.Replace(" ", "+");  
       MapPoint coords = _locationService.GetLatLongFromAddress(address);  
       location.Lat = (decimal)coords.Latitude;  
       location.Long = (decimal)coords.Longitude;  
       using (var db = new SqlConnection(_configuration.GetConnectionString("DefaultConnection")))  
       {  
         db.Open();  
         // Parameterized INSERT (Dapper maps the @params from the LocationModel properties) to avoid SQL injection from user-entered fields  
         string sql = @"INSERT INTO [Locations].[dbo].[Locations] ([Name], [Contact], [Email], [Website], [Phone], [StreetAddress1], [StreetAddress2], [City]  
           ,[State], [Zip], [LocationContact], [PrimaryContact], [Notes], [Type], [Lat], [Long], [Petitions], [Flyers], [Posters], [LastPickUpDateTime], [LastOutOfStockDateTime], [LastDropoffDateTime]  
           ,[AllTimeOutofStock], [Unsupportive], [VolunteerInterest])  
           VALUES (@Name, @Contact, @Email, @Website, @Phone, @StreetAddress1, @StreetAddress2, @City  
           ,@State, @Zip, -1, -1, @Notes, 1, @Lat, @Long, @Petitions, @Flyers, @Posters, @LastPickUpDateTime, @LastOutOfStockDateTime, @LastDropoffDateTime  
           ,0, 0, 1);";  
         db.Execute(sql, location);  
       }  
       var model = GetDefaultMapView();  
       model.KeyString = _configuration["MapsAPIKey"].ToString();  
       return View("Map", model);  
     }  


This is a proof-of-concept app illustrating what you can do with a little JavaScript, a web app and the Google Maps API


As you can see, the Google Maps API provides a lot of opportunity for your application- don't underestimate the power of location-based data. With the tools at our disposal today, the functionality of applications is limited less by the available algorithms/frameworks/tools than by our imagination.


I strongly suggest you look into the ways you can integrate geographic/mapped data with the Google Maps API; it is a very powerful API







SSRS REST API v2

Here is a response from the SSRS REST API in action... (you can access a lot more SSRS item properties and customize at will once you know the API)


The SSRS API v2 has far more functionality than v1, but they work essentially the same way. You must be authenticated to the SSRS report server you are targeting (localhost in this case) to make GET/POST requests to the API.

Once auth'd, you can push and pull any useful SSRS data pretty easily and make SSRS do some pretty cool things it can't do out of the box...


This is the SSRS API as accessed through a web browser; simply give your .NET app an HttpClient and you can make use of all these responses; it's just JSON...



You can get a collection of SSRS catalog items as in the example above (folders, reports, KPIs) by just specifying the action name, or you can select an individual item by putting the item GUID in parentheses in the API request URL:


You can access individual items in the API via GUID in parens after the API action name.




Common Useful SSRS API v2 Actions:
  • Reports
  • Datasets
  • Data Sources
  • Folders
  • Schedules
  • Subscriptions
  • Comments
  • KPIs
  • CatalogItems (everything)



Example of a .NET Standard library with an HttpService abstracting the SSRS API calls:
 using System;  
 using System.Net.Http;  
 using System.Threading.Tasks;  
 using Newtonsoft.Json;  
 namespace ExtRS  
 {  
   public class SSRSHttpService  
   {  
     const string ssrsApiURI = "https://localhost/reports/api/v2.0";  
     // UseDefaultCredentials passes the caller's Windows identity to the report server  
     HttpClient client = new HttpClient(new HttpClientHandler() { UseDefaultCredentials = true });  
     public async Task<GenericItem> GetReportAsync(Guid id)  
     {  
       // Request a single catalog item by GUID, e.g. /reports({guid})  
       var response = await client.GetAsync(string.Format("{0}/reports({1})", ssrsApiURI, id));  
       var odata = await response.Content.ReadAsStringAsync();  
       return JsonConvert.DeserializeObject<GenericItem>(odata);  
     }  
   }  
 }  
This is verbose to better break down the steps of what is happening on the ExtRS service end




A very basic class designed to demonstrate using SSRS API Response to create a .NET object:
 using Newtonsoft.Json;  
 using System.Collections.Generic;  
 namespace ExtRS  
 {  
   public class GenericItem  
   {  
     [JsonProperty("@odata.context")]  
     public string ODataContext { get; set; }  
     [JsonProperty("Id")]  
     public string Id { get; set; }  
     [JsonProperty("Name")]  
     public string Name { get; set; }  
     [JsonProperty("Path")]  
     public string Path { get; set; }  
   }  
 }  
The power of the SSRS API is limited primarily by your imagination- lots of customization can be made




And finally, called from a Controller Action in an MVC app:
 using System;  
 using System.Web.Mvc;  
 using System.Threading.Tasks;  
 using ExtRS;  
 namespace Daylite.Controllers  
 {  
   public class ReportsController : Controller  
   {  
     public SSRSHttpService service = new SSRSHttpService();  
     public async Task<ViewResult> GetReportsAsync()  
     {  
       return View("Index", await service.GetReportsAsync());  
     }  
     public async Task<ViewResult> GetFoldersAsync()  
     {  
       APIGenericItemsResponse result = await service.GetFoldersAsync();  
       return View("Index", result);  
     }  
     public async Task<ViewResult> GetReportAsync(Guid id)  
     {  
       GenericItem result = await service.GetReportAsync(id);  
       return View("Index", result);  
     }  
   }  
 }  


Reference: https://github.com/Microsoft/Reporting-Services/tree/master/APISamples


Why NoSQL is Never Going to Replace SQL (apple:orange)

Do you need a jet engine to get your results from point A to B or will a modest GM sedan suffice?

I'll take the above bad analogy further and posit that while sedans and cars on the ground require stringent rules and have to navigate much more rigid structures, a jet engine simply powers the jet ahead through constraint-less skies- its purpose is to power something big, not to be concerned with the other machinery of the craft (i.e. the RDBMS features eschewed by NoSQL solutions).

Do you need massive global data sync scale so that millions can connect, make changes and see the results all (appear to be) real-time? If not, NoSQL is not always the right choice, and neglecting to have any kind of schema for stored application data structures can present its own host of challenges in the future if (when) those structures change. But alas, you can use NoSQL for some things (Redis, image/BLOB storage) and an RDBMS for others (more structured records and things you want to restore to a point in time in the event of a server failure).

Relational databases tend to be scaled up; NoSQL solutions are scaled out

The SQL vs NoSQL (structured and transactional vs. semi-structured and "eventually consistent") debate is not a matter of one or the other and that's that. These are complementary technologies and should both be used- wherever you find app requirements that suggest one or the other makes the most sense.

When an application is meant to scale immensely and there are not a lot of data integrity, consistency, transaction or complex data structuring and transformation needs, NoSQL is your best bet and will far outscale even the most robust RDBMS server farm- at a much lower cost (at the cost of sacrificing RDBMS features which may not be needed).

I have personally worked on several projects that utilize relational and unstructured approaches to reading and persisting application data. If you have ever used an application's config file to change a setting in JSON or XML or a simple line entry- you are seeing a small and very basic NoSQL example of storing app data.

Hadoop and other distributed NoSQL db servers are built for scalability


Using NoSQL in software development can make data structures and objects- passed to and from APIs and within the application itself- much more flexible to work with: one line to serialize an object to a JSON chunk, save it to BLOB storage and forget about it.
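For instance, a minimal sketch of that pattern (hypothetical container, blob and type names, using Newtonsoft.Json and the Azure.Storage.Blobs client):

 using System.IO;
 using System.Text;
 using Azure.Storage.Blobs;
 using Newtonsoft.Json;

 // Illustrative POCO; any serializable object works the same way
 public class OrderSnapshot
 {
     public int OrderId { get; set; }
     public string Customer { get; set; }
     public decimal Total { get; set; }
 }

 public static class BlobJsonStore
 {
     // One line to serialize, one call to persist; no table, no schema, no ORM mapping
     public static void Save(OrderSnapshot order, string storageConnectionString)
     {
         string json = JsonConvert.SerializeObject(order);

         var container = new BlobContainerClient(storageConnectionString, "order-snapshots");
         container.CreateIfNotExists();

         using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
         {
             container.GetBlobClient($"order-{order.OrderId}.json").Upload(stream, overwrite: true);
         }
     }
 }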

When dealing with relational data, you really have to understand the data to write good data access code and the underlying SQL that supports well-defined structuring of complex objects.

Well-defined structuring of the persistence of complex application objects avoids data duplication/corruption, prevents breaking reference constraints and losing any sense of hierarchical data relationships, and more generally lets you know very quickly when you have a problem within your data storage structures and the objects that initialize themselves from that data.

NoSQL ditches virtually all relational database normalization rules in favor of a loosely schema'd, unstructured (document, BLOB, key store, etc.) data store that relies solely on keys, values and filtering of unstructured metadata to get the same SELECT ... WHERE functionality found in an RDBMS. Its iterations usually bear a resemblance to Java, and, loosely, common SQL statements map to Java or Java-derived equivalents as follows:


"Apache Hadoop is an open source platform built on two technologies: Linux operating system and Java programming language."


For many application data requirements, however, relational data can be overkill and totally unnecessary (e.g. using Redis to store key/vals vs. designing some elaborate key/val store in a SQL Server table).


Implementing something like Splunk or Kibana to continuously index app logs and configuration files can help you dip your toes into the Lake of Dark Data


The best distributed NoSQL solutions like Hadoop really shine in their inherent ability to dynamically scale to as many server machines as the operators can make ready to serve as "Hadoop processor nodes on standby".

SQL Server scaling is based more on augmenting the server or "scaling up" (adding RAM, faster SSDs, RAID arrays, etc.) rather than distributing workloads across dynamic nodes. SQL Server AlwaysOn Availability Groups and the Mirroring and Replication features are for recoverability and data sharing- not for dynamic scaling to handle bigger and bigger workloads.

SQL has been around forever. The fundamental concept behind NoSQL (semi-structured or loosely structured data) has been around since long before SQL relational database technology. Both (along with NoSQL-related graph database paradigm) will continue to serve as viable data storage solution alternatives for many more years into the 21st century.


Relational systems like SQL Server, MySQL and Oracle usually handle structured side; NoSQL vendors are after the other 90%


In fact, SQL Server 2019's PolyBase extension supports Hadoop clusters, MongoDB and Teradata T-SQL query integration. A new feature called SQL Server Big Data Clusters helps make distributed NoSQL nodes manageable within the SSMS environment.

Mongo, Hadoop and other NoSQL database servers have SQL server integration to support relational data sources.

SQL Server 2019 Polybase integrates Hadoop, MongoDB and many other sources with relational data and T-SQL queries


CAP Theorem: a distributed data system, like most NoSQL solutions, can only achieve 2 of the 3 guarantees: "Consistency", "Availability" and "Partition Tolerance"

ACID vs BASE: the relational axiom of "Atomic, Consistent, Isolated, Durable" contrasted against NoSQL's vague promise of "Basically Available, Soft State, Eventually Consistent" (dirty reads common)

This ol' tried-and-true database server software ain't going away in the foreseeable future


Hundreds of millions of corporate, mid and small business applications are running along just fine in 2019 using various RDBMS platforms (SQL Server, Oracle, DB2, PostgreSQL, MySQL, etc.) for at least one of their data stores.

Many more millions of applications have been using one riff or another of NoSQL (semi-structured data) before, during and after the mythical "Relational Movement" as described by software veteran Robin Bloor:

"The Relational Model of Data Never Dominated Anyway. Estimates vary, but it is generally agreed that somewhere between 70% and 95% of the world’s data is stored only in poorly structured or unstructured formats such as: word processing documents, spreadsheets, HTML files and e-mail. The truth is that Relational database never did really dominate. It was rejected out of hand, year after year, as an effective store for many types of data." -Robin Bloor on insideanalysis.com


Google search trends over the last 5yrs certainly suggest relational SQL is not going anywhere anytime soon...


Considerations when evaluating whether to use NoSQL:
  • NoSQL is a precise tool for precise data needs; if relational SQL is too much for your group, NoSQL will likely be too steep a learning curve
  • Data Integrity- when billions of NoSQL records are affected by a small change in schema that is not able to propagate correctly, or runs into constraint issues, or hierarchy and relations are impossible to infer... maybe relational SQL would be a better approach
  • NoSQL touts loose schema structure as a benefit, but this simply means schema and data structure enforcement has been shifted from the database layer to the application layer. Data cannot "self-manage".
  • Some apps are prime candidates for NoSQL's document-centric and resource-centric distributed storage architecture


Also, there is this to consider:


(re: the longevity and simple-yet-powerful abstractions of SQL)


If NoSQL solutions are eventually able to achieve the same transactional consistency and complex schema structures that some applications require and then ultimately subsume RDBMS completely- it'll still require a lot of SQL gurus to convert and integrate all the legacy relational database apps for a long, long time to come...

Bring on MongoDB, CouchDB, Dynamo, MapReduce, HBase, BigTable, Cassandra.


As data professionals we will have an increasingly complex array of tools to understand; what we do with them will drive the future



Long live SQL Server 2030. 😉




References:

https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/

https://www.wired.com/2012/12/couchdb/

https://pdfs.semanticscholar.org/a6f0/1c9103d3bafb8ce92641c9f2a4deaccd12f9.pdf

https://www.memsql.com/blog/why-nosql-databases-wrong-tool-for-modern-application/

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1589423200346982646

https://news.ycombinator.com/item?id=479165

https://insideanalysis.com/is-the-relational-database-doomed/

SQL CLR for .NET in SQL Server

You may find yourself with the need to integrate a .NET method within SQL Server, to be called as a function. This usually happens when some relatively complex looping and modification logic is a requirement of a SQL operation.

SQL is a great data language, but it is not the right language for some tasks. Creating a SQL CLR object from a .NET assembly may be the best approach in some unique situations (and there is a bonus in that, in many cases, you can reuse existing .NET code).

Before creating the CLR object we need a .NET .dll; so first we create a basic .NET assembly, compile it in Release and copy the path to the compiled .dll:

This is our simple .NET CLR method that we want to run within the SQL Server query execution engine
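A minimal sketch of that kind of method (a hypothetical string-reversal scalar function, compiled into a class library and imported as the assembly) looks like this:

 using Microsoft.SqlServer.Server;
 using System.Data.SqlTypes;

 public class StringFunctions
 {
     // Trivial example of logic that is awkward in set-based T-SQL:
     // loop over a string's characters and reverse them.
     [SqlFunction(IsDeterministic = true, IsPrecise = true)]
     public static SqlString ReverseString(SqlString input)
     {
         if (input.IsNull)
             return SqlString.Null;

         char[] chars = input.Value.ToCharArray();
         System.Array.Reverse(chars);
         return new SqlString(new string(chars));
     }
 }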


SQL CLR provides a way for you to integrate complex .NET methods within SQL Server


Import into SQL Server instance via SSMS*: 


Select New Assembly... 




...and then enter the path to your Release .dll


Create a T-SQL function or stored procedure to serve as the caller for the function and run it:

From here we can see all of the T-SQL code involved; the 3 SQL Server configuration conditions (shown in the 3 EXEC statements) are required

And that is all there is to it. Only use CLR functions when absolutely necessary, as RDBMSs like SQL Server are designed to process relational data in sets, not to apply complex business logic to individual rows.

But if there is no other way- SQL CLRs could provide you a solution to your code/logic integration problems.


*Warning and Reference: https://blog.netspi.com/attacking-sql-server-clr-assemblies/

Accessing SQL Server Data in R

Importing SQL Server data into R for analysis is pretty straightforward and simple. You will obviously need R installed on your machine. The following R code will connect to your SQL Server database (using R Studio):

 library(RODBC)  
 dbconnection <- odbcDriverConnect('driver={SQL Server};server=.;database=CLARO;trusted_connection=true')  
 initdata <- sqlQuery(dbconnection,paste('SELECT * FROM [CLARO].[dbo].[Fielding];')) 


SELECT data from a SQL Server database output in R Studio


Accessing SQL Server Data in Python

So you want to access Microsoft SQL Server from your Python script(s)?

After weeding out some long-abandoned and/or nonworking solutions, I discovered a very simple Python ODBC module called "pyodbc" that works with virtually all SQL Server versions since MSSQL 2005.

First, you will need to install the MSSQL ODBC driver (13.1 or 17 should work) on your machine.

Next, get the pyodbc module for Python by running this from Windows command prompt:

pip install pyodbc

Then open up a python shell using 'py' or 'python' and enter the following after editing configuration values to match your development environment:

 import pyodbc  
 cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=WideWorldImporters;UID=DemoUser;PWD=123Password')  
 cursor = cnxn.cursor()  
 #Sample of a simple SELECT  
 cursor.execute("SELECT TOP (100) Comments, count(*) FROM WideWorldImporters.Sales.Orders GROUP BY Comments")  
 row = cursor.fetchone()   
 while row:   
   print(row[0] + ': ' + str(row[1]))  
   row = cursor.fetchone()  

Running this code will result in the below if you have configured everything correctly (note this example makes use of the Microsoft SQL Server demo WideWorldImporters database):


Reference: https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sql-using-pyodbc?view=sql-server-2017

OLAP: Facts and Dimensions

OLAP can adequately be described as the storage of redundant copies of transactional records from OLTP and other database sources. This redundancy facilitates quick lookups for complex data analysis because data can be found via more (and quicker) SQL execution paths than in normalized OLTP. And OLTP, after all, should stick to its namesake and what it does best: processing, auditing and backing up online transactions, no? Leave data analysis to separate OLAP warehouses.

OLAP data is typically stored in large blocks of redundant data (the data is organized in ways to optimize and accelerate your data mining computations). Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.


Facts are "measurements, metrics or facts of a particular process" -Wikipedia. Facts are the measurement of the record value: "$34.42 for today's closing stock price", "5,976 oranges sold at the market", "package delivered at 4:57pm", etc.

Dimensions are "lists of related names–known as Members–all of which belong to a similar category in the user’s perception of a data". For example, months and quarters; cities, regions, countries, product line, domain business process metrics, etc.

Dimensions give you a set of things that you can measure and visualize in order to get a better pulse and better overall understanding of the current, past and even potential future (via regression models) shape of your data- which can often alert you to things you might not be able to see (certainly not as quickly or easily) in an OLTP-based data analysis model, which is often tied to production tables out of necessity or because "we don't have time or money to implement a Data Warehouse".
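To make the fact/dimension split concrete, here is a minimal sketch of a star-schema shape expressed as illustrative .NET classes (hypothetical names; in the warehouse these would be one fact table and its dimension tables):

 // Hypothetical star-schema shape: one fact row references its dimensions by key.
 public class DimDate
 {
     public int DateKey { get; set; }         // e.g. 20240131
     public int Month { get; set; }
     public int Quarter { get; set; }
     public int Year { get; set; }
 }

 public class DimProduct
 {
     public int ProductKey { get; set; }
     public string ProductLine { get; set; }
     public string Region { get; set; }
 }

 public class FactSales
 {
     public int DateKey { get; set; }         // foreign keys to the dimensions
     public int ProductKey { get; set; }
     public decimal SalesAmount { get; set; } // the measures being aggregated
     public int UnitsSold { get; set; }
 }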


Yes, you definitely do need OLAP/DW capabilities if you think long-term about it. Having intimate knowledge of your operations and how you can change variables to better run your business? Why would any business person (excepting those committing fraud) not want that?

I'd say that implementing a true and effective OLAP environment is worth any project investment and would pay for itself over and again in the way of better and more specific/actionable metrics that help administrators of operations make the best, data-backed decisions- some very critical decisions involving millions of dollars and sometimes lives. I'd like a better look at the data before making a multi-million dollar or life/death decision.


SAS, Hadoop, SSAS, with healthy doses of R and/or Python customization?- whatever data solution you choose, my advice is to go with something that has a great track record of providing the kind of solutions that your business and/or your industry require. Use the tool that your ETL developers can utilize effectively and that helps you best meet the needs of the constituents with whom you will exchange data (the company, the customer, industry and/or regional regulation compliance).

Fire Drills to Ensure Successful Disaster Recovery

TEST. YOUR. DISASTER. RECOVERY. PLAN.

This cannot be said enough. I have heard countless horror stories (and lived one) of instances where backups were completing "successfully" but no one ever tested the restoration of those backups (which were, in fact, corrupt and not restorable)- until it was too late. :/


Put in the time up front to make sure your data recovery strategy and high-availability guarantee actually "recover" and make "available" your data- the lifeblood of your organization.

It is important to remember that you can have a Secondary database server that is 100% in sync with the Primary database server, but if your application is not configured to make the switch (or your service host does not make the switch for you), you will only have half of things recovered: the data, but not the data availability through the application.
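One way to give the fire drill teeth is to automate a periodic test restore onto a scratch server and run a sanity query against the result. A minimal sketch (hypothetical server, database, logical file and backup path names throughout):

 using System;
 using Microsoft.Data.SqlClient;

 public static class RestoreFireDrill
 {
     // Hypothetical names; point this at a scratch/test instance, never production.
     public static void Run()
     {
         using (var conn = new SqlConnection("Server=TEST-SQL01;Database=master;Integrated Security=true;TrustServerCertificate=true"))
         {
             conn.Open();

             // 1. Prove the backup file is readable at all
             Execute(conn, @"RESTORE VERIFYONLY FROM DISK = N'\\backupshare\Sales\Sales_Full.bak';");

             // 2. Actually restore it under a throwaway name
             Execute(conn, @"RESTORE DATABASE [Sales_RestoreTest]
                             FROM DISK = N'\\backupshare\Sales\Sales_Full.bak'
                             WITH MOVE 'Sales' TO N'D:\Data\Sales_RestoreTest.mdf',
                                  MOVE 'Sales_log' TO N'D:\Data\Sales_RestoreTest.ldf',
                                  REPLACE, RECOVERY;");

             // 3. Sanity-check that the restored copy answers a real query
             using (var check = new SqlCommand("SELECT COUNT(*) FROM [Sales_RestoreTest].[dbo].[Orders];", conn))
             {
                 Console.WriteLine("Restore test row count: " + check.ExecuteScalar());
             }
         }
     }

     private static void Execute(SqlConnection conn, string sql)
     {
         using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
             cmd.ExecuteNonQuery();
     }
 }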

Good References on Disaster Recovery Strategies and what to consider ("sensitivity of data, data loss tolerance, required availability, etc."):

https://www.sqlshack.com/sql-server-disaster-recovery/

https://support.microsoft.com/en-us/help/822400/description-of-disaster-recovery-options-for-microsoft-sql-server


ETL and the Case Against a Single System for All Business Processes

Nearly every business wants to centralize its business data so that it can get an at-a-glance overview of what is going on at any given point in the day. Suppose, for instance, a company has its business data on 5 separate database servers, in various databases therein, across several different tables. Much of that data will need to be slightly modified (transformed) in order to make data types consistent and to optimize it for quick and meaningful data analysis.

Assume the database servers are:
  • CMS, Orders (Dynamics 365 data)
  • Accounting (Sage 50 data)
  • Warehouse, Logistics (SAP data)
  • Field Service, Product Support (SalesForce data)
  • Call Center (TantaComm data)
Furthermore, the business wants to apply some custom logic to various fields so that certain calculations don't have to be made by the OLAP end users and applications.

ETL is just moving data from a Source to a Destination, often making some changes (transformations) along the way

Well, it all seems so complicated, right? How can all that disparate data be combined and standardized? It's not as complicated as allowing an ERP solution to define how your various departments do their jobs. Some ERP products truly do have the functionality to satisfy 80-85% of a company's business requirements. And when that is the case, by all means, companies should go that route.

But for instances where ERP of every business function is not working, you can use products like SSIS, Informatica, and Oracle Data Integrator to easily map ETL data visually and apply code operations to the data using a variety of programming languages.

But of course, there is no "one-size-fits-all 100% of our business needs" ERP solution; different business productivity software works better or worse for different industries, different departmental needs, and for all manner of different reasons. Why second-guess the choice of business tool that your business' management and best minds know, and have proven, is most effective for their- often very specific- job?

Allowing a business to integrate the "best apps for each particular type of job" is nothing more than a sort of microservices architecture. But if we are working with custom code in these various LOB applications (at least some of the LOB apps will need custom API integrations so that they can talk to each other), then with each new block of custom code comes more decentralized code bases and decentralized test scripts...

All things considered, I would argue that trying to put square pegs into round holes a la "ERP-everything" is a much, much, MUCH bigger effort than a well-developed and tested scheduled ETL Data Warehouse refresh process to centralize your company's data.

It's always a balance with software. Use the most effective apps for your organization and departmental requirements, but try to avoid the dependency hell that can come along with a poorly managed distributed application environment.

Here are the ETL steps in order, from the resource link below:

Extract the data from the external data sources such as line-of-business systems, CRM systems, relational databases, web services, and anywhere you have business data you want to summarize.

Transform the data. This includes cleansing the data and converting it to an OLAP-friendly data model. The OLAP-friendly data model traditionally consists of dimension and fact tables in a star or snowflake schema.

Load the data so that it can be quickly accessed by querying tools such as reports, dashboards, etc.
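A bare-bones sketch of those three steps in .NET (hypothetical connection strings, table and column names; a real pipeline would use SSIS/Informatica/ODI, or at least batching, logging and error handling):

 using System.Data;
 using Microsoft.Data.SqlClient;

 public static class MiniEtl
 {
     public static void Run(string sourceConnStr, string warehouseConnStr)
     {
         var staged = new DataTable();

         // EXTRACT: pull the line-of-business rows from one of the source systems
         using (var src = new SqlConnection(sourceConnStr))
         using (var cmd = new SqlCommand("SELECT OrderId, CustomerName, OrderTotal, OrderDate FROM dbo.Orders", src))
         {
             src.Open();
             staged.Load(cmd.ExecuteReader());
         }

         // TRANSFORM: apply the custom logic so OLAP end users don't have to
         staged.Columns.Add("OrderYearMonth", typeof(int));
         foreach (DataRow row in staged.Rows)
         {
             var orderDate = (System.DateTime)row["OrderDate"];
             row["OrderYearMonth"] = orderDate.Year * 100 + orderDate.Month;   // e.g. 202401
             row["CustomerName"] = row["CustomerName"].ToString().Trim().ToUpperInvariant();
         }

         // LOAD: bulk-insert into the warehouse fact/staging table
         using (var dest = new SqlConnection(warehouseConnStr))
         using (var bulk = new SqlBulkCopy(dest) { DestinationTableName = "dbo.FactOrders" })
         {
             dest.Open();
             bulk.ColumnMappings.Add("OrderId", "OrderId");
             bulk.ColumnMappings.Add("CustomerName", "CustomerName");
             bulk.ColumnMappings.Add("OrderTotal", "OrderTotal");
             bulk.ColumnMappings.Add("OrderYearMonth", "OrderYearMonth");
             bulk.WriteToServer(staged);
         }
     }
 }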

"90% of the requirements now is better than 100% of the requirements never" -Unknown

Reference: https://blogs.msdn.microsoft.com/andreasderuiter/2012/12/05/designing-an-etl-process-with-ssis-two-approaches-to-extracting-and-transforming-data/

Log Shipping and Log (and/or Differential Backup) Restoration

Most developers know what a full database backup is. Did you know a differential backup is just a subset of the full backup? But what is the point of transaction log shipping in SQL Server?

A "transaction log" backup is just a subset of a subset of a full backup, containing the set of all T-SQL transactions for a given (usually short) span of time; log backups are typically taken every few minutes, but the interval varies according to application and data compliance needs.

A commenter in the referenced web link below aptly notes:

"the point of log shipping is to keep the secondary server closer to 100% synced than the full backups (which are much larger and produced much less frequently than log backups)."

Log shipping can be used to serve as a constant "warm backup" (similar to SQL Server database mirroring) of all production transactions for a standby/failover SQL Server* as well as a dedicated Backup SQL Server as depicted here:

 Setup if you have a fully-functioning Secondary Production SQL Server

Or, more commonly, the shipped backup logs are sent to a single Backup Server:

Typical production database server scenario, though the steps are a bit out of sequential order, if not (or due to) visual order

In short, a full backup contains the entire database you need to recover- up to a certain point in time.

A differential backup is a subset of a full backup (generally "full" backups contain days' worth of data for bigger organizations)- it would only be prudent to include one or even four differential backups to ensure quick disaster recovery (and full recovery of vital customer data).

Transaction log backups are the chunks of SQL transactions that eventually compose the differential backup(s) and, finally, the entire full .bak file.

These 3 disaster recovery tools that SQL Server affords DBAs and developers allow you to restore your production data back to the point in time when your server failed (a sketch of the restore sequence follows the list below):

  • Apply the most recent full backup of the prod server that went down*
  • Apply any differential backups taken since the most recent full backup
  • Apply all transaction log backups up to the point in time of the failure**
  • Apply all StandBy server transaction logs that may have occurred (if not ReadOnly)
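In rough terms, that restore sequence, driven from .NET, might look like the following sketch (hypothetical database and file names; every restore except the last uses NORECOVERY so subsequent backups can still be applied, and RESTORE LOG also supports STOPAT for true point-in-time recovery):

 using System.Collections.Generic;
 using Microsoft.Data.SqlClient;

 public static class PointInTimeRestore
 {
     // Hypothetical paths; the order is what matters: full -> differential -> each log, oldest first.
     // conn is assumed to be open and pointed at the standby/recovery server.
     public static void Restore(SqlConnection conn)
     {
         var steps = new List<string>
         {
             @"RESTORE DATABASE [Sales] FROM DISK = N'D:\Backups\Sales_Full.bak' WITH NORECOVERY, REPLACE;",
             @"RESTORE DATABASE [Sales] FROM DISK = N'D:\Backups\Sales_Diff.bak' WITH NORECOVERY;",
             @"RESTORE LOG [Sales] FROM DISK = N'D:\Backups\Sales_0800.trn' WITH NORECOVERY;",
             @"RESTORE LOG [Sales] FROM DISK = N'D:\Backups\Sales_0815.trn' WITH NORECOVERY;",
             // ...apply every remaining log backup up to the failure point...
             @"RESTORE DATABASE [Sales] WITH RECOVERY;"   // bring the database back online
         };

         foreach (var step in steps)
         {
             using (var cmd = new SqlCommand(step, conn) { CommandTimeout = 0 })
                 cmd.ExecuteNonQuery();
         }
     }
 }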

With all of this said, do be aware that different scenarios and requirements call for different kinds of disaster recovery solutions. Consult with the experts and never neglect to TEST YOUR RESTORE PROCEDURE WITH ACTUAL .bak and .trn files- and then test the applications and services connected to those database servers against their restored/Secondary versions to ensure you suffer as little downtime as possible in the event of a database emergency.

What a restoration timeline might look like

*Note(!) on the questions about Microsoft licensing and your StandBy server, should it need to be used for any production activity during your disaster recovery:

"I am not the licensing police, and I am not Microsoft – check with your licensing representative to clarify your exact situation. Generally, you can have one warm standby server. However, the second someone starts using it for reporting, testing, or anything else, you need to license it like any other server."-Brent Ozar

**"If the database is in Standby mode, users can query it – except when a log backup is being restored. You need to decide if a restore job will disconnect users, or if the restore is delayed until after the users are disconnected." -Brent Ozar

References:

https://www.sqlshack.com/sql-server-disaster-recovery/ (SQL Server Replication is another complement or alternative to log shipping and backup/restore)

 https://www.brentozar.com/archive/2013/03/log-shipping-faq/