
Custom SSRS Authentication Extended

One of the limitations of SSRS is that it cannot be used outside of a Windows environment (security is all dependent on Active Directory and Windows user accounts). Unfortunately, app impersonation does not work cleanly enough, as it will prompt users for credentials when they first authenticate to the Report Server. That doesn't cut the mustard for user expectations of public-facing apps.



Authentication only works as far as its interoperability can reach (and it needs to reach beyond Windows Auth)



So, to get around this, Microsoft has provided a workaround in the form of the CustomAuthentication example. It provides a basic way to authenticate using forms-based authentication with a login page. This does work, "technically". But this also will not work if we want our report authentication to be invisible/seamless to a user who is already authenticated to the main app (that provides SSRS-based reporting features).

Why make a user authenticate twice? 

So.... (and this isn't incredibly clever, but it is useful): enter this extension of the CustomAuthentication example, which allows local-only access by default but suggests several overrides for authentication.

Here is the crux of how the CustomAuth can work in many different ways simply via modifying Login.aspx.cs of the Microsoft example:


        private void Page_Load(object sender, EventArgs e)
        {
            //Your secret authentication sauce goes here..
            //appHash should get dynamically generated from the app calling SSRS (ideally for each request if performant enough)
            //ie. 
            //if (CheckAuth(System.Web.HttpContext.Current.Request.Cookies["origAppHash"].Value))
            //if (CheckAuth(System.Web.HttpContext.Current.Session["otroAppHash"].ToString()))
            if (System.Web.HttpContext.Current.Request.IsLocal)
                FormsAuthentication.RedirectFromLoginPage("daylite", true);
        }

        private bool CheckAuth(string appHash)
        {
            //DecodeAndCryptoChecks on appHash
            return true;
        }

 Microsoft's example uses Page_Load(); presumably Page_PreLoad() or Page_Init() would also work here, since this is just an HttpRequest evaluation
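To make the CheckAuth stub a bit more concrete, here is a minimal sketch of one possible implementation (illustrative only, not part of the Microsoft sample): it assumes the calling app sets a hypothetical origAppHash cookie containing "userName|hexHmac", where the HMAC is computed with a secret shared out-of-band between the app and the Report Server, and the "ReportAuthSecret" setting name is made up for this example.

        //Illustrative only - assumes the cookie value is "userName|hexHmac" and that both
        //apps share the same secret key (stored here in a hypothetical appSetting).
        private bool CheckAuth(string appHash)
        {
            string[] parts = (appHash ?? string.Empty).Split('|');
            if (parts.Length != 2) return false;

            string sharedSecret = System.Configuration.ConfigurationManager.AppSettings["ReportAuthSecret"];
            using (var hmac = new System.Security.Cryptography.HMACSHA256(System.Text.Encoding.UTF8.GetBytes(sharedSecret)))
            {
                byte[] expected = hmac.ComputeHash(System.Text.Encoding.UTF8.GetBytes(parts[0]));
                string expectedHex = BitConverter.ToString(expected).Replace("-", string.Empty);
                return string.Equals(expectedHex, parts[1], StringComparison.OrdinalIgnoreCase);
            }
        }

The only real requirement is that CheckAuth returns true exactly when the incoming request can prove it came from an already-authenticated session in the main app; how you prove that (cookie, session, signed token) is up to you.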



The idea behind this branch (closed as soon as it was PR'd, as I don't expect Microsoft to integrate it, but I did want to not-so-subtly nudge them to better explain SSRS' custom extensibility) is not a defined solution; it is to demonstrate how you can interface with the authentication and authorization operations of the SSRS service to achieve virtually any kind of custom security behavior or compatibility that you require.




GitHub (my sonrai LLC is my contractor/consultant LLC): https://github.com/sonrai-LLC/ExtRSAuth

Dapper for .NET Data Access

Jeff Atwood wrote about the phrase coined by Ted Neward: "Object-Relational Mapping is the Vietnam of Computer Science".

I agree with everything except Atwood's (huh? 😨- keep in mind this is 2006) conclusion that we should do one or the other: objects or relational data records. Either develop apps as a series of SQL data access statements assigning values to arbitrary pieces of monolith application code, or go exclusively object-oriented with everything saved to blob storage... or something awkward like that.

That, he says (in the 2006 article*), removes the O (object) - R (relational data) mapping problem entirely. It sure does; but how can we develop apps like that?

(fast-forward 6yrs, and.... Dapper to the rescue!)

Dapper is an awesome (IMO) alternative that allows developers to retain SOLID reuse and extensibility in their .NET data access code while still accessing complex relational data- and fast.


Dapper has the best of both worlds in terms of what you look for in a data access framework - speed and clean, easy SQL-to-typed object mapping facilitation


I highly recommend a brief peruse; it is a very interesting article. It essentially describes the pitfalls of ADO.NET, Hibernate, Entity SQL (EF for MSSQL), and so many of the other approaches to modeling relational data as .NET objects- approaches that have, if not failed completely, been severely lacking, especially in terms of speed and control over the actual SQL that you instruct the SQL engine to execute.

Dapper aims to bridge the eternal gap between application and relational database code in a pretty elegant way for .NET development. So long as your database records (whether from a complex JOIN'd stored procedure or anywhere else in your db) come back with types and field/alias names that match your "query-return-target-type" class' property names and data types, you are set for all the kinds of data access you like- and off and running without the headaches normally associated with ORMs (magic config strings, mappings in separate files that fall out of sync with class or db changes, etc.).
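For example, a result set with Id, Name, and CreatedDate columns maps straight onto a plain class with matching property names and types. This is a hypothetical UserReport shape for illustration- the real table used later in this post may have different columns:

     using System;

     //Hypothetical POCO: Dapper maps result columns to properties by name (case-insensitively)
     public class UserReport
     {
       public int Id { get; set; }                //maps to column Id
       public string Name { get; set; }           //maps to column Name
       public DateTime CreatedDate { get; set; }  //maps to column CreatedDate
     }

     //SELECT Id, Name, CreatedDate FROM CLARO.dbo.UserReport  ->  IEnumerable<UserReport>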

"there is no good solution to the object/relational mapping problem. There are solutions, sure, but they all involve serious, painful tradeoffs. And the worst part is that you can't usually see the consequences of these tradeoffs until much later in the development cycle." -Jeff Atwood on ORMs

I guess you could say that the SQL in the queries you tell Dapper to issue to MSSQL is itself a "magic string" insofar as VS doesn't compile it... but if you don't use SSMS to parse and test-execute your queries before using them in application code, then you aren't really doing real data development- you are just shooting in the dark.

You should have unit tests for this very purpose. Unit tests of your Dapper calls will catch any db changes ("hey, why did nobody tell me about this schema change in the Archives table?"); regardless, if your SQL field names don't match the class property names of the object you are trying to "Dapperize", you will find out at run-time- the exception messages are very much straight to the point about exactly what is off.
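A minimal MSTest sketch of that kind of check, assuming the ReportDAL class shown later in this post and a reachable test database (the property assertion is hypothetical and would match whatever your UserReport class actually exposes):

     using System.Linq;
     using Microsoft.VisualStudio.TestTools.UnitTesting;

     [TestClass]
     public class ReportDALTests
     {
       [TestMethod]
       public void ReadAllSavedUserReports_MapsEveryColumnWeDependOn()
       {
         var dal = new ReportDAL();

         //If a column was renamed or its type changed, Dapper's mapping (or the SQL itself)
         //throws here - failing the test run instead of failing in production.
         var reports = dal.ReadAllSavedUserReports();

         Assert.IsNotNull(reports);
         Assert.IsTrue(reports.All(r => r.Id > 0)); //hypothetical property check
       }
     }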

Dapper works the same in all versions of .NET; it currently targets .NET Standard for that very reason, but you will need to bring in an additional provider dependency depending on what type of data source you are accessing (SQL Server, MySQL, Oracle, DB2, Teradata, etc.).

Consider giving Dapper a try - it is very useful and illuminating, and it really shines in the very areas where EF falls short.



Dapper accessing 'UserReport' records from SQL db and returning the dynamic, typed object:
     private static readonly string connString = WebConfigurationManager.AppSettings["DefaultSQLConnection"];

     public List<UserReport> ReadAllSavedUserReports()
     {
       //A fresh connection per call - Dapper opens/closes it as needed and using() disposes it
       using (var db = new SqlConnection(connString))
       {
         return db.Query<UserReport>("SELECT * FROM CLARO.dbo.UserReport").ToList();
       }
     }

     public UserReport FindSavedUserReport(int id)
     {
       using (var db = new SqlConnection(connString))
       {
         return db.Query<UserReport>("SELECT * FROM CLARO.dbo.UserReport WHERE Id = @Id", new { Id = id }).SingleOrDefault();
       }
     }
Forgive the "SELECT *"... this is just a demonstration.



These methods can then easily be called in controller or other code like so:
     public ViewResult Index()  
     {  
       string nowTime = DateTime.Now.ToShortDateString();  
       ReportDAL dal = new ReportDAL();  
       Demo model = BuildModel(BuildSQLStatement(nowTime, ReportDrafts.BaseballDemo), nowTime);  
       model.Reports = dal.ReadAllSavedUserReports();  
       return View(model);  
     }  


Dapper is not a company trying to sell anything- it is just a really useful micro-ORM for those who prefer to work more hands-on with the SQL in their data access code (and like to be able to more granularly control and optimize for speedier queries).

*Atwood helped contribute (with SO) to the development of Dapper, so... I think he and that team kinda nailed the removal and easing of the very same limitations he bemoaned in the article I reference at the beginning: https://stackoverflow.blog/2012/02/18/stack-exchange-open-source-projects/


References: 

https://elanderson.net/2019/02/asp-net-core-with-dapper/

https://dapper-tutorial.net/



NLog for Flexible Application Logging in .NET

As much as I have enjoyed working with log4net in the past, NLog works much more seamlessly for .NET apps (it has good, solid abstractions).

From Nuget Package Manager, you can find and reference NLog and the accompanying NLog.Config which simplifies setup. To configure NLog logging, simply point the required properties (you need at least a log file path variable, a target and a rule to get started) to your desired values in the config file that NLog.Config creates in your project root (NLog.config):

Configuration:

 <?xml version="1.0" encoding="utf-8" ?>  
 <nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
    xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"  
    autoReload="true"  
    throwExceptions="false"  
    internalLogLevel="Off" internalLogFile="c:\temp\nlog-internal.log">  
  <variable name="logFilePath" value="C:\NLog\IDG.log" />  
  <targets>  
   <target name="logfile"  
     xsi:type="File"  
     fileName="${logFilePath}"  
     layout="${longdate}  LEVEL=${level:upperCase=true}: ${message}"  
     keepFileOpen="true" />  
  </targets>  
  <rules>  
   <logger name="*" minlevel="Info" writeTo="logfile" />  
  </rules>  
 </nlog>  
This example uses a log file target in C:\NLog directory; you can utilize a wide variety of logging targets to broadcast app errors

Source Code:

 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using NLog;  
 namespace ExtRSTests  
 {  
   [TestClass]  
   public class NLogTests  
   {  
     private static Logger logger = LogManager.GetCurrentClassLogger();  
     [TestMethod]  
     public void TestLoggerToFile()  
     {  
       logger.Warn("Something in the app happened that may indicate trouble ahead....");  
       logger.Error("Uh-oh. Something broke.");  
     }  
   }  
 }  
With the logging levels, log formatting (timestamps) and abundant integration options, NLog is a complete logging solution

With NLog you can implement logging for an array of targets including file, email, database and 3rd-party integrations (e.g. send a message to a Slack channel if the logger generates any "Error" or "Fatal" level log message).
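If you'd rather not maintain an NLog.config file, the same setup can be done in code. Here is a minimal sketch, assuming NLog 4.5 or later (for LoggingConfiguration.AddRule); the file path and layout simply mirror the XML example above, and the console target is just a stand-in for "a second target that only gets the scary stuff":

 using NLog;
 using NLog.Config;
 using NLog.Targets;

 public static class LoggingBootstrap
 {
   public static void Configure()
   {
     var config = new LoggingConfiguration();

     //Same file target as the NLog.config example above
     var logfile = new FileTarget("logfile")
     {
       FileName = @"C:\NLog\IDG.log",
       Layout = "${longdate}  LEVEL=${level:upperCase=true}: ${message}"
     };

     //A second target: Error/Fatal messages also go to the console
     var console = new ConsoleTarget("console");

     config.AddRule(LogLevel.Info, LogLevel.Fatal, logfile);   //Info and above -> file
     config.AddRule(LogLevel.Error, LogLevel.Fatal, console);  //Error/Fatal -> console too

     LogManager.Configuration = config;
   }
 }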


Reference: https://www.infoworld.com/article/3144535/how-to-work-with-nlog-in-net.html

Visualize Hashing and Salt as Part of Password Encryption Process

The image below is a simplified and easy-to-understand illustration of how hashing and salting work. The main takeaway from this post: multiple users can have the same password, but they will all have different salt values, which makes their hash result values different. When you authenticate, you authenticate by the hash result value of your password, which is virtually always going to be unique for each user record:

Simple, no?

Even in the case of 2 users having the same hash result, the usernames will/should not be the same, so you still have distinct accounts, because UserID is also checked in the authentication process.

Companies increasingly (and for good data privacy reasons) do not even store the clear text value you enter when you sign up for and then log into Facebook, Google, Amazon, etc.- they check your entered password's hash result against the hash result they have on file for your user/account record, from when you registered or last changed your password.
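To make that register/login flow concrete, here is a minimal C# sketch using PBKDF2 (Rfc2898DeriveBytes) with a random per-user salt. It is an illustration of the idea, not a drop-in production implementation: the iteration count and output sizes are assumptions, and the HashAlgorithmName overload assumes .NET Framework 4.7.2+ or .NET Core 2.0+.

 using System;
 using System.Security.Cryptography;

 public static class PasswordHasher
 {
   //On registration: generate a random salt and store (salt, hash) - never the password itself
   public static void HashPassword(string password, out byte[] salt, out byte[] hash)
   {
     salt = new byte[16];
     using (var rng = RandomNumberGenerator.Create())
       rng.GetBytes(salt);

     using (var kdf = new Rfc2898DeriveBytes(password, salt, 100000, HashAlgorithmName.SHA256))
       hash = kdf.GetBytes(32);
   }

   //On login: re-derive the hash from the entered password + the stored salt and compare
   public static bool Verify(string enteredPassword, byte[] storedSalt, byte[] storedHash)
   {
     using (var kdf = new Rfc2898DeriveBytes(enteredPassword, storedSalt, 100000, HashAlgorithmName.SHA256))
     {
       byte[] candidate = kdf.GetBytes(storedHash.Length);

       //Constant-time comparison so timing doesn't leak how many bytes matched
       int diff = candidate.Length ^ storedHash.Length;
       for (int i = 0; i < candidate.Length && i < storedHash.Length; i++)
         diff |= candidate[i] ^ storedHash[i];
       return diff == 0;
     }
   }
 }

Two users with the same password end up with different salts, and therefore different stored hashes- exactly the property the diagram illustrates.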

Good answer to the question you may come across, "what is the difference between salt and an IV (initialization vector)?" (TL;DR: not all IV's are salt, but salt is a kind of IV): https://security.stackexchange.com/questions/6058/is-real-salt-the-same-as-initialization-vectors


OLAP: Facts and Dimensions

OLAP can adequately be described as the storage of redundant copies of transactional records from OLTP and other database sources. This redundancy facilitates quick lookups for complex data analysis because data can be found via more and quicker (SQL execution) paths than normalized OLTP. And OLTP after all- should stick to its namesake and what it does best: processing, auditing, and backing up online transactions, no? Leave data analysis to separate OLAP Warehouses.

OLAP data is typically stored in large blocks of redundant data (the data is organized in ways to optimize and accelerate your data mining computations). Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.


Facts are "measurements, metrics or facts of a particular process" -Wikipedia. Facts are the measurement of the record value: "$34.42 for today's closing stock price", "5,976 oranges sold at the market", "package delivered at 4.57pm", etc.

Dimensions are "lists of related names–known as Members–all of which belong to a similar category in the user’s perception of a data". For example, months and quarters; cities, regions, countries, product line, domain business process metrics, etc.

Dimensions give you a set of things that you can measure and visualize in order to get a better pulse and better overall understanding of the current, past and even potential future (via regression models) shape of your data. This can often alert you to things you might not be able to see (certainly not as quickly or easily) in an OLTP-based data analysis model, which is often tied to production tables out of necessity or because "we don't have time or money to implement a Data Warehouse".
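As a concrete (and entirely hypothetical- the table and column names below are made up) example of measures and dimensions working together, a typical star-schema query aggregates a fact-table measure and slices it by dimension attributes:

 --Hypothetical star schema: FactSales holds the measure, DimDate/DimProduct hold the dimensions
 SELECT d.CalendarQuarter,
     p.ProductLine,
     SUM(f.SalesAmount) AS TotalSales   --the measure, aggregated
 FROM dbo.FactSales AS f
 JOIN dbo.DimDate AS d ON d.DateKey = f.DateKey
 JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
 GROUP BY d.CalendarQuarter, p.ProductLine
 ORDER BY d.CalendarQuarter, p.ProductLine;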


Yes, you definitely do need OLAP/DW capabilities if you think long-term about it. Having intimate knowledge of your operations and how you can change variables to better run your business? Why would any business person (excepting those committing fraud) not want that?

I'd say that implementing a true and effective OLAP environment is worth the project investment and would pay for itself over and over again in the way of better and more specific/actionable metrics that help administrators of operations make the best, data-backed decisions- some very critical decisions involving millions of dollars and sometimes lives. I'd like a better look at the data before making a multi-million dollar or life/death decision.


SAS, Hadoop, SSAS, with healthy doses of R and/or Python customization?- whatever data solution you choose, my advice is to go with something that has a great track record of providing the kind of solutions that your business and/or your industry require. Use the tool that your ETL developers can utilize effectively and that helps you best meet the needs of the constituents with whom you will exchange data (the company, the customer, industry and/or regional regulation compliance).

Fire Drills to Ensure Successful Disaster Recovery

TEST. YOUR. DISASTER. RECOVERY. PLAN.

This cannot be said enough. I have heard countless horror stories (and lived one) of instances where backups were completing "successfully" but no one ever tested the restoration of those backups (which were, in fact, corrupt and not restorable)- until it was too late. :/


Put in the time up front to make sure your data recovery strategy and high-availability guarantee actually "recover" and make "available" your data- the lifeblood of your organization.

It is important to remember that you can have a Secondary database server that is 100% in sync with the Primary database server, but if your application is not configured to make the switch (or your service host does not make the switch for you), you will only have half of things recovered: the data, but not the data availability through the application.
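On the application side, that switch is often just a matter of how the connection string is built. A hedged sketch (the keywords assume System.Data.SqlClient, and the server/database names are placeholders): a mirrored database can name its failover partner, and an Availability Group listener connection can enable faster multi-subnet failover.

 using System.Data.SqlClient;

 public static class FailoverConnections
 {
   //Database mirroring: the client retries against the partner if the principal is unreachable
   public static SqlConnection Mirrored() =>
     new SqlConnection("Server=SQLPRIMARY;Failover Partner=SQLSECONDARY;Database=AppDb;Integrated Security=true;");

   //Availability Group listener: MultiSubnetFailover=True speeds up reconnecting after a failover
   public static SqlConnection AgListener() =>
     new SqlConnection("Server=AG-LISTENER;Database=AppDb;Integrated Security=true;MultiSubnetFailover=True;");
 }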

Good References on Disaster Recovery Strategies and what to consider ("sensitivity of data, data loss tolerance, required availability, etc."):

https://www.sqlshack.com/sql-server-disaster-recovery/

https://support.microsoft.com/en-us/help/822400/description-of-disaster-recovery-options-for-microsoft-sql-server


Log Shipping and Log (and/or Differential Backup) Restoration

Most developers know what a full database backup is. Did you know a differential backup captures just the changes made since the last full backup? But what is the point of transaction log shipping in SQL Server?

A "transaction log" backup is a smaller subset still: a set of all T-SQL transactions for a given (usually short) span of time. Log backups are typically taken at intervals of every few minutes, but this varies according to application and data compliance needs.

A commenter in the referenced web link below aptly notes:

"the point of log shipping is to keep the secondary server closer to 100% synced than the full backups (which are much larger and produced much less frequently than log backups)."

Log shipping can be used to serve as a constant "warm backup" (similar to SQL Server database mirroring) of all production transactions for a standby/failover SQL Server* as well as a dedicated Backup SQL Server as depicted here:

 Setup if you have a fully-functioning Secondary Production SQL Server

Or, more commonly, the shipped backup logs are sent to a single Backup Server:

Typical production database server scenario, though the steps are depicted a bit out of sequential order

In short, a full backup contains the entire database you need to recover, up to a certain point in time.

A differential backup is a subset of a full backup (for bigger organizations, "full" backups generally contain days' worth of data), so it is only prudent to include one or even several differential backups to ensure quick disaster recovery (and full recovery of vital customer data).

Transaction log backups are the chunks of SQL transactions that fill in the gap between the most recent differential (or full) backup and the point of failure.

These three disaster recovery tools that SQL Server affords DBAs and developers allow you to restore your production data back to the point in time when your server failed (a hedged T-SQL sketch of this restore sequence follows the list):

  • Apply the most recent full backup of the prod server that went down*
  • Apply any differential backups taken since that most recent full backup
  • Apply all transaction log backups up to the point in time of the failure**
  • Apply any StandBy server transaction logs that may have occurred (if the standby is not read-only)
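For illustration, the sequence above might look like this in T-SQL (the database name, file paths and the STOPAT time are placeholders):

 --1. Most recent full backup; NORECOVERY keeps the database ready for more restores
 RESTORE DATABASE AppDb FROM DISK = N'D:\Backups\AppDb_Full.bak' WITH NORECOVERY, REPLACE;

 --2. Most recent differential backup taken since that full backup
 RESTORE DATABASE AppDb FROM DISK = N'D:\Backups\AppDb_Diff.bak' WITH NORECOVERY;

 --3. Every log backup since the differential, in order, up to the point of failure
 RESTORE LOG AppDb FROM DISK = N'D:\Backups\AppDb_Log_0900.trn' WITH NORECOVERY;
 RESTORE LOG AppDb FROM DISK = N'D:\Backups\AppDb_Log_0915.trn'
   WITH STOPAT = '2019-06-01T09:12:00', RECOVERY;   --bring the database online at that moment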

With all of this said, do be aware that different scenarios and requirements call for different kinds of disaster recovery solutions. Consult with the experts and never neglect to TEST YOUR RESTORE PROCEDURE WITH ACTUAL .bak and .trn files- and then test the applications and services connected to those database servers against their restored/Secondary versions to ensure you suffer as little downtime as possible in the event of a database emergency.

What a restoration timeline might look like

*Note(!) to the questions on Microsoft Licensing and your StandBy server, should it need to be used for any production activity during your disaster recovery:

"I am not the licensing police, and I am not Microsoft – check with your licensing representative to clarify your exact situation. Generally, you can have one warm standby server. However, the second someone starts using it for reporting, testing, or anything else, you need to license it like any other server."-Brent Ozar

**"If the database is in Standby mode, users can query it – except when a log backup is being restored. You need to decide if a restore job will disconnect users, or if the restore is delayed until after the users are disconnected." -Brent Ozar

References:

https://www.sqlshack.com/sql-server-disaster-recovery/ (SQL Server Replication is another complement or alternative to log shipping and backup/restore)

 https://www.brentozar.com/archive/2013/03/log-shipping-faq/

Temporal Tables in SQL Server

Definition from Microsoft: SQL temporal tables, also known as "system-versioned tables", are a database feature that brings built-in support for providing information about data stored in the table at any point in time, rather than only the data that is correct at the current moment in time. Temporal tables were introduced in the ANSI SQL 2011 standard.

What is it used for?
1). Auditing data history- identifying who did what, when, with row-level restoration of the temporal table data to its state at any point in time ("time travel")

2). Monitor Data Deltas (ie. maintain company sales discounting policy by ensuring no product has had a price change that exceeds some company threshold (200% increase or decrease, etc.))

3). Slowly Changing Dimension management (eliminate the need for complicated ETL logic for tracking SCD- SQL Temporal Tables do this audit tracking automatically)




In Practice
Once upon a time, I worked on a project that had a "unique" (kind euphemism) legacy database schema design. This company had a custom "system" for tracking changes in data. There was no referential integrity, so everything relied on application code not having any data access bugs- which, as we all know, is not a particularly reliable setup.

Temporal Tables would have done the trick without all the complexity of having to manage a proprietary data versioning system that sometimes worked, and oftentimes caused headaches and mass confusion over the root cause of production issues related to versioning of certain data entities.

Temporal Tables have the current or base table, as well as a history table that logs changes over time

Unless there is a specific reason that SQL Temporal Tables won't work for your scenario* you should take advantage of existing functionality like this so that you can focus on the unique functionality of your app and not waste precious time reinventing wheels.

Creating a temporal table is relatively straightforward. In SSMS (connected to an instance of SQL Server 2016 or later), select the option as seen below.

Menu option in SSMS


Here is the T-SQL for basic Temporal Table DDL:

 USE [KpItSimpl]  
 GO  
 --Create system-versioned temporal table. It must have a primary key and two datetime2 columns that are part of the SYSTEM_TIME period definition  
 CREATE TABLE dbo.SimplTemporalTable  
 (  
   --Table PK and columns definition  
   ProductID int NOT NULL PRIMARY KEY CLUSTERED,  
   ProductDescription varchar(max) NULL,  
   ProductPrice decimal,  
   --Period columns and PERIOD FOR SYSTEM_TIME definition  
   ValidFrom datetime2 (2) GENERATED ALWAYS AS ROW START NOT NULL,  
   ValidTo datetime2 (2) GENERATED ALWAYS AS ROW END NOT NULL,  
   PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)  
 )  
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SimplTemporalTableHistory));  
 GO  
T-SQL DDL for creating a new Temporal Table


For a practical example, I inserted some data into this temporal table, and then committed some UPDATE statements on that data, changing the ProductPrice value of some of the product records. The query below demonstrates how we can look for records (we are likely most interested in the exact change time) that had a price change exceeding +/- 200%:

Example of a query utilizing the Temporal Table feature to see an excessive product price change
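A hedged sketch of what such a query can look like against the dbo.SimplTemporalTable defined above (the threshold arithmetic- a new price at least 3x, or at most one third of, the previous price- is one interpretation of "+/- 200%"):

 --Compare each row version to the next version of the same product across current + history
 SELECT cur.ProductID,
     prev.ProductPrice AS OldPrice,
     cur.ProductPrice AS NewPrice,
     cur.ValidFrom AS PriceChangedAt
 FROM dbo.SimplTemporalTable FOR SYSTEM_TIME ALL AS cur
 JOIN dbo.SimplTemporalTable FOR SYSTEM_TIME ALL AS prev
   ON prev.ProductID = cur.ProductID
  AND prev.ValidTo = cur.ValidFrom           --consecutive versions of the same row
 WHERE prev.ProductPrice <> 0
   AND (cur.ProductPrice >= 3 * prev.ProductPrice      --increase of 200% or more
     OR cur.ProductPrice <= prev.ProductPrice / 3);    --drop to one third or less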

Note(!) one limitation that stands out right away: you cannot drop a temporal table like a standard SQL table. You must turn versioning off on the table, drop the table, and then drop the history table. So instead of selecting "Delete" from the table menu options or executing "DROP TABLE", you will want to select the "SCRIPT AS >> DROP To" option as seen below:

You must script the deletion/drop of a SQL temporal table

As you can see, SQL Temporal Tables can make managing data changes a lot easier. Certainly much easier than a proprietary change versioning system and much faster than restoring the entire database to inspect the state of data at a particular point in time.


*Temporal Table Considerations and Limitations: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-considerations-and-limitations?view=sql-server-2017

Reference: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-2017

T-SQL STUFF, STRING_SPLIT

In cases where you must extract a series of values from a delimited string, you can utilize SQL Server's built-in STRING_SPLIT() function (available in SQL Server 2016+ with database compatibility level 130 or higher). You need only specify the delimited string and the delimiting character (a comma in this case, ",") as the arguments; the split values come back in a single column named "value".

When a requirement calls for you to roll up data into a single row, you can use STUFF(), typically combined with FOR XML PATH, depending on your data. Just specify an inner query as the argument for STUFF().

 --Get ids from a csv of ids  
 SELECT * FROM STRING_SPLIT('100,101,201,301,411,414', ',')  
  
 --Push a series of values into one row  
 SELECT playerID,  
   STUFF((SELECT '; ' + teamID  
       FROM Managers  
       WHERE teamID LIKE ('%P%')  
       GROUP BY teamID  
       FOR XML PATH('')), 1, 2, '') allTeams  --1, 2 strips the leading "; " separator  
 FROM Managers  

Result of these functions

Where can we find the data?

There are myriad resources to get data from a variety of domains (insurance, population, health, geography, weather, sports, etc.). You can get the data in .csv form or through API calls.

Here are a few to jump-start your data-oriented application:

Financial: https://fred.stlouisfed.org/search?st=csv

Baseball: http://www.baseball-databank.org/

Soccer: https://www.soccer24.com/team/apia-tigers/84SKUunL/

Government: https://www.data.gov/

Public: https://github.com/awesomedata/awesome-public-datasets

GIS: https://gisgeography.com/best-free-gis-data-sources-raster-vector/

Google Maps API: https://developers.google.com/maps/documentation/directions/


Reference: https://www.quora.com/What-is-the-best-financial-data-source-in-CSV-file-format


Invisible characters from pasted data



If you find yourself debugging an ETL task error where the problem at hand is data type mismatch or some other data error that is not visible to you in the source (culprit) data, don't fret.

What has likely happened is invisible formatting characters got into the data.

To remedy things, all you'll need to do is open up Notepad++, paste your data into the text window, and from the View menu dropdown, select "Show Symbol" >> "Show All Characters".

Simply edit the data in Notepad++ (remove the special characters), save, and try the import again. Your data should be sufficiently scrubbed clean and your ETL task will return to normal.


There is also a shortcut button in Notepad++:


PS: handy tip for Windows users- to remove all formatting from copied data, paste the text into Notepad. All formatting will be removed from the data. Re-select it and go paste where needed.

PureText is another application that handles removing all formatting from text: https://stevemiller.net/puretext/