kpitsimpl: August 2019

Why NoSQL is Never Going to Replace SQL (apple:orange)

Do you need a jet engine to get your results from point A to B or will a modest GM sedan suffice?

I'll take the above bad analogy further and posit that while sedans and cars on the ground require stringent rules and have to navigate much more rigid structures, a jet engine simply powers the jet ahead through constraint-less skies- it's purpose is to power something big- not to be concerned with other machinery of the craft (ie. RDBMS features eschewed by NoSQL solutions).

Do you need massive global data sync scale so that millions can connect and make changes and the results all (appear) real-time? If not, NoSQL is not always the right choice and neglecting to have any kind of schema for stored application data structures can present its own host of challenges in the future if (when) those structures change. But alas, you can use NoSQL for some things (Redis, image/BLOB storage) and an RDBMS for others (more structured records and things you want to restore to a point in time in the event of a server failure).

Relational databases tend to be scaled up; NoSQL solutions are scaled out

The SQL vs NoSQL (structured and transactional vs. semi-structured and "eventually consistent") debate is not a matter of one or the other and that's that. These are complementary technologies and should both be used- wherever you find app requirements that suggest one or the other makes the most sense.

When an application is meant to scale immensely and there is not a lot of data integrity, consistency, transaction or complex data structuring and transformation needs- NoSQL is your best bet and will far outscale even the most robust RDBMS server farm- at least at a much lower cost (at the cost of sacrificing features of an RDBMS which may not be needed).

I have personally worked on several projects that utilize relational and unstructured approaches to reading and persisting application data. If you have ever used an application's config file to change a setting in JSON or XML or a simple line entry- you are seeing a small and very basic NoSQL example of storing app data.

Hadoop and other distributed NoSQL db servers are built for scalability

Using NoSQL in software development can make data structures and objects- passed to and fro from APIs and within the application itself- much more flexible to work with. 1-line to serialize an object to JSON chunk, save it to BLOB storage and forget about it.

When dealing with relational data, you really have to understand the data to write good data access code and the underlying SQL that supports well-defined structuring of complex objects.

Well-defined structuring of the persistence of complex application objects avoids data duplication/corruption, prevents breaking reference constraints and losing any sense of hierarchical data relationships and more generally lets you know very quickly when you have a problem within your data storage structures and the objects that initialize themselves from that data.

NoSQL ditches virtually all relational database data normalization rules in favor of a loosely schema'd unstructured (document, BLOB, KeyStore, etc.) data store that relies solely on keys, values and filtering unstructured metadata to get the same SELECT ... WHERE functionality found in RDBMS. Its iterations usually bear a resemblance to Java and as loosely follows here are common SQL statements and their Java or Java-derived equivalent:

"Apache Hadoop is an open source platform built on two technologies: Linux operating system and Java programming language."

For many application data requirements however, relational data can be overkill and totally unnecessary (ie. Redis to store key/vals vs. designing some elaborate key/val store in a SQL Server table).

Implementing something like Splunk or Kibana to continously index app logs and configuration files can help you dip toes into the Lake of Dark Data

The best distributed NoSQL solutions like Hadoop really shine in their inherit ability to dynamically scale to as many server machines as the operators can make ready to serve as "Hadoop processor nodes on standby".

SQL Server scaling is based more on server augmenting or "scaling up" (adding RAM, faster SSDs, RAID Arrays, etc.) rather than distributing workloads across dynamic nodes. SQL Server AlwaysOn Availability and its Mirroring and Replication feature are for recoverability and data sharing- not dynamic scaling to handle bigger and bigger workloads.

SQL has been around forever. The fundamental concept behind NoSQL (semi-structured or loosely structured data) has been around since long before SQL relational database technology. Both (along with NoSQL-related graph database paradigm) will continue to serve as viable data storage solution alternatives for many more years into the 21st century.

Relational systems like SQL Server, MySQL and Oracle usually handle structured side; NoSQL vendors are after the other 90%

In fact, SQL Server 2019's Polybase extension supports Hadoop Clusters, MongoDB and Terradata T-SQL query integration. A new feature called SQL Server Big Data Clusters helps make distributed NoSQL nodes manageable within SSMS environment.

Mongo, Hadoop and other NoSQL database servers have SQL server integration to support relational data sources.

SQL Server 2019 Polybase integrates Hadoop, MongoDB and many other sources with relational data and T-SQL queries

CAP Theorem: a distributed data system like most all NoSQL solutions can only achieve 2 of the 3 features: "Consistency", "Availability" and "Partition Tolerance"

ACID vs BASE: The relational axiom of "Atomic, Consistent, Isolated, Durable" contrasted against NoSQL's vague promise of "Basically Availability, Soft State, Eventual Consistency" (dirty reads common)

This ol' tried-and-true database server software ain't going away in the foreseeable future

Hundreds of millions of corporate, mid and small business applications are running along just fine in 2019 using various RDBMS platforms (SQL Server, Oracle, DB2, PostgreSQL, MySQL, etc.) for at least one of their data stores.

Many more millions of applications have been using one riff or another of NoSQL (semi-structured data) before, during and after the mythical "Relational Movement" as described by software veteran Robin Bloor:

"The Relational Model of Data Never Dominated Anyway. Estimates vary, but it is generally agreed that somewhere between 70% and 95% of the world’s data is stored only in poorly structured or unstructured formats such as: word processing documents, spreadsheets, HTML files and e-mail. The truth is that Relational database never did really dominate. It was rejected out of hand, year after year, as an effective store for many types of data." -Robin Bloor on insideanalysis.com

Google search trends over the last 5yrs certainly suggest relational SQL is not going anywhere anytime soon...

Considerations when evaluating whether to use NoSQL:

NoSQL is a precise tool for precise data needs; if relational SQL is too much for your group, NoSQL will likely be too steep a learning curve

Data Integrity- when billions of NoSQL records are affected by a small change in schema that is not able to propagate correctly or runs into constraint issues or hierachy and relations are impossible to infer... maybe relational SQL would be a better approach

NoSQL touts loose schema structure is a benefit but this simply means schema and data structure enforcement has been shifted from the database layer to the application layer. Data cannot "self-manage".

Some apps are prime candidates for NoSQL's document-centric and resource-centric distributed storage architecture

Also, there is this to consider:

(re: the longevity and simple-yet-powerful abstractions of SQL)

If NoSQL solutions are eventually able to achieve the same transactional consistency and complex schema structures that some applications require and then ultimately subsume RDBMS completely- it'll still require a lot of SQL gurus to convert and integrate all the legacy relational database apps for a long, long time to come...

Bring on MongoDB, CouchDB, Dynamo, MapReduce, HBase, BigTable, Cassandra.

As data professionals we will have an increasingly complex array of tools to understand; what we do with them will drive the future

Long live SQL Server 2030. 😉

References:

https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/

https://www.wired.com/2012/12/couchdb/

https://pdfs.semanticscholar.org/a6f0/1c9103d3bafb8ce92641c9f2a4deaccd12f9.pdf

https://www.memsql.com/blog/why-nosql-databases-wrong-tool-for-modern-application/

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1589423200346982646

https://news.ycombinator.com/item?id=479165

https://insideanalysis.com/is-the-relational-database-doomed/

NLog for Flexible Application Logging in .NET

As much as I have enjoyed working with log4net in the psat NLog works much more seamlessly for .NET apps (it has good, solid abstractions)

From Nuget Package Manager, you can find and reference NLog and the accompanying NLog.Config which simplifies setup. To configure NLog logging, simply point the required properties (you need at least a logFile variable, a target and a rule to get started) to your desired values in the config file that NLog.Config creates in your project root (NLog.config):

Configuration:

 <?xml version="1.0" encoding="utf-8" ?>  
 <nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
    xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"  
    autoReload="true"  
    throwExceptions="false"  
    internalLogLevel="Off" internalLogFile="c:\temp\nlog-internal.log">  
  <variable name="logFilePath" value="C:\NLog\IDG.log" />  
  <targets>  
   <target name="logfile"  
     xsi:type="File"  
     fileName="${logFilePath}"  
     layout="${longdate}  LEVEL=${level:upperCase=true}: ${message}"  
     keepFileOpen="true" />  
  </targets>  
  <rules>  
   <logger name="*" minlevel="Info" writeTo="logfile" />  
   <logger name="*" minlevel="Warn" writeTo="logFile"/>  
  </rules>  
 </nlog>

This example uses a log file target in C:\NLog directory; you can utilize a wide variety of logging targets to broadcast app errors

Source Code:

 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using NLog;  
 namespace ExtRSTests  
 {  
   [TestClass]  
   public class NLogTests  
   {  
     private static Logger logger = LogManager.GetCurrentClassLogger();  
     [TestMethod]  
     public void TestLoggerToFile()  
     {  
       logger.Warn("Something in the app happened that may indicate trouble ahead....");  
       logger.Error("Uh-oh. Something broke.");  
     }  
   }  
 }

With the logging levels, log formatting (timestamps) and abundant integration options, NLog is a complete logging solution

With NLogger you can implement logging for an array of targets including file, email, database and 3rd party integrations (ie. send message to Slack channel if logger generates any "Error" or "Fatal" level log message).

Reference: https://www.infoworld.com/article/3144535/how-to-work-with-nlog-in-net.html

.NET XML Serialization with System.Xml.Serialization

From SOAP, to app configuration, model templating, EDI/ETL, and beyond, XML is a mainstay in software and will continue to be (see- HTML)

Source Code:

 using System;  
 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using System.Xml.Serialization;  
 using System.IO;  
 using System.Xml.Linq;  
 using System.Xml;  
 namespace DemoTests  
 {  
   [TestClass]  
   public class TextXmlSerialization  
   {  
     public const string xml =  
       "<XMLEntityCollection>" +  
         "<XMLEntities>" +  
           "<XMLEntity>" +   
             "<ID>1</ID>" +   
             "<Name>TestName1</Name>" +   
           "</XMLEntity>" +  
           "<XMLEntity>" +  
             "<ID>2</ID>" +  
             "<Name>TestName2</Name>" +  
           "</XMLEntity>" +  
         "</XMLEntities>" +  
       "</XMLEntityCollection>";  
     [TestMethod]  
     public void TestDeserialize()  
     {  
       XmlSerializer serializer = new XmlSerializer(typeof(XMLEntityCollection));  
       using (StringReader reader = new StringReader(xml))  
       {  
         XMLEntityCollection ents = (XMLEntityCollection)serializer.Deserialize(reader);  
         TestSerialize(ents);  
       }  
     }  
     [TestMethod]  
     public void TestSerialize(XMLEntityCollection ents)  
     {  
       XmlSerializer xmlSerializer = new XmlSerializer(ents.GetType());  
       using (StringWriter stringWriter = new StringWriter())  
       {  
         xmlSerializer.Serialize(stringWriter, ents);  
         Assert.IsNotNull(stringWriter);  
       }  
     }  
   }  
   [Serializable()]  
   public class XMLEntity  
   {  
     [XmlElement("ID")]  
     public string Id { get; set; }  
     [XmlElement("Name")]  
     public string Name { get; set; }  
   }  
   [Serializable()]  
   [XmlRoot("XMLEntityCollection")]  
   public class XMLEntityCollection  
   {  
     [XmlArray("XMLEntities")]  
     [XmlArrayItem("XMLEntity", typeof(XMLEntity))]  
     public XMLEntity[] Ents { get; set; }  
   }  
 }

.NET XML Deserialization with System.Xml.Serialization:

Instantiated (ents) object containing XML from the deserialized XML string

.NET XML Serialization with System.Xml.Serialization:

XML genertaed by the stringWriter when serializing ents object into XML using the NET XMLSerializer type

Reference: https://stackoverflow.com/questions/364253/how-to-deserialize-xml-document

Newtonsoft JSON .NET Nuget Nugget

Most API payloads are in XML or JSON; it is best to know both of these data structures, and how to serialize/deserialize them

The JSON parsing utilities found in Newtonsoft.Json are very. very useful and should be common knowledge for any .NET developer who works with API data or anything producing or derived from JSON (JavaScript Object Notation).

In general, to use Newtonsoft.Json you simply need to create a .NET class hierarchy that mimics the structure and hierarchy of the target JSON. Once that is setup, serializing in-memory objects to JSON and deserializing the JSON back to in-memory objects is a breeze.

You achieve this by normal class hierarchy and making List<> of child objects, array properties, etc. Newtonsoft's 'JsonProperty' class property decorator maps JSON properties and the builtin serialization and deserialization methods facilitate working between JSON strings and the in-memory objects they represent.

The C# source code below demonstrates serialization from a SQL Server 2017 SSRS API v2 JSON response and then serializing that object back into JSON.

Source Code:

 using System;  
 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using Newtonsoft.Json;  
 using System.Collections.Generic;  
 using System.Net.Http;  
 using System.Threading.Tasks;  
 namespace DemoTests  
 {  
   [TestClass]  
   public class DemoTestJSON  
   {  
     [TestMethod]  
     public async Task TestDeserializeJSON()  
     {  
       HttpClient client = new HttpClient(new HttpClientHandler() { UseDefaultCredentials = true });  
       client.BaseAddress = new Uri("http://localhost/reports/api/v2.0/reports");  
       var response = await client.GetAsync(client.BaseAddress);  
       var deserial = JsonConvert.DeserializeObject<APIGenericItemsResponse>(await response.Content.ReadAsStringAsync());       
       TestSerializeJSON(deserial);  
       Assert.IsNotNull(deserial);  
     }  
     [TestMethod]  
     public void TestSerializeJSON(APIGenericItemsResponse genericObject)  
     {      
       string serial = JsonConvert.SerializeObject(genericObject);  
       Assert.IsNotNull(null);  
     }  
   }  
   public class APIGenericItemsResponse  
   {  
     [JsonProperty("@odata.context")]  
     public string Context { get; set; }  
     [JsonProperty("value")]  
     public List<GenericItem> GenericItem { get; set; }  
   }  
   public class GenericItem  
   {  
     [JsonProperty("Id")]  
     public string Id { get; set; }  
     [JsonProperty("Name")]  
     public string Name { get; set; }  
     [JsonProperty("Path")]  
     public string Path { get; set; }  
   }  
 }

SSRS API v2 /Reports JSON Response:

JSON Deserialization with Newtonsoft.Json:

The deserial variable holds an in-memory .NET object of type APIGenericResponse, derived (deserialized) from the SSRS API JSON response

JSON Serialization with Newtonsoft.Json:

The serial variable is simply the serialization APIGenericItemsResponse object serialized into a JSON string

Reference: https://www.newtonsoft.com/json/help/html/Introduction.htm

base in C#

The base keyword in C# allows a subclass to access base (superclass) members.

All credit to Suresh Dasari of Tutlane (reference below) on explaining this so effectively in just a few steps of code.

What is shown here is the Details subclass overriding the Users base class' "GetInfo()" method and including the base behavior (Console.WriteLine("Name: {0}", name); ... Console.WriteLine("Location: {0}", location); - along with- some new behavior (Console.WriteLine("Age: {0}", base.age);). In this way members can be shared between subtypes and the type they inherit from- in constructors as well as elsewhere in the subclass.

 using System;  
 namespace Tutlane  
 {  
   // Base Class  
   public class Users  
   {  
     public string name = "Suresh Dasari";  
     public string location = "Hyderabad";  
     public int age = 32;  
     public virtual void GetInfo()  
     {  
       Console.WriteLine("Name: {0}", name);  
       Console.WriteLine("Location: {0}", location);  
     }  
   }  
   // Derived Class  
   public class Details : Users  
   {  
     public override void GetInfo()  
     {  
       base.GetInfo();  
       Console.WriteLine("Age: {0}", base.age);  
     }  
   }  
   class Program  
   {  
     static void Main(string[] args)  
     {  
       Details d = new Details();  
       d.GetInfo();  
       Console.WriteLine("\nPress Enter Key to Exit..");  
       Console.ReadLine();  
     }  
   }  
 }

Result

Reference: https://www.tutlane.com/tutorial/csharp/csharp-base-keyword

IP Addressing and Subnets, Subnet Masking

Knowledge of network configuration and administration is an (incredibly- still) underrated, underappreciated and immensely powerful tool for any IT professional to possess.

All subnet masking schemes, the mask bits in binary, available number of hosts. A "/24" is common for small LAN subnets.

One area of computer networking that should be more well-understood by software developers is the configuration of subnetworks via subnet masks. A subnet mask (ie. 255.255.255.0) is simply a way of re-purposing an IP Address by segmenting it into network and host portions.

An IPv4 address consists of 4 bytes (32 bits) of data. Each of those bytes contain 8 bits known as "octets". In a 255.255.255.0 subnet mask- all but the last octet is being used for the network ID portion of the IP address and so are ignored.

At this point we could get into the logical ANDing of IP address bits and subnet mask bits but just be aware that the masking bits allow for the network portion of the IP address to be separated from the host portion- that is they key purpose of subnetting and the subnet mask.

The breakdown of a Class B IPv4 address

The subnet mask is designed to denote the number of bits in an IP address (ie. 10.9.1.14) that form the network portion (10.9.1) vs. the host portion (.14).

In this way, IPs can be used in ways they were not originally designed- but that are altogether needed for proper organization of something that has grown as seemingly unwieldy as IP networks of "the Internet" (publically accessible networks of subnetworks). With a little reference knowledge you can understand even the trickiest of subnet configurations.

But wait- there is (lots) more...

The example above illustrates only a very basic subnetting situation.

Where things get tricky is when a subnet mask ends not at the end of an entire octet, but just before the start of the host portion of the IP- in the same octet (ie. 255.255.128.0). In more complex network configuration scenarios it is helpful to refer to a subnet configuration reference sheet like the following to identify the subnet and/or subnet mask information you are looking for:

Describing the nature of a /29 subnet solely from knowing the IP address (10.1.1.37) of one of its hosts and that it is a /29 subnet.

Below are the 7 common pieces of information that you will need to know when analyzing subnet configurations:

Network ID: First available IP address in the subnet.

Broadcast IP: Last available address in the subnet.

First Host IP: Network ID + 1

Last Host IP: Broadcast IP - 1

Next Network: Broadcast + 1

# of IP Addresses: Number of IP addresses in the subnet range (subtract 2 to find the number of "usable" device IP addresses) - refer to the Subnet Mask Reference Sheet

This enlightening example shows how MCI uses 11 bits of mask, Automation Research Systems 22 bits, ARS 24 bits, freesoft.org 32 bits- all on the same IP address; you can see the subnet hierarchy as MCI controls the entire 208.128.0.0/11 network

Online CIDR Calculator showing MCI subnet breakdown which includes the other 3 subnets shown

IP Points to remember:

IP octets (base 10 representation) are 0-inclusive so only ever max of .255 in any given octet.

Subnet Mask is a 32-bit number that indicates how many bits of an IP address are used to indicate the network portion vs. host portion and is a way to subdivide networks for organization, security and manageability.

The first two available host addresses are network (generally .0), then router (generally .1) and the last available host address (generally .255) is used as the subnet's broadcast address- note these example octets are small LAN defaults/generalities and likely will not apply to a complex subnet.

Class A (0-127) uses 8 bits for the network portion of the IP address, leaving 24 bits for host IDs

Class B (128-191) uses 16 bits for the network portion of the IP address, leaving 16 bits for host IDs

Class C (192-223) uses 24 bits for the network portion of the IP address, leaving 8 bits for host IDs

CIDR is the acronym for Classless Inter-domain Routing. It (/26, /24, etc.) is just the number of IP address bits used by the subnet mask (255.255.255.0 = /24 or 24 bits of mask, .255.255.255.192 = /26 or 26 bits of mask).

When sorting through IP ranges to determine which range a particular subnetwork group is in, use these time saving tricks recommended by PracticalNetworking:

(1) multiply group size by 10 as a (*10) multiple of the group size will be reached

(2) if multiplying group size by 10 goes beyond the IP address for which you are trying to find the subnetwork range, remember that "every group size will land on 128 eventually"- so you can use that for a starting basis as well.

(3) every group size lands on the subnet value of the selected subnet and every subnet to the left of it (ie. for a /27 subnet or ".224" subnet mask- .224, .192 and .128 will all match the start of a group)

References:

https://www.youtube.com/watch?v=s_Ntt6eTn94&ab_channel=PowerCertAnimatedVideos

https://www.youtube.com/watch?v=BWZ-MHIhqjM

http://www.subnet-calculator.com/

https://www.pcwdld.com/subnet-mask-cheat-sheet-guide