Why NoSQL is Never Going to Replace SQL (apple:orange)

Do you need a jet engine to get your results from point A to B or will a modest GM sedan suffice?

I'll take the above bad analogy further and posit that while sedans and cars on the ground must follow stringent rules and navigate much more rigid structures, a jet engine simply powers the jet ahead through constraint-free skies. Its purpose is to propel something big, not to concern itself with the other machinery of the craft (i.e. the RDBMS features eschewed by NoSQL solutions).

Do you need massive global data-sync scale, so that millions of users can connect, make changes, and see results that all (appear to) update in real time? If not, NoSQL is not always the right choice, and having no schema at all for stored application data structures can present its own host of challenges down the road if (when) those structures change. You can, of course, use NoSQL for some things (Redis, image/BLOB storage) and an RDBMS for others (more structured records and anything you want to restore to a point in time in the event of a server failure).

Relational databases tend to be scaled up; NoSQL solutions are scaled out

The SQL vs. NoSQL (structured and transactional vs. semi-structured and "eventually consistent") debate is not a matter of picking one and declaring the other obsolete. These are complementary technologies, and both should be used wherever application requirements suggest one or the other makes the most sense.

When an application needs to scale immensely and there are few data integrity, consistency, transaction or complex data structuring and transformation needs, NoSQL is your best bet. It will far outscale even the most robust RDBMS server farm, and at a much lower cost, in exchange for sacrificing RDBMS features that may not be needed.

I have personally worked on several projects that utilize both relational and unstructured approaches to reading and persisting application data. If you have ever edited an application's config file to change a setting in JSON, XML or a simple line entry, you have seen a small and very basic NoSQL-style example of storing app data.

Hadoop and other distributed NoSQL db servers are built for scalability


Using NoSQL in software development can make data structures and objects, passed to and from APIs and within the application itself, much more flexible to work with: one line to serialize an object to a JSON chunk, save it to BLOB storage, and forget about it.
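As a minimal sketch of that pattern (assuming the Newtonsoft.Json package used later in this post, and a stand-in BlobStore class in place of whatever BLOB/document storage SDK your app actually uses):

 using System.Collections.Generic;
 using Newtonsoft.Json;
 public class Order
 {
   public int Id { get; set; }
   public string Customer { get; set; }
 }
 // Stand-in for a real BLOB/document storage client
 public static class BlobStore
 {
   private static readonly Dictionary<string, string> blobs = new Dictionary<string, string>();
   public static void Save(string key, string content) => blobs[key] = content;
 }
 public static class OrderArchiver
 {
   public static void Archive(Order order)
   {
     // One line to serialize the object to JSON...
     string json = JsonConvert.SerializeObject(order);
     // ...and one to drop it into the store, no schema required
     BlobStore.Save($"orders/{order.Id}.json", json);
   }
 }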

When dealing with relational data, you really have to understand the data to write good data access code and the underlying SQL that supports well-defined structuring of complex objects.

Well-defined structuring of the persistence of complex application objects avoids data duplication and corruption, prevents broken reference constraints and the loss of hierarchical data relationships, and more generally lets you know very quickly when there is a problem in your data storage structures or in the objects that initialize themselves from that data.

NoSQL ditches virtually all relational data normalization rules in favor of a loosely schema'd (document, BLOB, key/value, etc.) data store that relies on keys, values and filtering of unstructured metadata to approximate the SELECT ... WHERE functionality found in an RDBMS. Its query APIs usually read more like Java (or JavaScript) code than declarative SQL statements.
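To make that concrete, here is a rough, hedged side-by-side (the connection strings, table and collection names are assumptions): the same "give me all shipped orders" intent expressed as a parameterized SELECT ... WHERE against SQL Server via ADO.NET, and as a key/value filter against MongoDB via its .NET driver.

 using System.Data.SqlClient;
 using MongoDB.Bson;
 using MongoDB.Driver;
 public static class ShippedOrders
 {
   // Relational: a declarative SELECT ... WHERE against a known schema
   public static void FromSqlServer(string connectionString)
   {
     using (var conn = new SqlConnection(connectionString))
     using (var cmd = new SqlCommand("SELECT OrderId, Total FROM dbo.Orders WHERE Status = @status", conn))
     {
       cmd.Parameters.AddWithValue("@status", "Shipped");
       conn.Open();
       using (var reader = cmd.ExecuteReader())
       {
         while (reader.Read()) { /* map columns to objects */ }
       }
     }
   }
   // Document store: the same intent expressed as a filter over keys/values
   public static void FromMongo(string connectionString)
   {
     var orders = new MongoClient(connectionString)
       .GetDatabase("shop")                    // database/collection names are illustrative
       .GetCollection<BsonDocument>("orders");
     var filter = Builders<BsonDocument>.Filter.Eq("status", "Shipped");
     var shipped = orders.Find(filter).ToList();  // roughly: SELECT * WHERE status = 'Shipped'
   }
 }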


"Apache Hadoop is an open source platform built on two technologies: Linux operating system and Java programming language."


For many application data requirements, however, relational data can be overkill and totally unnecessary (e.g. using Redis to store key/value pairs vs. designing some elaborate key/value store in a SQL Server table).


Implementing something like Splunk or Kibana to continuously index app logs and configuration files can help you dip your toes into the Lake of Dark Data


The best distributed NoSQL solutions like Hadoop really shine in their inherent ability to dynamically scale across as many server machines as the operators can make ready to serve as "Hadoop processor nodes on standby".

SQL Server scaling is based more on augmenting the server, or "scaling up" (adding RAM, faster SSDs, RAID arrays, etc.), than on distributing workloads across dynamic nodes. SQL Server AlwaysOn Availability Groups and the Mirroring and Replication features are for recoverability and data sharing, not for dynamically scaling to handle ever-bigger workloads.

SQL has been around forever. The fundamental concept behind NoSQL (semi-structured or loosely structured data) has been around since long before relational SQL database technology. Both (along with the NoSQL-adjacent graph database paradigm) will continue to serve as viable data storage alternatives for many more years into the 21st century.


Relational systems like SQL Server, MySQL and Oracle usually handle the structured side; NoSQL vendors are after the other 90%


In fact, SQL Server 2019's PolyBase extension supports T-SQL query integration with Hadoop clusters, MongoDB and Teradata. A new feature called SQL Server Big Data Clusters helps make distributed NoSQL nodes manageable within the SSMS environment.

Mongo, Hadoop and other NoSQL database servers have SQL Server integration to support relational data sources.

SQL Server 2019 PolyBase integrates Hadoop, MongoDB and many other sources with relational data and T-SQL queries


CAP Theorem: a distributed data system (like most NoSQL solutions) can guarantee only two of the three properties "Consistency", "Availability" and "Partition Tolerance" at any one time

ACID vs BASE: the relational axiom of "Atomic, Consistent, Isolated, Durable" contrasted against NoSQL's vaguer promise of "Basically Available, Soft state, Eventually consistent" (dirty reads common)

This ol' tried-and-true database server software ain't going away in the foreseeable future


Hundreds of millions of corporate, mid and small business applications are running along just fine in 2019 using various RDBMS platforms (SQL Server, Oracle, DB2, PostgreSQL, MySQL, etc.) for at least one of their data stores.

Many more millions of applications have been using one riff or another of NoSQL (semi-structured data) before, during and after the mythical "Relational Movement" as described by software veteran Robin Bloor:

"The Relational Model of Data Never Dominated Anyway. Estimates vary, but it is generally agreed that somewhere between 70% and 95% of the world’s data is stored only in poorly structured or unstructured formats such as: word processing documents, spreadsheets, HTML files and e-mail. The truth is that Relational database never did really dominate. It was rejected out of hand, year after year, as an effective store for many types of data." -Robin Bloor on insideanalysis.com


Google search trends over the last 5yrs certainly suggest relational SQL is not going anywhere anytime soon...


Considerations when evaluating whether to use NoSQL:
  • NoSQL is a precise tool for precise data needs; if relational SQL is too much for your group, NoSQL will likely be too steep a learning curve
  • Data integrity- when billions of NoSQL records are affected by a small change in schema that cannot propagate correctly, runs into constraint issues, or leaves hierarchy and relations impossible to infer... maybe relational SQL would be a better approach
  • NoSQL touts its loose schema structure as a benefit, but this simply means schema and data structure enforcement has been shifted from the database layer to the application layer (see the sketch after this list). Data cannot "self-manage".
  • Some apps are prime candidates for NoSQL's document-centric and resource-centric distributed storage architecture
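As a small, hedged sketch of that point about schema enforcement moving into the application layer (the field names here are assumptions), the checks an RDBMS would perform through column types and constraints end up living in code like this:

 using System;
 using Newtonsoft.Json.Linq;
 public static class CustomerDocumentValidator
 {
   // With no database schema, the application must enforce its own "schema"
   public static void ValidateOrThrow(string json)
   {
     JObject doc = JObject.Parse(json);
     if (doc["customerId"] == null || doc["customerId"].Type != JTokenType.Integer)
       throw new InvalidOperationException("customerId is required and must be an integer.");
     if (string.IsNullOrWhiteSpace((string)doc["email"]))
       throw new InvalidOperationException("email is required.");
     // In an RDBMS these rules would live in NOT NULL columns, data types and
     // CHECK/FOREIGN KEY constraints; here they live (and must be maintained) in code.
   }
 }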


Also, there is this to consider:


(re: the longevity and simple-yet-powerful abstractions of SQL)


If NoSQL solutions are eventually able to achieve the transactional consistency and complex schema structures that some applications require, and then ultimately subsume RDBMSs completely, it will still take a lot of SQL gurus to convert and integrate all the legacy relational database apps for a long, long time to come...

Bring on MongoDB, CouchDB, Dynamo, MapReduce, HBase, BigTable, Cassandra.


As data professionals we will have an increasingly complex array of tools to understand; what we do with them will drive the future



Long live SQL Server 2030. 😉




References:

https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/

https://www.wired.com/2012/12/couchdb/

https://pdfs.semanticscholar.org/a6f0/1c9103d3bafb8ce92641c9f2a4deaccd12f9.pdf

https://www.memsql.com/blog/why-nosql-databases-wrong-tool-for-modern-application/

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1589423200346982646

https://news.ycombinator.com/item?id=479165

https://insideanalysis.com/is-the-relational-database-doomed/

NLog for Flexible Application Logging in .NET

As much as I have enjoyed working with log4net in the past, NLog works much more seamlessly for .NET apps (it has good, solid abstractions)

From the NuGet Package Manager, you can find and reference NLog and the accompanying NLog.Config package, which simplifies setup. To configure NLog logging, simply set the required properties (you need at least a logFile variable, a target and a rule to get started) to your desired values in the config file that NLog.Config creates in your project root (NLog.config):

Configuration:

 <?xml version="1.0" encoding="utf-8" ?>  
 <nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"  
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
    xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"  
    autoReload="true"  
    throwExceptions="false"  
    internalLogLevel="Off" internalLogFile="c:\temp\nlog-internal.log">  
  <variable name="logFilePath" value="C:\NLog\IDG.log" />  
  <targets>  
   <target name="logfile"  
     xsi:type="File"  
     fileName="${logFilePath}"  
     layout="${longdate}  LEVEL=${level:upperCase=true}: ${message}"  
     keepFileOpen="true" />  
  </targets>  
  <rules>  
   <!-- minlevel="Info" already includes Warn, Error and Fatal, so one rule covers this target -->  
   <logger name="*" minlevel="Info" writeTo="logfile" />  
  </rules>  
 </nlog>  
This example uses a log file target in the C:\NLog directory; you can utilize a wide variety of logging targets to broadcast app errors

Source Code:

 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using NLog;  
 namespace ExtRSTests  
 {  
   [TestClass]  
   public class NLogTests  
   {  
     private static Logger logger = LogManager.GetCurrentClassLogger();  
     [TestMethod]  
     public void TestLoggerToFile()  
     {  
       logger.Warn("Something in the app happened that may indicate trouble ahead....");  
       logger.Error("Uh-oh. Something broke.");  
     }  
   }  
 }  
With the logging levels, log formatting (timestamps) and abundant integration options, NLog is a complete logging solution

With NLog you can implement logging for an array of targets including file, email, database and 3rd-party integrations (e.g. send a message to a Slack channel if the logger generates any "Error" or "Fatal" level log message).


Reference: https://www.infoworld.com/article/3144535/how-to-work-with-nlog-in-net.html

.NET XML Serialization with System.Xml.Serialization

From SOAP to app configuration, model templating, EDI/ETL, and beyond, XML is a mainstay in software and will continue to be (see: HTML)


Source Code:
 using System;  
 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using System.Xml.Serialization;  
 using System.IO;  
 using System.Xml.Linq;  
 using System.Xml;  
 namespace DemoTests  
 {  
   [TestClass]  
   public class TextXmlSerialization  
   {  
     public const string xml =  
       "<XMLEntityCollection>" +  
         "<XMLEntities>" +  
           "<XMLEntity>" +   
             "<ID>1</ID>" +   
             "<Name>TestName1</Name>" +   
           "</XMLEntity>" +  
           "<XMLEntity>" +  
             "<ID>2</ID>" +  
             "<Name>TestName2</Name>" +  
           "</XMLEntity>" +  
         "</XMLEntities>" +  
       "</XMLEntityCollection>";  
     [TestMethod]  
     public void TestDeserialize()  
     {  
       XmlSerializer serializer = new XmlSerializer(typeof(XMLEntityCollection));  
       using (StringReader reader = new StringReader(xml))  
       {  
         XMLEntityCollection ents = (XMLEntityCollection)serializer.Deserialize(reader);  
         TestSerialize(ents);  
       }  
     }  
      // Helper called from TestDeserialize; not a [TestMethod] because it requires a parameter  
      public void TestSerialize(XMLEntityCollection ents)  
     {  
       XmlSerializer xmlSerializer = new XmlSerializer(ents.GetType());  
       using (StringWriter stringWriter = new StringWriter())  
       {  
         xmlSerializer.Serialize(stringWriter, ents);  
         Assert.IsNotNull(stringWriter);  
       }  
     }  
   }  
   [Serializable()]  
   public class XMLEntity  
   {  
     [XmlElement("ID")]  
     public string Id { get; set; }  
     [XmlElement("Name")]  
     public string Name { get; set; }  
   }  
   [Serializable()]  
   [XmlRoot("XMLEntityCollection")]  
   public class XMLEntityCollection  
   {  
     [XmlArray("XMLEntities")]  
     [XmlArrayItem("XMLEntity", typeof(XMLEntity))]  
     public XMLEntity[] Ents { get; set; }  
   }  
 }  



.NET XML Deserialization with System.Xml.Serialization: 


The instantiated ents object containing the data from the deserialized XML string



.NET XML Serialization with System.Xml.Serialization:


XML generated by the stringWriter when serializing the ents object into XML using the .NET XmlSerializer type



Reference: https://stackoverflow.com/questions/364253/how-to-deserialize-xml-document

Newtonsoft JSON .NET Nuget Nugget

Most API payloads are in XML or JSON; it is best to know both of these data structures, and how to serialize/deserialize them

The JSON parsing utilities found in Newtonsoft.Json are very, very useful and should be common knowledge for any .NET developer who works with API data or anything producing or derived from JSON (JavaScript Object Notation).

In general, to use Newtonsoft.Json you simply need to create a .NET class hierarchy that mimics the structure and hierarchy of the target JSON. Once that is set up, serializing in-memory objects to JSON and deserializing the JSON back into in-memory objects is a breeze.

You achieve this with a normal class hierarchy, using List<> properties for child objects, array properties, etc. Newtonsoft's JsonProperty attribute maps JSON properties to class properties, and the built-in serialization and deserialization methods handle moving between JSON strings and the in-memory objects they represent.

The C# source code below demonstrates deserializing a SQL Server 2017 SSRS API v2 JSON response into an object and then serializing that object back into JSON.

Source Code:

 using System;  
 using Microsoft.VisualStudio.TestTools.UnitTesting;  
 using Newtonsoft.Json;  
 using System.Collections.Generic;  
 using System.Net.Http;  
 using System.Threading.Tasks;  
 namespace DemoTests  
 {  
   [TestClass]  
   public class DemoTestJSON  
   {  
     [TestMethod]  
     public async Task TestDeserializeJSON()  
     {  
       HttpClient client = new HttpClient(new HttpClientHandler() { UseDefaultCredentials = true });  
       client.BaseAddress = new Uri("http://localhost/reports/api/v2.0/reports");  
       var response = await client.GetAsync(client.BaseAddress);  
       var deserial = JsonConvert.DeserializeObject<APIGenericItemsResponse>(await response.Content.ReadAsStringAsync());       
       TestSerializeJSON(deserial);  
       Assert.IsNotNull(deserial);  
     }  
      // Helper called from TestDeserializeJSON; not a [TestMethod] because it requires a parameter  
      public void TestSerializeJSON(APIGenericItemsResponse genericObject)  
      {  
        string serial = JsonConvert.SerializeObject(genericObject);  
        Assert.IsNotNull(serial);  
      }  
   }  
   public class APIGenericItemsResponse  
   {  
     [JsonProperty("@odata.context")]  
     public string Context { get; set; }  
     [JsonProperty("value")]  
     public List<GenericItem> GenericItem { get; set; }  
   }  
   public class GenericItem  
   {  
     [JsonProperty("Id")]  
     public string Id { get; set; }  
     [JsonProperty("Name")]  
     public string Name { get; set; }  
     [JsonProperty("Path")]  
     public string Path { get; set; }  
   }  
 }  


SSRS API v2 /Reports JSON Response:




JSON Deserialization with Newtonsoft.Json:
The deserial variable holds an in-memory .NET object of type APIGenericItemsResponse, derived (deserialized) from the SSRS API JSON response




JSON Serialization with Newtonsoft.Json:

The serial variable is simply the APIGenericItemsResponse object serialized into a JSON string


Reference: https://www.newtonsoft.com/json/help/html/Introduction.htm

base in C#

The base keyword in C# allows a subclass to access base (superclass) members.

All credit to Suresh Dasari of Tutlane (reference below) on explaining this so effectively in just a few steps of code.

What is shown here is the Details subclass overriding the Users base class's GetInfo() method while keeping the base behavior (Console.WriteLine("Name: {0}", name); and Console.WriteLine("Location: {0}", location);) alongside some new behavior (Console.WriteLine("Age: {0}", base.age);). In this way members can be shared between subtypes and the type they inherit from, in constructors as well as elsewhere in the subclass.

 using System;  
 namespace Tutlane  
 {  
   // Base Class  
   public class Users  
   {  
     public string name = "Suresh Dasari";  
     public string location = "Hyderabad";  
     public int age = 32;  
     public virtual void GetInfo()  
     {  
       Console.WriteLine("Name: {0}", name);  
       Console.WriteLine("Location: {0}", location);  
     }  
   }  
   // Derived Class  
   public class Details : Users  
   {  
     public override void GetInfo()  
     {  
       base.GetInfo();  
       Console.WriteLine("Age: {0}", base.age);  
     }  
   }  
   class Program  
   {  
     static void Main(string[] args)  
     {  
       Details d = new Details();  
       d.GetInfo();  
       Console.WriteLine("\nPress Enter Key to Exit..");  
       Console.ReadLine();  
     }  
   }  
 }  


Result

Reference: https://www.tutlane.com/tutorial/csharp/csharp-base-keyword

IP Addressing and Subnets, Subnet Masking

Knowledge of network configuration and administration is an (incredibly- still) underrated, underappreciated and immensely powerful tool for any IT professional to possess.

A reference table of subnet masking schemes, the mask bits in binary, and the available number of hosts. A "/24" is common for small LAN subnets.


One area of computer networking that should be better understood by software developers is the configuration of subnetworks via subnet masks. A subnet mask (e.g. 255.255.255.0) is simply a way of re-purposing an IP address by segmenting it into network and host portions.

An IPv4 address consists of 4 bytes (32 bits) of data; each of those 8-bit bytes is known as an "octet". With a 255.255.255.0 subnet mask, the first three octets make up the network ID portion of the IP address, leaving only the last octet for host addresses.

At this point we could get into the logical ANDing of IP address bits and subnet mask bits, but just be aware that the mask bits allow the network portion of the IP address to be separated from the host portion; that is the key purpose of subnetting and the subnet mask.

The breakdown of a Class B IPv4 address

The subnet mask is designed to denote the number of bits in an IP address (e.g. 10.9.1.14) that form the network portion (10.9.1) vs. the host portion (.14).

In this way, IPs can be used in ways they were not originally designed for, but that are altogether needed for the proper organization of something as seemingly unwieldy as the IP networks of "the Internet" (publicly accessible networks of subnetworks). With a little reference knowledge you can understand even the trickiest of subnet configurations.

But wait- there is (lots) more...

The example above illustrates only a very basic subnetting situation.

Where things get tricky is when a subnet mask ends partway through an octet rather than on an octet boundary (e.g. 255.255.128.0). In more complex network configuration scenarios it is helpful to refer to a subnet configuration reference sheet like the following to identify the subnet and/or subnet mask information you are looking for:


Describing the nature of a /29 subnet solely from knowing the IP address (10.1.1.37) of one of its hosts and that it is a /29 subnet.


Below are the common pieces of information that you will need when analyzing subnet configurations (a C# sketch for deriving them follows the list):

Network ID: First available IP address in the subnet.

Broadcast IP: Last available address in the subnet.

First Host IP: Network ID + 1

Last Host IP: Broadcast IP - 1

Next Network: Broadcast IP + 1

# of IP Addresses: Number of IP addresses in the subnet range (subtract 2 to find the number of "usable" device IP addresses) - refer to the Subnet Mask Reference Sheet
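As referenced above, here is a minimal C# sketch (not production code) that derives those values from an IP address and its CIDR prefix, using the /29 example from the sheet:

 using System;
 using System.Linq;
 public static class SubnetInfo
 {
   // Derives the subnet facts above from an IP and a /prefix (intended for /1 through /30)
   public static void Describe(string ip, int prefix)
   {
     uint address = ToUInt32(ip);
     uint mask = uint.MaxValue << (32 - prefix);
     uint network = address & mask;       // logical AND of IP and mask
     uint broadcast = network | ~mask;    // host bits all set to 1
     uint addresses = (uint)Math.Pow(2, 32 - prefix);
     Console.WriteLine($"Network ID:     {ToDotted(network)}");
     Console.WriteLine($"Broadcast IP:   {ToDotted(broadcast)}");
     Console.WriteLine($"First Host IP:  {ToDotted(network + 1)}");
     Console.WriteLine($"Last Host IP:   {ToDotted(broadcast - 1)}");
     Console.WriteLine($"Next Network:   {ToDotted(broadcast + 1)}");
     Console.WriteLine($"# of Addresses: {addresses} ({addresses - 2} usable)");
   }
   private static uint ToUInt32(string ip) =>
     ip.Split('.').Select(byte.Parse).Aggregate(0u, (acc, octet) => (acc << 8) | octet);
   private static string ToDotted(uint v) =>
     string.Join(".", new[] { (v >> 24) & 255, (v >> 16) & 255, (v >> 8) & 255, v & 255 });
 }
 // SubnetInfo.Describe("10.1.1.37", 29) prints: Network 10.1.1.32, Broadcast 10.1.1.39,
 // Hosts 10.1.1.33 - 10.1.1.38, Next Network 10.1.1.40, 8 addresses (6 usable)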


This enlightening example shows how MCI uses 11 bits of mask, Automation Research Systems 22 bits, ARS 24 bits, freesoft.org 32 bits- all on the same IP address; you can see the subnet hierarchy as MCI controls the entire 208.128.0.0/11 network


Online CIDR Calculator showing MCI subnet breakdown which includes the other 3 subnets shown

IP Points to remember:

  • IP octets (in base-10 representation) range from 0 to 255, so .255 is the max in any given octet.
  • Subnet Mask is a 32-bit number that indicates how many bits of an IP address are used to indicate the network portion vs. host portion and is a way to subdivide networks for organization, security and manageability.
  • The first address in a subnet (generally .0) is the network address, the next (generally .1) is typically the router/gateway, and the last available address (generally .255) is used as the subnet's broadcast address; note these example octets are small-LAN defaults/generalities and likely will not apply to a complex subnet.
  • Class A (0-127) uses 8 bits for the network portion of the IP address, leaving 24 bits for host IDs
  • Class B (128-191) uses 16 bits for the network portion of the IP address, leaving 16 bits for host IDs
  • Class C (192-223) uses 24 bits for the network portion of the IP address, leaving 8 bits for host IDs
  • CIDR is the acronym for Classless Inter-Domain Routing. The CIDR notation (/26, /24, etc.) is just the number of IP address bits used by the subnet mask (255.255.255.0 = /24 or 24 bits of mask, 255.255.255.192 = /26 or 26 bits of mask).
  • When sorting through IP ranges to determine which range a particular subnetwork group is in, use these time saving tricks recommended by PracticalNetworking:
    • (1) multiply the group size by 10, since a (*10) multiple of the group size will be reached quickly
    • (2) if multiplying the group size by 10 overshoots the IP address whose subnetwork range you are trying to find, remember that "every group size will land on 128 eventually", so you can use that as a starting point as well.
    • (3) every group size lands on the subnet value of the selected subnet and every subnet to the left of it (e.g. for a /27 subnet, or ".224" subnet mask: .224, .192 and .128 will all match the start of a group)

References:

https://www.youtube.com/watch?v=s_Ntt6eTn94&ab_channel=PowerCertAnimatedVideos

https://www.youtube.com/watch?v=BWZ-MHIhqjM

http://www.subnet-calculator.com/

https://www.pcwdld.com/subnet-mask-cheat-sheet-guide

Graphical Integrity

"The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented." -Edward Tufte

It is amazing how easy it is to find highly inaccurate and misleading data graphics and charts even in this year 2019. These inaccuracies, and sometimes outright perversions of the truth, are of particular concern for an insta-culture that gets its news in headlines, memes, charts and other bite-sized generalizations via social media and rarely looks beyond the headlines for the evidence and source data behind the charts.

The “Lie Factor”, coined by American statistician Edward Tufte, is "a value to describe the relation between the size of effect shown in a graphic and the size of effect shown in the data." A larger Lie Factor indicates a higher level of deception, or inaccurate scaling/weighting.


Lie Factor in Action:

The numbers do not equate to the scale of the bars and money bags... not quite as "strong" as projected.




This example mixes 2 different scales and data sets and only serves to confuse the reader...




This is a propaganda data graphic displaying a series of 5 increases using a totally nonsensical scale



This graphic shows Last Year, Last Week, and Current Week as having the same temporal scale.... O'Lie Factor.




Lie Factor Breakdown:

The Lie Factor is the change shown in the graphic (say 100%) divided by the change reported in the data (say 50%): 100 / 50 = a Lie Factor of 2.
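In code form the calculation is trivial; a sketch, with percent changes as the inputs:

 public static class GraphicalIntegrity
 {
   // Tufte's Lie Factor: effect shown in the graphic divided by effect present in the data.
   // Values near 1.0 indicate honest scaling; larger values indicate exaggeration.
   public static double LieFactor(double percentChangeShownInGraphic, double percentChangeInData) =>
     percentChangeShownInGraphic / percentChangeInData;
 }
 // GraphicalIntegrity.LieFactor(100, 50) == 2.0 -> the graphic exaggerates the change 2x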


There are reasons for misleading graphics that go beyond propaganda and sensationalist news articles:

  • Lack of quantitative skills on the part of the graphic creator and publication editor
  • The doctrine that statistics are boring and therefore need to be "jazzed up"
  • The doctrine that graphics are only for the unsophisticated and so don't need "accuracy constraints"
  • Failure to treat graphics with the same fidelity to the truth as the written word they accompany

Other ways that graphical information displays are corrupted include cherry-picking data, making small changes appear large by using a small scale interval, and, when all else fails for information manipulators, using fake data.

It is important to not jump to conclusions when assessing graphical information displays even if it is coming from a reputable publisher. As you can see it is not always obvious that the information being communicated graphically is accurate. Wherever possible, get a look at the source data.

"When we see a chart or diagram, we generally interpret its appearance as a sincere desire on the part of the author to inform. In the face of this sincerity, the misuse of graphical material is a perversion of communication, equivalent to putting up a detour sign that leads to an abyss" - Wainer


References:

https://viz.wtf/

https://infovis-wiki.net/wiki/Lie_Factor


Google Maps API

The Google Maps API is a very powerful tool that is relatively easy to use if you have some JavaScript background. The below screen was created with the code that follows (Google API Key obfuscated).


Basic Maps API example with hard-coded lat/long custom markers


Get an API key here: https://developers.google.com/maps/documentation/javascript/get-api-key

Maps API exposed operations reference: https://developers.google.com/maps/documentation/javascript/tutorial

 <!DOCTYPE html>  
 <html>  
  <head>  
   <title>Custom Markers</title>  
   <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no">  
   <meta charset="utf-8">  
   <style>  
    /* Always set the map height explicitly to define the size of the div  
     * element that contains the map. */  
    #map {  
     height: 100%;  
    }  
    /* Optional: Makes the sample page fill the window. */  
    html, body {  
     height: 100%;  
     margin: 0;  
     padding: 0;  
    }  
   </style>  
  </head>  
  <body>  
  <div>  
  Debug/Inspect...  
  </div>  
   <div id="map"></div>  
   <script>  
    var map;  
    function initMap() {  
     map = new google.maps.Map(  
       document.getElementById('map'),  
       {center: new google.maps.LatLng(43.0589, -88.0988), zoom: 11, mapTypeId: 'hybrid'});  
     var icons = {  
      bucks: {  
       icon: 'bucks.png'  
      },  
      brewers: {  
       icon: 'brewers.png'  
      },  
      panthers: {  
       icon: 'panthers.jpg'  
      }  
     };  
     var features = [  
      {  
       position: new google.maps.LatLng(43.0280, -87.9712),  
       type: 'brewers'  
      }, {  
       position: new google.maps.LatLng(42.9930, -87.9210),  
       type: 'panthers'  
      }, {  
       position: new google.maps.LatLng(43.0000, -87.8379),  
       type: 'bucks'  
      }  
     ];  
     // Create markers.  
     for (var i = 0; i < features.length; i++) {  
      var marker = new google.maps.Marker({  
       position: features[i].position,  
       icon: icons[features[i].type].icon,  
       map: map  
      });  
     };  
           //Must have API with access to the Places API to get marker click details without much manual code  
    }  
   </script>  
   <script async defer src="https://maps.googleapis.com/maps/api/js?key=XXXXXXXXXXXXXXXXXXXX&callback=initMap">  
   </script>  
  </body>  
 </html>  

Value vs. Reference Type Memory Allocation

I once in the not-too-distant past had a Developer interview go comically off the rails when I (the interviewee) was posed with the simple software development question:

"Describe the difference between a value type variable and a reference type variable"

This is an elementary Comp Sci 101 question that every developer should be able to understand and answer with unflinching confidence.

This is what value type (int a, left) and reference type (object a, right) memory allocation looks like graphically


The correct answer would point out that value type variables ('primitives' in Java) are stored directly in memory on the Stack, while reference type variables hold a pointer to another memory location on the Heap (which may in turn contain additional references/pointers that compose the entire object).
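A minimal C# sketch of that answer in action (including the boxing/unboxing conversion mentioned below):

 using System;
 public struct PointValue { public int X; }      // value type: copied on assignment
 public class PointReference { public int X; }   // reference type: assignment copies the pointer
 public static class ValueVsReferenceDemo
 {
   public static void Main()
   {
     PointValue a = new PointValue { X = 1 };
     PointValue b = a;         // full copy of the value
     b.X = 99;
     Console.WriteLine(a.X);   // 1 - 'a' is unaffected
     PointReference c = new PointReference { X = 1 };
     PointReference d = c;     // copies only the reference; both point to the same heap object
     d.X = 99;
     Console.WriteLine(c.X);   // 99 - 'c' sees the change
     object boxed = a;         // boxing: the value is copied onto the heap
     int unboxed = ((PointValue)boxed).X;  // unboxing copies it back
     Console.WriteLine(unboxed);  // 1
   }
 }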

And (no I swear really) at one point I knew exactly what the difference was. However with time and as development languages and tools become more and more abstracted away from the underlying behavior of memory during application run-time, my knowledge of this simple-yet-important concept went blank.

I stammered through a meandering non-answer about how value types contain "simple" data types known as primitives and how reference types refer to more complex objects. Suffice it to say that it was an embarrassing lack of clarity about how application memory works, and it likely torpedoed any chance of an offer.


On value and reference types, the Stack and the Heap and Boxing/Unboxing (converting between val and ref)


This is something we should all know as developers. I hope this helped refresh and/or clarify the concept of variable memory allocation for at least a few people out there.

ETL and EDI Using SSIS

ETL is the process by which you can take (Extract) data from various (usually related) data sources, Transform that data to meet your destination system's needs, and finally Load that transformed data into the destination system data store.

Your table structure will be something along the lines of this basic template:

In a real-world database environment the Staging, OLAP, OLTP and other data repositories may be on different database servers; here everything is on the same server for demonstration.

We will use SQL Server Integration Services (SSIS) and develop the SSIS package within the Visual Studio 2017 IDE.


The first step of the SSIS package loads (INSERTs) the data into a STAGING area database. This allows us to:
  • Store off the intermediate data from all sources into analysis-friendly OLAP datastores
  • Perform data integrity checks
  • Keep extraction and transformation as two strictly separated steps

We load the data from the various source files (.csv, .xls, .xlsx) into SQL Server database Staging table(s) using SSIS Source and Destination Data Flow Tasks connected with the movable data flow arrows. Once you have connected a source and destination you can go into the Destination Data Flow Task and edit the mappings of which source columns should be written to which destination columns.

Next we perform some transformations. This can be anything from a simple ranking or status/flagging/business prioritization algorithm to data cleansing to data partitioning based on certain criteria; the key is that this Transform step is where we apply T-SQL UPDATEs to transform the data once it has all been aggregated in Staging.

Then we refresh the OLAP destination tables using the same kind of Source and Destination Data Flow Tasks and mappings as used for Staging. The OLAP data is used for data analysis.

Finally, we load the cleansed Staging data into our destination system's OLTP database and email or text message the system owner upon successful completion of the SSIS ETL job (or deliver an error if anything fails). The OLTP data stores live transactions.

Bear in mind that most ETL data-flow step mappings are not a 1:1 match; this is just an end-to-end demo of SSIS ETL in its most basic form


Happy ETL'ing, and be sure to watch out for mysterious symbols/characters in data copied from other programs or system environments that were using a language setting (codepage) incompatible with your ETL software. Bad data happens more than you think and, as we say, GIGO.

Your end result looks like this (all green checkmarks indicate everything succeeded). I recommend using PaperCut for SMTP testing- super cool and useful product.

I would attach or GitHub the source code (and will do so upon request), but SSIS project code has a lot of dependencies and can get quite messy for someone else to re-use on even a 'slightly different' machine.

Having used SSIS and its now-deprecated predecessor DTS (SQL Server Data Transformation Services) for many years, I can attest that the best way to learn this product is by diving right in on your own: creating sources and destinations, source/destination connection managers, control flow events, .NET integration, exception event handlers, etc.

You will likely run into some ambiguous and not well-documented errors when developing in SSIS; but persist in your efforts and you can create a very powerful EDI system with the many capabilities of SSIS and the robust Scheduled ETL jobs that it can create.

Java from .NET and .NET from Java

Java or .NET? Why not both (when it is the only viable path)?

jni4net is a proven interop library for Java and .NET. The two brief examples below, developed by the jni4net project, merely require that you specify the jni4net dependency in the (Visual Studio or Eclipse) project.

Calling Java from .NET
 using java.io;  
 using java.lang;  
 using java.util;  
 using net.sf.jni4net;  
 using net.sf.jni4net.adaptors;  
 namespace helloWorldFromCLR  
 {  
   public class Program  
   {  
     private static void Main()  
     {  
       // create bridge, with default setup  
       // it will lookup jni4net.j.jar next to jni4net.n.dll  
       Bridge.CreateJVM(new BridgeSetup(){Verbose=true});  
       // here you go!  
       java.lang.System.@out.println("Hello Java world!");  
       // OK, simple hello is boring, let's play with Java properties  
        // they are a Hashtable, really  
       Properties javaSystemProperties = java.lang.System.getProperties();  
       // let's enumerate all keys.   
        // We use the Adapt helper to convert the enumeration from Java to .NET  
       foreach (java.lang.String key in Adapt.Enumeration(javaSystemProperties.keys()))  
       {  
         java.lang.System.@out.print(key);  
         // this is automatic conversion of CLR string to java.lang.String  
         java.lang.System.@out.print(" : ");  
         // we use the hashtable  
         Object value = javaSystemProperties.get(key);  
         // and this is CLR ToString() redirected to Java toString() method  
         string valueToString = value.ToString();  
         java.lang.System.@out.println(valueToString);  
       }  
       // Java output is really Stream  
       PrintStream stream = java.lang.System.@out;  
       // it implements java.io.Flushable interface  
       Flushable flushable = stream;  
       flushable.flush();  
     }  
   }  
 }  


Calling .NET from Java
 import net.sf.jni4net.Bridge;  
 import java.io.IOException;  
 import java.lang.String;  
 import system.*;  
 import system.Object;  
 import system.io.TextWriter;  
 import system.collections.IDictionary;  
 import system.collections.IEnumerator;  

 public class Program {  
      public static void main(String[] args) throws IOException {  
           // create bridge, with default setup  
           // it will lookup jni4net.n.dll next to jni4net.j.jar   
           Bridge.setVerbose(true);  
           Bridge.init();  
           // here you go!  
           Console.WriteLine("Hello .NET world!\n");  
           // OK, simple hello is boring, let's play with System.Environment  
            // they are a Hashtable, really  
           final IDictionary variables = system.Environment.GetEnvironmentVariables();  
           // let's enumerate all keys  
           final IEnumerator keys = variables.getKeys().GetEnumerator();  
           while (keys.MoveNext()) {  
                 // the hash table is not generic and returns system.Object  
                 // but we know it should be system.String, so we can cast  
                final system.String key = (system.String) keys.getCurrent();  
                Console.Write(key);  
                // this is automatic conversion of JVM string to system.String  
                Console.Write(" : ");  
                // we use the hashtable  
                Object value = variables.getItem(key);  
                // and this is JVM toString() redirected to CLR ToString() method  
                String valueToString = value.toString();  
                Console.WriteLine(valueToString);  
           }  
           // Console output is really TextWriter on stream  
           final TextWriter writer = Console.getOut();  
           writer.Flush();  
      }  
 }  
(verbose commenting by Pavel Savara, a jni4net contributor)

References:

http://zamboch.blogspot.com/2009/10/how-calling-from-net-to-java-works.html

http://zamboch.blogspot.com/2009/11/how-calling-from-java-to-net-works-in.html

https://github.com/jni4net/jni4net/tree/master/content/samples

SQL CLR for .NET in SQL Server

You may find yourself with the need to integrate a .NET method within SQL Server to be called as a function. This usually happens when some relatively complex looping and modifying logic is a requirement of a SQL operation.

SQL is a great data language but it is not the right language for some tasks. Creating a SQL CLR from a .NET assembly may be the best approach to some unique situations (and there is a bonus in that, in many cases you can reuse existing .NET code).

Before creating the CLR object we need a .NET .dll; so first we create a basic .NET assembly, compile it in Release, and copy the path to the compiled .dll:

This is our simple .NET method that we want to run within the SQL Server query execution engine
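(The original method is shown as a screenshot; as a hedged stand-in, here is a minimal sketch of the shape such a method takes. The CountDelimitedItems name and logic are illustrative assumptions, not the actual method pictured.)

 using System.Data.SqlTypes;
 using Microsoft.SqlServer.Server;
 public class UserDefinedFunctions
 {
   // Logic that is awkward in set-based T-SQL but trivial in .NET:
   // count the delimited items in a string. Exposed to SQL Server via the SqlFunction attribute.
   [SqlFunction]
   public static SqlInt32 CountDelimitedItems(SqlString input, SqlString delimiter)
   {
     if (input.IsNull || delimiter.IsNull || input.Value.Length == 0)
       return new SqlInt32(0);
     return new SqlInt32(input.Value.Split(new[] { delimiter.Value }, System.StringSplitOptions.RemoveEmptyEntries).Length);
   }
 }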


SQL CLR provides a way for you to integrate complex .NET methods within SQL Server


Import into SQL Server instance via SSMS*: 


Select New Assembly... 




...and then enter the path to your Release .dll


Create T-SQL function or stored procedure to serve as caller for the function and run it:

From here we can see all of the T-SQL code involved; the 3 SQL Server configuration conditions (shown in the 3 EXEC statements) are required

And that is all there is to it. Only use CLR functions when absolutely necessary, as RDBMSs like SQL Server are designed to process relational data in sets, not to apply complex business logic to individual rows.

But if there is no other way- SQL CLRs could provide you a solution to your code/logic integration problems.


*Warning and Reference: https://blog.netspi.com/attacking-sql-server-clr-assemblies/

Calling Win32 API from .NET C# Application

As quoted in the useful reference below:
"Anybody doing any serious Windows development in C# will probably end up calling many Win32 functions. The .NET framework just doesn't cover 100% of the Win32 API." Mike Thompson

This illustrative example simply shows what the integrated code looks like; that said, identifying available drive space is a common app requirement

The interop of .NET and the Win32 API works by referencing the System.Runtime.InteropServices namespace and declaring the Win32 functions you need with the [DllImport()] attribute, which names the Win32 DLL being used. The corresponding function is marked "static extern", which informs the compiler that the function calls unmanaged (non-.NET) code.

 using System;  
 using System.Runtime.InteropServices;  
 namespace ConsoleApp1  
 {  
   internal static class Win32  
   {  
     [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]  
     internal static extern bool GetDiskFreeSpaceEx(string drive, out long freeBytesForUser, out long totalBytes, out long freeBytes);  
   }  
   class Program  
   {  
     static void Main(string[] args)  
     {  
       long freeBytesForUser;  
       long totalBytes;  
       long freeBytes;  
       Console.WriteLine("Free space in this directory:");  
       if (Win32.GetDiskFreeSpaceEx(@"C:\", out freeBytesForUser, out totalBytes, out freeBytes))  
       {  
         Console.WriteLine("Free user bytes: " + freeBytesForUser.ToString());  
         Console.WriteLine("Free total bytes: " + totalBytes.ToString());  
         Console.WriteLine("Free bytes: " + freeBytes.ToString());  
       }  
       Console.ReadLine();  
     }  
   }  
 }  

Reference: https://stackoverflow.com/questions/137255/how-can-i-determine-if-a-remote-drive-has-enough-space-to-write-a-file-using-c