Showing posts with label Oracle. Show all posts
Showing posts with label Oracle. Show all posts

Why NoSQL is Never Going to Replace SQL (apple:orange)

Do you need a jet engine to get your results from point A to B or will a modest GM sedan suffice?

I'll take the above bad analogy further and posit that while sedans and cars on the ground require stringent rules and have to navigate much more rigid structures, a jet engine simply powers the jet ahead through constraint-less skies- it's purpose is to power something big- not to be concerned with other machinery of the craft (ie. RDBMS features eschewed by NoSQL solutions).

Do you need massive global data sync scale so that millions can connect and make changes and the results all (appear) real-time? If not, NoSQL is not always the right choice and neglecting to have any kind of schema for stored application data structures can present its own host of challenges in the future if (when) those structures change. But alas, you can use NoSQL for some things (Redis, image/BLOB storage) and an RDBMS for others (more structured records and things you want to restore to a point in time in the event of a server failure).

Relational databases tend to be scaled up; NoSQL solutions are scaled out

The SQL vs NoSQL (structured and transactional vs. semi-structured and "eventually consistent") debate is not a matter of one or the other and that's that. These are complementary technologies and should both be used- wherever you find app requirements that suggest one or the other makes the most sense.

When an application is meant to scale immensely and there is not a lot of data integrity, consistency, transaction or complex data structuring and transformation needs- NoSQL is your best bet and will far outscale even the most robust RDBMS server farm- at least at a much lower cost (at the cost of sacrificing features of an RDBMS which may not be needed).

I have personally worked on several projects that utilize relational and unstructured approaches to reading and persisting application data. If you have ever used an application's config file to change a setting in JSON or XML or a simple line entry- you are seeing a small and very basic NoSQL example of storing app data.

Hadoop and other distributed NoSQL db servers are built for scalability


Using NoSQL in software development can make data structures and objects- passed to and fro from APIs and within the application itself- much more flexible to work with. 1-line to serialize an object to JSON chunk, save it to BLOB storage and forget about it.

When dealing with relational data, you really have to understand the data to write good data access code and the underlying SQL that supports well-defined structuring of complex objects.

Well-defined structuring of the persistence of complex application objects avoids data duplication/corruption, prevents breaking reference constraints and losing any sense of hierarchical data relationships and more generally lets you know very quickly when you have a problem within your data storage structures and the objects that initialize themselves from that data.

NoSQL ditches virtually all relational database data normalization rules in favor of a loosely schema'd unstructured (document, BLOB, KeyStore, etc.) data store that relies solely on keys, values and filtering unstructured metadata to get the same SELECT ... WHERE functionality found in RDBMS. Its iterations usually bear a resemblance to Java and as loosely follows here are common SQL statements and their Java or Java-derived equivalent:


"Apache Hadoop is an open source platform built on two technologies: Linux operating system and Java programming language."


For many application data requirements however, relational data can be overkill and totally unnecessary (ie. Redis to store key/vals vs. designing some elaborate key/val store in a SQL Server table).


Implementing something like Splunk or Kibana to continously index app logs and configuration files can help you dip toes into the Lake of Dark Data


The best distributed NoSQL solutions like Hadoop really shine in their inherit ability to dynamically scale to as many server machines as the operators can make ready to serve as "Hadoop processor nodes on standby".

SQL Server scaling is based more on server augmenting or "scaling up" (adding RAM, faster SSDs, RAID Arrays, etc.) rather than distributing workloads across dynamic nodes. SQL Server AlwaysOn Availability and its Mirroring and Replication feature are for recoverability and data sharing- not dynamic scaling to handle bigger and bigger workloads.

SQL has been around forever. The fundamental concept behind NoSQL (semi-structured or loosely structured data) has been around since long before SQL relational database technology. Both (along with NoSQL-related graph database paradigm) will continue to serve as viable data storage solution alternatives for many more years into the 21st century.


Relational systems like SQL Server, MySQL and Oracle usually handle structured side; NoSQL vendors are after the other 90%


In fact, SQL Server 2019's Polybase extension supports Hadoop Clusters, MongoDB and Terradata T-SQL query integration. A new feature called SQL Server Big Data Clusters helps make distributed NoSQL nodes manageable within SSMS environment.

Mongo, Hadoop and other NoSQL database servers have SQL server integration to support relational data sources.

SQL Server 2019 Polybase integrates Hadoop, MongoDB and many other sources with relational data and T-SQL queries


CAP Theorem: a distributed data system like most all NoSQL solutions can only achieve 2 of the 3 features: "Consistency", "Availability" and "Partition Tolerance"

ACID vs BASE:  The relational axiom of "Atomic, Consistent, Isolated, Durable" contrasted against NoSQL's vague promise of "Basically Availability, Soft State, Eventual Consistency" (dirty reads common)

This ol' tried-and-true database server software ain't going away in the foreseeable future


Hundreds of millions of corporate, mid and small business applications are running along just fine in 2019 using various RDBMS platforms (SQL Server, Oracle, DB2, PostgreSQL, MySQL, etc.) for at least one of their data stores.

Many more millions of applications have been using one riff or another of NoSQL (semi-structured data) before, during and after the mythical "Relational Movement" as described by software veteran Robin Bloor:

"The Relational Model of Data Never Dominated Anyway. Estimates vary, but it is generally agreed that somewhere between 70% and 95% of the world’s data is stored only in poorly structured or unstructured formats such as: word processing documents, spreadsheets, HTML files and e-mail. The truth is that Relational database never did really dominate. It was rejected out of hand, year after year, as an effective store for many types of data." -Robin Bloor on insideanalysis.com


Google search trends over the last 5yrs certainly suggest relational SQL is not going anywhere anytime soon...


Considerations when evaluating whether to use NoSQL:
  • NoSQL is a precise tool for precise data needs; if relational SQL is too much for your group, NoSQL will likely be too steep a learning curve
  • Data Integrity- when billions of NoSQL records are affected by a small change in schema that is not able to propagate correctly or runs into constraint issues or hierachy and relations are impossible to infer... maybe relational SQL would be a better approach
  • NoSQL touts loose schema structure is a benefit but this simply means schema and data structure enforcement has been shifted from the database layer to the application layer. Data cannot "self-manage".
  • Some apps are prime candidates for NoSQL's document-centric and resource-centric distributed storage architecture


Also, there is this to consider:


(re: the longevity and simple-yet-powerful abstractions of SQL)


If NoSQL solutions are eventually able to achieve the same transactional consistency and complex schema structures that some applications require and then ultimately subsume RDBMS completely- it'll still require a lot of SQL gurus to convert and integrate all the legacy relational database apps for a long, long time to come...

Bring on MongoDB, CouchDB, Dynamo, MapReduce, HBase, BigTable, Cassandra.


As data professionals we will have an increasingly complex array of tools to understand; what we do with them will drive the future



Long live SQL Server 2030. 😉




References:

https://blog.timescale.com/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/

https://www.wired.com/2012/12/couchdb/

https://pdfs.semanticscholar.org/a6f0/1c9103d3bafb8ce92641c9f2a4deaccd12f9.pdf

https://www.memsql.com/blog/why-nosql-databases-wrong-tool-for-modern-application/

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1589423200346982646

https://news.ycombinator.com/item?id=479165

https://insideanalysis.com/is-the-relational-database-doomed/

SQL NULL cannot be applied to arithmetic operations (select NULL > 1 = NULL)


"Any arithmetic expression containing a null always evaluates to null. For example, null added to 10 is null." -http://docs.oracle.com/cd/E11882_01/server.112/e26088/sql_elements005.htm#i59110

In SQL Server, this behavior can result in string concatenation failing due to one of the values being NULL, which, as is NULL's nature, turns the entire variable into NULL.



....results in: NULL

Two easy ways to work around* the inability to evaluate NULL are as follows:
 (COALESCE takes first non-NULL value specified whereas ISNULL takes just one alternate value)

Different database platforms have different rules for NULL but arithmetic ops will not work in any, as NULL is only ever NULL, and NULL is an unknown. If you are checking for the existence of data, you should be using:


select top 5 m.Column1 FROM MYTABLE m WHERE m.Column1 IS NOT NULL


*you can do this in lieu of not getting bad data in the 1st place, obviously- if the column in question needs to hold a legitimate numeric value for calculation.. your app probably should not be generating and storing null values into that column.