Showing posts with label CQL. Show all posts
Showing posts with label CQL. Show all posts

Neo4j Graphs: Creating and Querying Edges and Nodes

First it is helpful to understand the basic premise behind Graph Theory, which is the foundation of these "Pairwise relations between objects" that we can store and explore in graph databases. The mathematical representation is the formula:

A graph (G) is equal to the connections of its entity nodes aka "vertices" (V) with its relationship edges (E)


Swiss mathematician Leonhard Eueler was looking for a solution to a puzzle on the relationships between land masses and bridges in the Prussian city of Konigsberg. This was the genesis of Graph Theory:

"The geometry of position, now known as Graph theory"

Graph database technology is useful for analyzing complex relationships and relationship behaviors in myriad scenarios not limited to:
  • Computer Networking
  • Spread of Gossip
  • Fraud Detection
  • Forensic Investigations
  • Flight Mapping/GPS
  • Population Growth
  • Spread of Infectious Disease
  • Hierarchy Visualization

It is also vital component for the analysis of (increasingly) unstructured data. By unstructured we typically mean data that is not easily modeled in a traditional relational hierarchy but may still have common properties and so still have relationship value. It is estimated that around 80-90% of an organization's data is unstructured.

Example of a couple simple graph relationships

Relationships in graph data are formed by the edge tables which signify a relationship between two different entities or "nodes" which are stored in node tables. Instead of RDBMS FK/PK and other check constraints, the relationships are defined in the edge tables; all of the properties that one would want to query via CQL to find things are stored in JSON metadata inside the node tables instead of the columns-for-each-property that defines relational data architecture.

Node Tables: Represent a data entity

Edge Tables: Store relationships between nodes

For the RDBMS purists and skeptics out there, I recommend this quote about Node, Graph and other alternative data processing paradigms vs. the traditional RDBMS OLTP and OLAP models:
"The NOSQL acronym is: Not Only S Q L. NOSQL solutions are not a replacement or successor for RDMBS systems, nor were they ever intended to be. They are useful tools to be used for specific purposes. They are to be used as part of an organisation’s data management solution and not as a total replacement for the existing solution". -Simon Munro
It is important to note that graph data can be queried in much the same way as SQL through a graph-specific query language called CQL (cipher query language) as you will see in the following demonstration.

For a quick demonstration on how easy it is to get up and running with this technology we will be using Neo4j which you can download here: https://neo4j.com/download/


Walkthrough of Neo4j:
Fire up the Neo4j Desktop client, click "New" and then click the "Add Graph" button to create your first graph database. Once the db has been created, click the play icon to start it up (it must be running for Neo4j browser to connect).

This is the Neo4j main home screen from which you can create and connect to graph databases and projects

Then click the "Neo4j Browser" button to launch the graph data browser for creation and visual exploration of graph data relationships.

Next, in the Neo4j browser, click "Jump into Code" and simply follow the prompts to begin creation and querying of graph data.

Once, in the Neo4j tutorial, you can simply follow the prompted instruction widgets.

I would continue with instructions but the rest is refreshingly self-explanatory. You first create a movie database that contains actors, directors and movies stored in Node (entity) tables and the types of relationships between these Nodes stored in Edge (relationship type) tables.

One thing to note is that to execute a CQL command in the browser, you must hold down CTRL+Enter. Alternatively you can click the play button to execute CQL.

As you continue with the tutorial you will wind up with something like this when you get to the step that has you execute CQL to find all Node entities that are within 4 degrees (or node "hops") of the actor Kevin Bacon:

4 degrees of separation from Kevin Bacon...

Once you get comfortable with CQL syntax it is relatively easy to start modeling and creating your own graph database structures which can help you and/or your company to analyze some of the unstructured and semi-structured data that is hard to extract value from with traditional RDBMS.

Bigtime kudos to the Neo4j team on making this so straightforward and simple to learn and get up
and running with a new technology so fast. I've never seen a technology tutorial like it.

As you can see, there is tremendous potential value in exploring data relationships that don't necessarily fit neatly into traditional RDBMS/hierarchical databases but are no less useful a tool to have in an organization's data analysis arsenal.


References:

https://www.mssqltips.com/sqlservertip/5007/sql-server-2017-graph-database-query-examples/

https://www.youtube.com/watch?v=gXgEDyodOJU

https://www.red-gate.com/simple-talk/sql/t-sql-programming/experiments-with-neo4j-using-a-graph-database-as-a-sql-server-metadata-hub/

https://www.youtube.com/watch?v=mVWn8k49mAQ