Neo4j Graph Database

Graph databases are used to store and query how different pieces of data are connected. Unlike relational databases, which organize data in tables, graph databases use nodes and edges to represent relationships directly. Big companies like Facebook, LinkedIn, and Netflix use these databases to keep track of users' social networks and to make recommendations based on relationships. As our world gets more connected, graph databases help us make sense of complex, connected data more quickly and easily. This blog post is about the Neo4j graph database.


Jordan Wu


Anaheim Hills, Anaheim, United States Sunset Image.

What is Neo4j?

Neo4j is a popular graph database that stores data as nodes and relationships. It's a great fit for storing users' social networks and for generating recommendations. Much of the world can be modeled as objects, and each object can be related to other objects. These relationships are much easier to understand when drawn as a graph. Other databases, such as relational (SQL) databases and other NoSQL stores, aren't designed for graph data. You can store graph data in them, but queries that follow many relationships quickly become complex to write and slow to run. In a relational database, for example, each extra hop usually means another JOIN. Neo4j is built specifically to store and query this kind of connected data.

What is a Graph Data Model?

A graph has two fundamental components: nodes and relationships.

Nodes typically represent entities, and they can contain properties, which are name-value pairs of data. Each node can have zero or more labels that group it with similar nodes. A label is a named graph construct used to group nodes into sets: all nodes with the same label belong to the same set. By convention, labels are written in camel case beginning with an upper-case character, such as Person, Actor, and Director.
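To make the idea of labels as sets concrete, here is a minimal Python sketch. It is an in-memory illustration, not how Neo4j actually stores nodes. The `Node` class and the `nodes_with_label` helper are hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory node: labels group nodes, properties hold name-value pairs."""
    node_id: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, object] = field(default_factory=dict)


def nodes_with_label(nodes: list[Node], label: str) -> list[Node]:
    """Return every node carrying the label, i.e. the set that the label defines."""
    return [n for n in nodes if label in n.labels]


keanu = Node(1, {"Person", "Actor"}, {"name": "Keanu Reeves"})
lana = Node(2, {"Person", "Director"}, {"name": "Lana Wachowski"})
matrix = Node(3, {"Movie"}, {"title": "The Matrix"})
nodes = [keanu, lana, matrix]

# A node can belong to several label sets at once (Keanu is both a Person and an Actor).
print([n.properties["name"] for n in nodes_with_label(nodes, "Person")])
print([n.properties["title"] for n in nodes_with_label(nodes, "Movie")])
```

Because labels are a set on each node, asking for "all Person nodes" is just a membership test. That mirrors how the post describes labels grouping nodes into sets.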

A relationship connects two nodes and lets us find related data. Like nodes, relationships can contain properties that hold name-value pairs of data. Each relationship has a source node and a target node, which determine the direction of its arrow. Although a relationship must be stored with a particular direction, it can be queried in either direction. A relationship always connects two nodes, so you cannot delete a node without also deleting its relationships. By convention, relationship types are verbs written in upper case, with underscores separating words, such as ACTED_IN and DIRECTED.
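The rules above can be sketched in a few lines of Python. Each relationship has exactly one type and is stored in one direction, but it can be followed from either end. A node that still has relationships cannot be deleted unless they are removed with it. In Cypher, a plain `DELETE` on a connected node is rejected, while `DETACH DELETE` removes the node together with its relationships. The `Graph` and `Relationship` classes below are hypothetical and only mimic that behavior in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Relationship:
    """Hypothetical relationship: exactly one type, a start node, an end node, and properties."""
    rel_type: str
    start: str  # source node (identified by name in this sketch)
    end: str    # target node
    properties: dict[str, object] = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory graph used only to illustrate the rules described above."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.rels: list[Relationship] = []

    def add_rel(self, rel: Relationship) -> None:
        # A relationship always connects two nodes that exist.
        if rel.start not in self.nodes or rel.end not in self.nodes:
            raise ValueError("both endpoints must exist")
        self.rels.append(rel)

    def neighbors(self, node: str, rel_type: str, direction: str = "out") -> list[str]:
        """Follow one relationship type: 'out' uses the stored direction, 'in' reverses it, 'both' ignores it."""
        result = []
        for r in self.rels:
            if r.rel_type != rel_type:
                continue
            if direction in ("out", "both") and r.start == node:
                result.append(r.end)
            if direction in ("in", "both") and r.end == node:
                result.append(r.start)
        return result

    def delete_node(self, node: str, detach: bool = False) -> None:
        """Refuse to leave dangling relationships unless detach=True (like DETACH DELETE)."""
        attached = [r for r in self.rels if node in (r.start, r.end)]
        if attached and not detach:
            raise ValueError(f"{node} still has {len(attached)} relationship(s)")
        self.rels = [r for r in self.rels if node not in (r.start, r.end)]
        self.nodes.discard(node)


g = Graph()
g.nodes.update({"Keanu Reeves", "The Matrix"})
# Stored once, in one direction: (Keanu Reeves)-[:ACTED_IN]->(The Matrix)
g.add_rel(Relationship("ACTED_IN", "Keanu Reeves", "The Matrix", {"roles": ["Neo"]}))

print(g.neighbors("Keanu Reeves", "ACTED_IN"))      # follow the stored direction
print(g.neighbors("The Matrix", "ACTED_IN", "in"))  # same relationship, read from the other end

try:
    g.delete_node("The Matrix")  # refused: the movie still has a relationship
except ValueError as err:
    print("refused:", err)
g.delete_node("The Matrix", detach=True)  # removes the node and its relationship together
```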

Movie Graph Data Model

The graph represents people and their roles in movies. Nodes are drawn as circles that show their labels and properties. Relationships are drawn as arrows that connect two nodes in a specific direction, and they can also have properties.

Simplified Movie Graph Data Model

Here's a simplified version of the graph that is easier to read. From it, you can tell which actors acted in each movie and who directed it.
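Reading "who acted in this movie and who directed it" off the graph amounts to following incoming relationships of one type into a movie node. The minimal Python sketch below stores relationships as `(start, TYPE, end)` triples. The sample people and movie are illustrative and not necessarily the exact contents of the figure. The `people_with` helper is a hypothetical name.

```python
# Illustrative sample data as (start, TYPE, end) triples; not necessarily the graph in the figure.
RELATIONSHIPS = [
    ("Keanu Reeves", "ACTED_IN", "The Matrix"),
    ("Carrie-Anne Moss", "ACTED_IN", "The Matrix"),
    ("Lana Wachowski", "DIRECTED", "The Matrix"),
    ("Lilly Wachowski", "DIRECTED", "The Matrix"),
]


def people_with(rel_type: str, movie: str) -> list[str]:
    """Return the start nodes of every relationship of rel_type that points into movie."""
    return [start for start, t, end in RELATIONSHIPS if t == rel_type and end == movie]


print("Actors:", people_with("ACTED_IN", "The Matrix"))
print("Directors:", people_with("DIRECTED", "The Matrix"))
```

The relationship type is what separates actors from directors. Both groups are Person nodes pointing at the same movie.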

Cypher Query Language

The graph query language that Neo4j uses is called Cypher. It is used to create, read, update, and delete data in the graph database. Cypher stands out because it provides a visual way of matching patterns of nodes and relationships. It is declarative: users describe what to retrieve from a graph rather than how to retrieve it. Its three core concepts are nodes, relationships, and paths.

Cypher Query Language Example

The example uses an ASCII-art style syntax, (nodes)-[:ARE_CONNECTED_TO]->(otherNodes), with round brackets for circular (nodes) and -[:ARROWS]-> for relationships. The common syntax is as follows:

// Node syntax
()
(matrix)
(:Movie)
(matrix:Movie)
(matrix:Movie {title: 'The Matrix'})
(matrix:Movie {title: 'The Matrix', released: 1997})

// Relationship syntax
-->
-[role]->
-[:ACTED_IN]->
-[role:ACTED_IN]->
-[role:ACTED_IN {roles: ['Neo']}]->

// Pattern syntax
(keanu:Person:Actor {name: 'Keanu Reeves'})-[role:ACTED_IN {roles: ['Neo']}]->(matrix:Movie {title: 'The Matrix'})

// Pattern variable
acted_in = (:Person)-[:ACTED_IN]->(:Movie)
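Matching a pattern like `(a:Person)-[r:ACTED_IN]->(b:Movie {title: 'The Matrix'})` can be thought of as a filter. Each relationship of the right type is checked, and the match is kept only if its start and end nodes satisfy their node patterns (label plus property values). The Python below is a simplified sketch of that idea for a single-hop pattern, not how Cypher's query planner works. The names `NODES`, `RELS`, `node_matches`, and `match` are hypothetical.

```python
# Hypothetical in-memory graph: node id -> (labels, properties); relationships as dicts.
NODES = {
    1: ({"Person", "Actor"}, {"name": "Keanu Reeves"}),
    2: ({"Person", "Actor"}, {"name": "Carrie-Anne Moss"}),
    3: ({"Movie"}, {"title": "The Matrix"}),
}
RELS = [
    {"type": "ACTED_IN", "start": 1, "end": 3, "props": {"roles": ["Neo"]}},
    {"type": "ACTED_IN", "start": 2, "end": 3, "props": {"roles": ["Trinity"]}},
]


def node_matches(node_id, label=None, props=None):
    """True if the node has the label and every listed property value, like (x:Label {k: v})."""
    labels, properties = NODES[node_id]
    if label is not None and label not in labels:
        return False
    return all(properties.get(k) == v for k, v in (props or {}).items())


def match(left, rel_type, right):
    """Bind (a)-[r:rel_type]->(b) for every outgoing relationship whose endpoints match."""
    bindings = []
    for r in RELS:
        if r["type"] == rel_type and node_matches(r["start"], *left) and node_matches(r["end"], *right):
            bindings.append({"a": NODES[r["start"]][1], "r": r["props"], "b": NODES[r["end"]][1]})
    return bindings


# Roughly: MATCH (a:Person)-[r:ACTED_IN]->(b:Movie {title: 'The Matrix'}) RETURN a.name, r.roles
for row in match(("Person", None), "ACTED_IN", ("Movie", {"title": "The Matrix"})):
    print(row["a"]["name"], row["r"]["roles"])
```

Each binding plays the role of one result row, with `a`, `r`, and `b` corresponding to the pattern's variables.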

Nodes

The data entities in a Neo4j graph database are called nodes. In Cypher, nodes are written using parentheses ().

MATCH (n:Person {name:'Anna'})
RETURN n.born AS birthYear

In this example, the MATCH clause finds all Person nodes in the graph whose name property is set to Anna and binds them to the variable n. The variable n is then passed to the subsequent RETURN clause, which returns the value of a different property, born, from the same nodes.
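The two clauses can be mirrored in plain Python. The first step filters on label and property, and the second projects a different property under a new name. The sample nodes below are hypothetical. Two of them are deliberately named Anna to show that MATCH binds every matching node, producing one row per node. A missing property comes back as `None`, much like Cypher returns `null`.

```python
# Hypothetical sample nodes: (labels, properties). Two Person nodes share the name 'Anna' on purpose.
NODES = [
    ({"Person"}, {"name": "Anna", "born": 1990}),
    ({"Person"}, {"name": "Anna", "born": 1975}),
    ({"Person"}, {"name": "Bob", "born": 1985}),
    ({"Company"}, {"name": "Anna"}),  # right name, wrong label: not matched
]

# MATCH (n:Person {name:'Anna'}): keep nodes with the label and the property value
matched = [props for labels, props in NODES if "Person" in labels and props.get("name") == "Anna"]

# RETURN n.born AS birthYear: one row per bound node, projecting a different property
rows = [{"birthYear": n.get("born")} for n in matched]
print(rows)
```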

Relationships

Nodes in a graph can be connected with relationships. A relationship must have a start node, an end node, and exactly one type. Relationships are represented in Cypher with arrows --> indicating the direction of a relationship.

MATCH (:Person {name: 'Anna'})-[r:KNOWS WHERE r.since < 2020]->(friend:Person)
RETURN count(r) AS numberOfFriends

The query above matches relationships of type KNOWS whose since property is less than 2020. It also requires the relationships to go from a Person node named Anna to other Person nodes, bound to the variable friend. The count() function in the RETURN clause counts all the relationships bound to the r variable in the preceding MATCH clause (i.e., how many people Anna has known since before 2020).
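As a rough Python equivalent, counting friends means keeping only the KNOWS relationships that satisfy all parts of the pattern. They must start at Anna, point (in the stored direction) at a Person node, and have a since value below 2020. The sample data is hypothetical and includes a few near misses. A relationship without a since property is skipped because, in Cypher, comparing `null < 2020` yields `null`, which WHERE treats as not true.

```python
# Hypothetical sample data: node name -> label, plus KNOWS relationships with their properties.
LABELS = {"Anna": "Person", "Bob": "Person", "Cara": "Person", "Dan": "Person", "Acme": "Company"}
KNOWS = [
    ("Anna", "Bob", {"since": 2015}),   # matches
    ("Anna", "Cara", {"since": 2021}),  # too recent
    ("Dan", "Anna", {"since": 2010}),   # points the wrong way for -->
    ("Anna", "Acme", {"since": 2012}),  # end node is not a Person
    ("Anna", "Dan", {}),                # no since property, so the comparison is not true
]

number_of_friends = sum(
    1
    for start, end, props in KNOWS
    if start == "Anna"
    and LABELS[start] == "Person"
    and LABELS.get(end) == "Person"
    and props.get("since") is not None
    and props["since"] < 2020
)
print({"numberOfFriends": number_of_friends})
```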

Paths

Paths in a graph consist of connected nodes and relationships. Exploring these paths sits at the very core of Cypher.

MATCH (n:Person {name: 'Anna'})-[:KNOWS]-{1,5}(friend:Person WHERE friend.born < n.born)
RETURN DISTINCT friend.name AS olderConnections

This example finds all paths of 1 to 5 hops that traverse only relationships of type KNOWS (in either direction, since the pattern has no arrowhead) from the start node Anna to older Person nodes, as defined by the WHERE clause. The DISTINCT operator ensures that each name is returned only once, even when several paths lead to the same person.
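The same question can be answered outside Cypher with a breadth-first search (BFS) that stops expanding after 5 hops. Following the prose and the `olderConnections` alias, "older" here means an earlier birth year. This is a swapped-in technique, not how Neo4j evaluates the query. Cypher enumerates paths and DISTINCT collapses duplicates, whereas BFS computes the set of reachable people directly. The endpoints are the same because anyone reachable within 5 hops has a shortest path of at most 5 hops. The sample data and the `within_hops` helper are hypothetical.

```python
from collections import deque

# Hypothetical sample data: birth years and KNOWS links (treated as undirected, like -[:KNOWS]-).
BORN = {"Anna": 1990, "Bob": 1985, "Cara": 1995, "Dan": 1970, "Eve": 1960, "Finn": 1950, "Gus": 1940}
KNOWS = [("Anna", "Bob"), ("Bob", "Cara"), ("Cara", "Dan"), ("Dan", "Eve"), ("Eve", "Finn"), ("Finn", "Gus")]


def within_hops(start, min_hops, max_hops):
    """BFS returning every node whose shortest distance from start is in [min_hops, max_hops]."""
    adjacency = {}
    for a, b in KNOWS:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the upper bound
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n, d in dist.items() if min_hops <= d <= max_hops}


# A set of names plays the role of DISTINCT; the filter plays the role of the WHERE clause.
older = sorted(name for name in within_hops("Anna", 1, 5) if BORN[name] < BORN["Anna"])
print(older)
```

With this sample chain, Gus is older than Anna but sits 6 hops away, so the 5-hop bound leaves him out.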

MATCH p=shortestPath((:Person {name: 'Anna'})-[:KNOWS*1..10]-(:Person {nationality: 'Canadian'}))
RETURN p

Paths can also be assigned to variables. Here, the variable p is bound to a whole path pattern: for each Person node whose nationality property is set to Canadian, shortestPath() finds a shortest path of at most 10 hops from Anna to that node. The RETURN clause then returns each full path between the two nodes.
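In a graph where every relationship counts as one hop, a breadth-first search (BFS) is the classic way to find shortest paths. The Python sketch below records one parent per reached node, rebuilds a path for each Canadian within 10 hops, and skips anyone who is unreachable. It illustrates the idea only and makes no claim about how Neo4j implements `shortestPath()`. The sample people, nationalities, and the `shortest_paths_from` helper are hypothetical.

```python
from collections import deque

# Hypothetical sample data: nationality per person and KNOWS links (either direction, like -[:KNOWS*1..10]-).
NATIONALITY = {"Anna": "Swedish", "Bob": "British", "Cara": "Canadian", "Dan": "Canadian", "Eve": "Canadian"}
KNOWS = [("Anna", "Bob"), ("Bob", "Cara"), ("Anna", "Dan"), ("Dan", "Bob")]


def shortest_paths_from(start, max_hops):
    """BFS that keeps one parent per node, giving one shortest path to each node within max_hops."""
    adjacency = {}
    for a, b in KNOWS:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    parent = {start: None}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # paths longer than max_hops are not explored
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    def path_to(target):
        # Walk parent pointers back to the start, then reverse.
        path = []
        while target is not None:
            path.append(target)
            target = parent[target]
        return path[::-1]

    return {n: path_to(n) for n in parent if n != start}


paths = shortest_paths_from("Anna", max_hops=10)
for person, nationality in NATIONALITY.items():
    if nationality == "Canadian" and person in paths:  # Eve is unreachable, so she is skipped
        print(" -> ".join(paths[person]))
```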

Summary

A graph database is a powerful tool for managing related data. Relationships play a critical part in our daily lives, from the people we know to the things we love, and storing data in a graph helps connect everything together. AI is the future, and with Neo4j's new generative AI features you can start building next-generation apps that perform tasks based on a user's relationships to things in the world.

About the Author

Jordan is a full stack engineer with years of experience working at startups. He enjoys learning about software development and building something people want. Music is what makes him happy: he is passionate about discovering music and is an aspiring DJ. He wants to create his own music and is in the process of finding his own sound.