Polysemy in graph databases

How to choose your graph database wisely

Manu Cohen-Yashar
4 min read · Apr 16, 2020

Graph databases are not new. The web is full of great content that describes in detail everything you need to know about these powerful databases. Unfortunately, the term “graph database” is a little confusing, because two fundamentally different types of products live under the roof of that one concept: property graphs and knowledge graphs. Surprisingly, whenever I speak with a colleague who is not specifically an expert in graph databases, I find that most of the time my conversation partner fails to make this basic distinction. Too often, that simple mistake leads to expensive misuse of a database, so I decided to write this article to help you avoid falling into this trap.

Image produced based on graphics from Neo4j and Audiopedia

So what falls under the concept “graph database”?

If you google “graph database”, you will most probably find information about property graphs, so let’s consider them the default type of graph database. Property graph databases put the attention on the graph itself, which means that what you care about most is how entities are connected to each other and what immediate insights you can learn from that connectivity. We use these types of graphs every day. For example, a subway map is a property graph. Stations are connected, and by looking at the map, these connections tell us how to get from point A to point B. Because the graph (i.e., the direct connectivity) is front and center, mathematical graph operations such as “find the shortest path” are native to property graph use cases.
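To make the subway example concrete, here is a minimal sketch in plain Python of the kind of “find the shortest path” operation that a property graph database offers natively. The station names and the `SUBWAY` map are hypothetical, and a breadth-first search stands in for whatever algorithm a real graph engine would use.

```python
from collections import deque

# Hypothetical subway map: each station lists the stations directly connected to it.
SUBWAY = {
    "A": ["C", "D"],
    "C": ["A", "E"],
    "D": ["A", "B"],
    "E": ["C", "B"],
    "B": ["D", "E"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return one path with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        station = path[-1]
        if station == goal:
            return path
        for neighbor in graph.get(station, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the goal is not reachable from the start


route = shortest_path(SUBWAY, "A", "B")
print(route)  # ['A', 'D', 'B']
```

The only thing the search looks at is which station is connected to which. That is exactly the sense in which the connectivity itself carries the value in a property graph.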

The knowledge graph is the less popular citizen under the term “graph database”, but it is by no means less important. Here we care less about the direct connectivity between elements and more about the business meaning of the data we store. We begin by defining an ontology, which is a machine-readable description (i.e., definition) of the business domain we are working on. Then we add the data and apply the ontology to it. Finally, we use the knowledge graph to infer new meanings and insights from the data and the ontology combined. The graph is only a way to represent the relations that make up the meaning behind the data; it is not the focal point itself, as it is in property graphs. Let’s consider a simple, naïve example. Take the domain “Family” and create an ontology that defines the concepts in the domain and their relations (Father, Son, Grandfather, etc.). Now we can insert specific entities for Bob, Mary, and George and, based on the ontology, specify that Bob is the father of Mary and Mary is the mother of George. Even if we do not specify any relation between George and Bob, the knowledge graph can infer that Bob is George’s grandfather.
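The family example can be sketched in a few lines of plain Python. This is only a toy illustration of the idea, not how a real knowledge graph engine works: the relation names (`father_of`, `mother_of`, `parent_of`, `grandfather_of`) and the two hand-written rules are assumptions made up for this sketch, whereas a real system would express them in an ontology language and let a reasoner apply them.

```python
# Explicit facts, stored as (subject, relation, object) triples.
FACTS = {
    ("Bob", "father_of", "Mary"),
    ("Mary", "mother_of", "George"),
}

# Toy ontology, part 1: fathers and mothers are both parents.
SUBPROPERTY_OF = {"father_of": "parent_of", "mother_of": "parent_of"}


def infer(facts):
    """Return the explicit facts plus everything the two toy rules imply."""
    inferred = set(facts)

    # Rule 1: father_of(x, y) or mother_of(x, y) implies parent_of(x, y).
    for subject, relation, obj in facts:
        if relation in SUBPROPERTY_OF:
            inferred.add((subject, SUBPROPERTY_OF[relation], obj))

    # Toy ontology, part 2:
    # father_of(x, y) and parent_of(y, z) implies grandfather_of(x, z).
    for s1, r1, o1 in list(inferred):
        if r1 != "father_of":
            continue
        for s2, r2, o2 in list(inferred):
            if r2 == "parent_of" and s2 == o1:
                inferred.add((s1, "grandfather_of", o2))

    return inferred


knowledge = infer(FACTS)
print(("Bob", "grandfather_of", "George") in knowledge)  # True
```

Nobody ever stated a relation between Bob and George. The grandfather fact exists only because the rules, playing the role of the ontology, were applied to the data.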

A knowledge graph is a great tool for data interoperability and for computing deep, complex, and dynamic business insights. Data entities (e.g., customer data) might have different representations when they come from different origins, yet ultimately have the same business meaning. With an ontology, you can define what the concept “Customer” means in your business and then cross-reference customer data that originates from different systems. With a knowledge graph database, you can use the ontology to infer relations even if they do not explicitly exist in the data. Static OLAP analysis on traditional data warehouse solutions requires a schema definition and a lot of data preparation before analysis can begin. With knowledge graphs, by contrast, you can choose dynamically how to analyze your data and find deep insights quickly with minimal data preparation.
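As a rough illustration of the interoperability point, the sketch below maps records from two hypothetical systems onto one shared “Customer” concept. All field names, the `CUSTOMER_MAPPINGS` table, and the choice of email as the identifying property are assumptions made for this example. A real ontology is far richer than a field mapping, but the principle of agreeing on meaning first and then combining the data is the same.

```python
# Hypothetical records from two systems that describe customers differently.
CRM_RECORDS = [{"client_name": "Alice", "email_address": "alice@example.com"}]
BILLING_RECORDS = [{"payer": "Alice", "mail": "alice@example.com", "balance": 120}]

# Toy stand-in for an ontology: for each source, how its local field names
# map to the properties of the shared Customer concept.
CUSTOMER_MAPPINGS = {
    "crm": {"client_name": "name", "email_address": "email"},
    "billing": {"payer": "name", "mail": "email", "balance": "balance"},
}


def to_customer(source, record):
    """Translate a source-specific record into the shared Customer vocabulary."""
    mapping = CUSTOMER_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def merge_customers(sources):
    """Combine records from all sources, keyed by the shared email property."""
    customers = {}
    for source, records in sources.items():
        for record in records:
            customer = to_customer(source, record)
            customers.setdefault(customer["email"], {}).update(customer)
    return customers


customers = merge_customers({"crm": CRM_RECORDS, "billing": BILLING_RECORDS})
print(customers)
```

Once both systems speak the same vocabulary, the billing balance and the CRM name end up on a single customer entity, even though neither system knew about the other.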

Data is represented and queried differently in property graphs and knowledge graphs. Gremlin is a de facto standard for querying property graphs, and SPARQL is the query language for knowledge graphs. Gremlin queries typically include graph operations such as finding the shortest path, while SPARQL queries typically include semantic operations such as is-a or has-a. It is often possible to execute graph operations on a knowledge graph, or to compute some sort of semantics on a property graph, but the backend engines are different in nature, and such queries will typically perform poorly. That is why it is so important to understand the nature of the problem before choosing a graph database.

When you think about graph databases, think first about the nature of the problem you are trying to solve. Is it a problem that requires “walking the graph”, where the business value is in the direct connectivity between entities? In that case, a property graph is what you need. Or does the solution require understanding the semantics of the data and inferring insights from it? In that case, a knowledge graph should drive your solution. Make sure you don’t fall into the trap of polysemy, and choose your data infrastructure wisely.

If you want to learn more about ontologies and knowledge graphs, here are some introductory articles that will help you bootstrap your knowledge.

Introduction to semantic web

· What is Semantic Technology?
· Video: The basic idea behind the semantic web
· What Is the Semantic Web?

Introduction to ontologies

· What are Ontologies?
· Video: What is an ontology
· Video: Ontology intro

Introduction to knowledge graphs

· What is a Knowledge Graph?
· Video: What is a knowledge graph
· What is Inference?

Introduction to RDF graph databases

· What is RDF Triplestore?
· RDF 101
· What is SPARQL?
· Video: Introduction to one example of an RDF database.
