What Is Apache TinkerPop?

An overview of Apache TinkerPop

What is Apache TinkerPop?

Apache TinkerPop is a framework for graph computing.

It is designed as an abstraction layer over individual graph databases. Its goal is to provide a common way to work with a wide range of graph databases.

It can already be used with a variety of cloud-based and open-source graph databases, including Neo4j, JanusGraph, Amazon Neptune, and Azure Cosmos DB.

The framework broadly consists of the following components:

A Java-based Core API
Gremlin, a graph traversal language
Gremlin Server, which exchanges queries and data with clients

Graph traversal language: Gremlin

Gremlin is a language specialized for manipulating graph data structures. It serves a role similar to SQL in relational databases.

Gremlin makes it possible to write concise queries. Graphs can also be stored in an RDBMS, but querying them in SQL often requires many JOINs and produces complex queries. Gremlin instead chains traversal steps together as method calls, with no JOINs, which improves readability.

The following compares a simple SQL query containing a JOIN with its Gremlin counterpart. Note that the two are not exactly equivalent, because the data models differ.

    SELECT Products.ProductName
      FROM Products
INNER JOIN Categories
        ON Categories.CategoryID = Products.CategoryID
     WHERE Categories.CategoryName = 'Beverages'

g.V().has("name","Beverages").in("inCategory").values("name")
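To make the method-chaining idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory graph, not the TinkerPop API: the `Graph` and `Traversal` classes, the sample product names, and the vertex ids are all made up for illustration. The step names mirror the Gremlin query above, with `in_` standing in for `in` because `in` is a reserved word in Python (gremlinpython uses the same spelling).

```python
# Toy in-memory property graph. NOT the TinkerPop API; every name here is
# hypothetical and exists only to illustrate step chaining.
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict


@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)  # id -> Vertex
    edges: list = field(default_factory=list)     # (out_id, label, in_id)

    def add_vertex(self, id, **properties):
        self.vertices[id] = Vertex(id, properties)

    def add_edge(self, out_id, label, in_id):
        self.edges.append((out_id, label, in_id))

    def V(self):
        # Start a traversal at every vertex, in the spirit of g.V().
        return Traversal(self, list(self.vertices.values()))


class Traversal:
    def __init__(self, graph, current):
        self.graph = graph
        self.current = current  # vertices the traversal is currently "at"

    def has(self, key, value):
        # Keep only vertices whose property `key` equals `value`.
        kept = [v for v in self.current if v.properties.get(key) == value]
        return Traversal(self.graph, kept)

    def in_(self, label):
        # Follow `label` edges backwards: move to the vertices that point
        # at the current vertices.
        targets = {v.id for v in self.current}
        sources = [
            self.graph.vertices[out_id]
            for (out_id, edge_label, in_id) in self.graph.edges
            if edge_label == label and in_id in targets
        ]
        return Traversal(self.graph, sources)

    def values(self, key):
        # End the traversal by extracting one property from each vertex.
        return [v.properties[key] for v in self.current if key in v.properties]


# Made-up sample data: two categories and three products.
g = Graph()
g.add_vertex(1, name="Beverages")
g.add_vertex(2, name="Condiments")
g.add_vertex(10, name="Chai")
g.add_vertex(11, name="Chang")
g.add_vertex(12, name="Aniseed Syrup")
g.add_edge(10, "inCategory", 1)
g.add_edge(11, "inCategory", 1)
g.add_edge(12, "inCategory", 2)

beverages = g.V().has("name", "Beverages").in_("inCategory").values("name")
print(beverages)
```

Each step returns a new traversal positioned on a different set of vertices, so the relationship that SQL expresses with a JOIN condition becomes a single `in_("inCategory")` hop along an edge.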

Gremlin libraries by language

Official libraries let applications written in Python, JavaScript, and other languages define Gremlin traversals.

Libraries for each programming language, such as Gremlin-Python and Gremlin-JavaScript, generally generate bytecode from traversals written in that language and send it to a JVM-based Gremlin Server (or another TinkerPop-enabled remote endpoint), where it is executed.
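The idea of turning a traversal into bytecode can be sketched as follows. This is a simplified illustration in plain Python, not gremlinpython itself: the `RecordingTraversal` class and the JSON payload layout are assumptions made for this example, and the real wire format differs. The point is only that the chained calls are recorded as a list of instructions rather than executed locally.

```python
# Toy step recorder. NOT gremlinpython; the class name and payload layout
# are hypothetical. It shows chained calls becoming a serializable
# instruction list that could be shipped to a server for execution.
import json


class RecordingTraversal:
    def __init__(self, steps=None):
        self.steps = steps or []

    def __getattr__(self, name):
        # Called only for attributes that do not exist on the object.
        # Private and dunder names are rejected so normal Python
        # machinery keeps working.
        if name.startswith("_"):
            raise AttributeError(name)

        def step(*args):
            # Record the step name (without a trailing underscore, so
            # in_ becomes "in") together with its arguments.
            recorded = [name.rstrip("_"), *args]
            return RecordingTraversal(self.steps + [recorded])

        return step

    def to_payload(self):
        # Serialize the recorded steps; a real client would send something
        # like this over the network instead of printing it.
        return json.dumps({"steps": self.steps})


g = RecordingTraversal()
t = g.V().has("name", "Beverages").in_("inCategory").values("name")
payload = t.to_payload()
print(payload)
```

Nothing in this sketch touches a graph. The client side only describes the traversal, which is why the same query can be written in several host languages and still be evaluated by one engine.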

Relationship with Amazon Neptune

Amazon Neptune supports both Gremlin and SPARQL as query languages. For data modeled as a property graph, Gremlin will often be the natural choice. SPARQL, shown in the following example, resembles SQL in both its name and its syntax.

SELECT DISTINCT ?name
WHERE {
  ?person v:label "person" .
  ?person v:age ?age .
  ?person e:created ?project .
  ?project v:name ?name .
  ?project v:lang ?lang .
  FILTER (?age > 30 && ?lang = "java")
}

Gremlin is based on the property graph model of vertices (nodes), edges, and properties, whereas SPARQL is based on the RDF model, in which data is stored as subject-predicate-object triples. Conceptually, Gremlin can therefore feel closer to graph theory.
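The difference between the two models can be sketched in plain Python. The data below is a small sample modeled on TinkerPop's "modern" toy graph, and the function names are hypothetical. Neither function is a real Gremlin or SPARQL engine. Both answer the same question as the SPARQL query above: which Java projects were created by people older than 30?

```python
# Property graph view: vertices are records with key/value properties,
# and edges are first-class (out_vertex, label, in_vertex) connections.
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    3: {"label": "software", "name": "lop", "lang": "java"},
    4: {"label": "person", "name": "josh", "age": 32},
    5: {"label": "software", "name": "ripple", "lang": "java"},
    6: {"label": "person", "name": "peter", "age": 35},
}
edges = [(1, "created", 3), (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def projects_property_graph():
    # Walk each "created" edge and inspect the records at both ends.
    names = set()
    for out_id, label, in_id in edges:
        person, project = vertices[out_id], vertices[in_id]
        if (label == "created" and person["label"] == "person"
                and person["age"] > 30 and project["lang"] == "java"):
            names.add(project["name"])
    return sorted(names)


# Triple view: everything, properties and relationships alike, is
# flattened into (subject, predicate, object) statements.
triples = []
for vertex_id, props in vertices.items():
    for key, value in props.items():
        triples.append((vertex_id, key, value))
triples += edges  # an edge already has the shape (subject, predicate, object)


def objects(subject, predicate):
    # All objects o for which (subject, predicate, o) is in the store.
    return [o for s, p, o in triples if s == subject and p == predicate]


def projects_triples():
    # Match one triple pattern at a time, as the SPARQL WHERE clause does.
    names = set()
    for person, predicate, obj in triples:
        if predicate == "label" and obj == "person":
            if any(age > 30 for age in objects(person, "age")):
                for project in objects(person, "created"):
                    if "java" in objects(project, "lang"):
                        names.update(objects(project, "name"))
    return sorted(names)


print(projects_property_graph())
print(projects_triples())
```

Both functions return the same project names. In the property graph version an edge is an object in its own right, while in the triple version a relationship is just one more statement alongside the property statements.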