What Is Apache TinkerPop?
Apache TinkerPop is a framework for graph computing.
It is designed as an abstraction layer over individual graph databases, with the goal of providing a common way to work with a wide range of them.
It already supports various cloud-based and open-source databases, including Neo4j, JanusGraph, Amazon Neptune, and Azure Cosmos DB.
The framework broadly consists of the following components:
- A Java-based Core API
- Gremlin, a graph traversal language
- Gremlin Server, which exchanges queries and data with clients
Graph traversal language: Gremlin
Gremlin is a language specialized for manipulating graph data structures. It serves a role similar to SQL in relational databases.
Gremlin makes it possible to write concise queries. Graphs can also be stored in an RDBMS, but querying them in SQL often requires many JOINs and leads to complex queries. Gremlin instead expresses a query as a chain of traversal steps that follow edges directly, so no JOINs are needed and the query tends to be easier to read.
The following compares a simple SQL query containing a JOIN with its Gremlin equivalent. Note that this is not a perfectly equivalent comparison because the data models differ.
SELECT Products.ProductName
FROM Products
INNER JOIN Categories
ON Categories.CategoryID = Products.CategoryID
WHERE Categories.CategoryName = 'Beverages'
g.V().has("name","Beverages").in("inCategory").values("name")
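To make the comparison above concrete, the following is a minimal Python sketch that runs the SQL query against an in-memory SQLite database and then mimics the three Gremlin steps by hand on a toy graph. It does not use TinkerPop at all. The sample rows, the vertex ids, and the helper `products_in_category` are assumptions made up for illustration.

```python
import sqlite3

# Relational side: the SQL query from the text, run on an in-memory SQLite database.
# All rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, CategoryID INTEGER);
    INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Condiments');
    INSERT INTO Products VALUES (1, 'Chai', 1), (2, 'Chang', 1), (3, 'Aniseed Syrup', 2);
""")
rows = conn.execute("""
    SELECT Products.ProductName
    FROM Products
    INNER JOIN Categories
    ON Categories.CategoryID = Products.CategoryID
    WHERE Categories.CategoryName = 'Beverages'
""").fetchall()
conn.close()
sql_names = sorted(row[0] for row in rows)

# Graph side: a toy property graph held in plain Python structures (not TinkerPop).
vertices = {
    "c1": {"name": "Beverages"},
    "c2": {"name": "Condiments"},
    "p1": {"name": "Chai"},
    "p2": {"name": "Chang"},
    "p3": {"name": "Aniseed Syrup"},
}
# Each edge is (source, label, target): product --inCategory--> category.
edges = [
    ("p1", "inCategory", "c1"),
    ("p2", "inCategory", "c1"),
    ("p3", "inCategory", "c2"),
]

def products_in_category(category_name):
    """Hand-written stand-in for g.V().has("name", x).in("inCategory").values("name")."""
    # has("name", x): keep the vertices whose name property matches.
    starts = {v for v, props in vertices.items() if props.get("name") == category_name}
    # in("inCategory"): follow matching edges backwards to their source vertices.
    sources = [src for src, label, dst in edges if label == "inCategory" and dst in starts]
    # values("name"): read the name property of each remaining vertex.
    return [vertices[v]["name"] for v in sources]

graph_names = sorted(products_in_category("Beverages"))
print(sql_names, graph_names)
```

In this sketch the relationship is an explicit edge that the graph side follows directly, whereas the relational side reconstructs it by matching `CategoryID` values in a JOIN.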
Gremlin libraries by language
Official libraries let applications written in Python, JavaScript, and other languages define Gremlin traversals.
- http://tinkerpop.apache.org/docs/current/reference/#gremlin-python
- http://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript
- http://tinkerpop.apache.org/docs/current/reference/#gremlin-dotnet
Language-specific libraries such as Gremlin-Python and Gremlin-JavaScript generally translate a traversal written in the host language into bytecode and send it to a server running on the JVM, typically Gremlin Server, for execution.
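The idea of turning a method chain into bytecode can be sketched with a toy recorder. This is not the Gremlin-Python implementation: the `ToyTraversal` class and the JSON envelope produced by `to_message` are hypothetical and exist only to show that each chained call can be recorded as an instruction rather than executed locally. The one detail taken from the real library is the spelling `in_`, which Gremlin-Python uses because `in` is a reserved word in Python.

```python
import json

class ToyTraversal:
    """Records each chained step instead of executing it (illustrative only)."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def __getattr__(self, name):
        # Reached only for names that are not real attributes, i.e. step names such as V or has.
        if name.startswith("__"):
            raise AttributeError(name)

        def step(*args):
            # Strip the trailing underscore so in_ is recorded as the step name "in".
            return ToyTraversal(self.steps + [[name.rstrip("_"), *args]])

        return step

    def to_message(self):
        # A made-up JSON envelope; the real wire format is different.
        return json.dumps({"steps": self.steps})

g = ToyTraversal()
t = g.V().has("name", "Beverages").in_("inCategory").values("name")
print(t.to_message())
```

Nothing is evaluated on the client in this sketch. The recorded step list is what would be serialized and handed to the server, which is where the traversal would actually run.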
Relationship with Amazon Neptune
Amazon Neptune supports both Gremlin and SPARQL for queries. When working through TinkerPop, Gremlin is usually the natural choice. SPARQL resembles SQL in both its name and syntax, as the following query shows.
SELECT DISTINCT ?name
WHERE {
?person v:label "person" .
?person v:age ?age .
?person e:created ?project .
?project v:name ?name .
?project v:lang ?lang .
FILTER (?age > 30 && ?lang = "java")
}
Gremlin is based on the property graph model of nodes, edges, and properties, whereas SPARQL is based on the RDF triple model of subject, predicate, and object, as held in a triple store. Conceptually, Gremlin can therefore feel closer to graph theory.
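The difference between the two models can be illustrated with a small Python sketch that stores the same facts both ways and answers the question posed by the SPARQL query above: the names of Java projects created by people older than 30. The sample data, the function names, and the reuse of the `v:` and `e:` prefixes as plain strings are assumptions for illustration. Neither half uses a real graph database.

```python
# Sample data (hypothetical, for illustration only).

# Property graph model: vertices carry a label and properties; edges are labeled links.
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    4: {"label": "person", "name": "josh", "age": 32},
    6: {"label": "person", "name": "peter", "age": 35},
    3: {"label": "software", "name": "lop", "lang": "java"},
    5: {"label": "software", "name": "ripple", "lang": "java"},
}
edges = [(1, "created", 3), (4, "created", 5), (4, "created", 3), (6, "created", 3)]

def names_from_property_graph():
    """Walk each 'created' edge and filter on the properties at both ends."""
    names = set()
    for src, label, dst in edges:
        person, project = vertices[src], vertices[dst]
        if (label == "created" and person["label"] == "person"
                and person["age"] > 30 and project["lang"] == "java"):
            names.add(project["name"])
    return names

# Triple model: every fact, property or relationship alike,
# becomes one (subject, predicate, object) triple.
triples = set()
for vid, props in vertices.items():
    for key, value in props.items():
        triples.add((vid, "v:" + key, value))
for src, label, dst in edges:
    triples.add((src, "e:" + label, dst))

def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def names_from_triples():
    """Match the same patterns as the query, one triple pattern at a time."""
    names = set()
    persons = {s for s, p, o in triples if p == "v:label" and o == "person"}
    for person in persons:
        if not any(age > 30 for age in objects(person, "v:age")):
            continue
        for project in objects(person, "e:created"):
            if "java" in objects(project, "v:lang"):
                names |= objects(project, "v:name")
    return names

print(sorted(names_from_property_graph()), sorted(names_from_triples()))
```

Both functions return the same set of names. The property graph version reads properties off the vertices at each end of an edge, while the triple version treats properties and relationships uniformly as patterns to match.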