Graph Database Quick Start Guide
What is a graph database?
A graph database stores and processes graph structures. Relational databases model data as tables and can represent relationships, but doing so often requires careful schemas, complex queries, and expensive joins. Graph databases model connected data much like a whiteboard diagram. They are intuitive for connected data but less suitable for unrelated records.
Property graph model

- A graph contains nodes (vertices) and edges.
- Nodes have labels and key-value properties.
- Edges have labels, directions, start nodes, and end nodes.
- Edges can also have properties.
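The model above can be sketched in a few lines of plain Python. This is only an illustration of the data shape, not how TinkerGraph stores data internally; the class names `Node` and `Edge`, the ids, and the distance value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A vertex: an id, a label, and key-value properties."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed edge: a label, a start node, an end node, and optional properties."""
    label: str
    start: Node
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical toy data; the ids and the distance are made up for illustration.
icn = Node(1, "airport", {"code": "ICN", "city": "Seoul"})
hnd = Node(2, "airport", {"code": "HND", "city": "Tokyo"})
route = Edge("route", icn, hnd, {"dist": 750})

print(route.start.properties["code"], "->", route.end.properties["code"])
```

Direction is carried by the `start` and `end` fields, so a return flight would be a second `Edge` with the two nodes swapped.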
Try a graph database
Clone the air-routes sample data from Practical Gremlin, start the Gremlin Console (which bundles TinkerGraph) through Docker, and load the graph:
% git clone https://github.com/krlawrence/graph.git
% cd graph/sample-data
% docker run -it --rm -v `pwd`:/mydata tinkerpop/gremlin-console
gremlin> :load /mydata/load-air-routes-graph-34.groovy
Explore the graph with Gremlin
Gremlin traversals conventionally begin with g, the graph traversal source.
gremlin> g.V().count()
==>3619
gremlin> g.V().hasLabel('airport').count()
==>3374
gremlin> g.V().hasLabel('airport').has('code','ICN')
==>v[122]
gremlin> g.V().hasLabel('airport').has('code','ICN').valueMap()
==>[country:[KR],code:[ICN],city:[Seoul],icao:[RKSI],runways:[3]]
V() selects all vertices, hasLabel() keeps those with the given label, has() keeps those whose property matches the given value, and valueMap() returns each remaining vertex's properties as a map.
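One way to picture this chain is as a pipeline of filters over a stream of vertices. The sketch below imitates it in plain Python. It is not the Gremlin API: the function names and the toy vertices are assumptions made for illustration, and only the ICN code, city, and country values are taken from the output above.

```python
# Hypothetical toy vertices, stored as plain dictionaries.
vertices = [
    {"label": "airport", "code": "ICN", "city": "Seoul", "country": "KR"},
    {"label": "airport", "code": "GMP", "city": "Seoul", "country": "KR"},
    {"label": "country", "code": "KR"},
]


def V():
    """Start from every vertex, like g.V()."""
    return iter(vertices)


def has_label(stream, label):
    """Keep vertices with the given label, like hasLabel()."""
    return (v for v in stream if v["label"] == label)


def has(stream, key, value):
    """Keep vertices whose property equals the value, like has()."""
    return (v for v in stream if v.get(key) == value)


def value_map(stream):
    """Return properties as maps of lists, mirroring the shape valueMap() prints."""
    return [{k: [val] for k, val in v.items() if k != "label"} for v in stream]


result = value_map(has(has_label(V(), "airport"), "code", "ICN"))
print(result)
```

Each step consumes the output of the previous one, which is why the order of steps in a traversal matters.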
Follow relationships
gremlin> g.V().hasLabel('airport').has('code','ICN').out('route').count()
==>144
gremlin> g.V().hasLabel('airport').has('code','ICN').out('route').values('code')
==>BKK
==>SVO
==>HND
...
Allowing one connection, the two-hop traversal yields 11,386 results, because the same destination can be reached through several intermediate airports. dedup() reduces these to 1,817 unique airports:
gremlin> g.V().hasLabel('airport').has('code','ICN').out('route').out('route').dedup().count()
==>1817
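The gap between the number of two-hop results and the number of unique airports can be reproduced on a toy graph. The adjacency list below is hypothetical and far smaller than the air-routes data; `out` is an illustrative helper, not a Gremlin call.

```python
# Hypothetical adjacency list of 'route' edges; not the real air-routes data.
routes = {
    "ICN": ["HND", "KIX", "CJU"],
    "HND": ["GMP", "ICN"],
    "KIX": ["GMP", "ICN"],
    "CJU": ["GMP", "ICN"],
}


def out(codes):
    """Follow outgoing routes from each airport, keeping duplicates like out('route')."""
    return [dest for code in codes for dest in routes.get(code, [])]


two_hops = out(out(["ICN"]))
unique = list(dict.fromkeys(two_hops))  # order-preserving removal of repeats, like dedup()

print(len(two_hops), two_hops)
print(len(unique), unique)
```

In this sketch the origin itself appears among the two-hop results, because flying out and straight back is also a two-leg path.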
Exclude destinations already reachable nonstop:
gremlin> g.V().hasLabel('airport').has('code','ICN').out('route').aggregate('nonstop').out('route').where(without('nonstop')).dedup().count()
==>1673
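Conceptually, this traversal is a set difference: collect the nonstop destinations first, then drop them from the second-leg results. A minimal Python sketch of that idea follows, using a hypothetical adjacency list rather than the real data.

```python
# Hypothetical adjacency list of 'route' edges; not the real air-routes data.
routes = {
    "ICN": ["HND", "KIX", "BKK"],
    "HND": ["GMP", "KIX", "ICN"],
    "KIX": ["GMP", "HND"],
    "BKK": ["SIN", "HND"],
}

# Collect every nonstop destination up front, as aggregate('nonstop') does.
nonstop = set(routes["ICN"])

# Take the second leg from each nonstop destination, keeping duplicates.
second_leg = [dest for stop in routes["ICN"] for dest in routes.get(stop, [])]

# Drop anything already reachable nonstop, then collapse repeats,
# mirroring where(without('nonstop')).dedup().
one_stop_only = {dest for dest in second_leg if dest not in nonstop}

print(len(one_stop_only), sorted(one_stop_only))
```

The nonstop set has to be complete before the filter runs, otherwise a destination seen early on the second leg could slip through.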
Find two-leg routes from Incheon to Gimpo:
gremlin> g.V().hasLabel('airport').has('code','ICN').out('route').out('route').has('code','GMP').path().by('code')
==>[ICN,HND,GMP]
==>[ICN,KIX,GMP]
==>[ICN,NGO,GMP]
==>[ICN,CJU,GMP]
==>[ICN,PEK,GMP]
Count routes from Korean airports to Japan by origin:
gremlin> g.V().hasLabel('airport').has('country','KR').as('kr').out('route').has('country','JP').select('kr').groupCount().by('code')
==>[ICN:27,TAE:5,GMP:3,CJU:4,PUS:6]
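The select('kr').groupCount().by('code') tail steps back to the origin airport of each matching route and tallies by its code. The same bookkeeping in plain Python might look like the sketch below; the route tuples are made up and the counts do not correspond to the real result above.

```python
from collections import Counter

# Hypothetical (origin, destination, destination country) routes.
routes = [
    ("ICN", "HND", "JP"), ("ICN", "KIX", "JP"), ("ICN", "BKK", "TH"),
    ("GMP", "HND", "JP"), ("PUS", "KIX", "JP"), ("PUS", "PEK", "CN"),
]
korean_airports = {"ICN", "GMP", "PUS"}

# Keep routes that start in Korea and end in Japan, then count per origin code.
counts = Counter(
    origin
    for origin, dest, dest_country in routes
    if origin in korean_airports and dest_country == "JP"
)

print(dict(counts))
```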
Find the longest route from a Korean airport to Japan by ordering route edges on their dist property, longest first, and keeping the first match:
gremlin> g.V().hasLabel('airport').has('country','KR').outE('route').order().by('dist',desc).inV().has('country','JP').limit(1).path().by('code').by('dist')
==>[ICN,882,CTS]
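The order-then-filter idea can be sketched in plain Python as a sort over edges followed by a filter on the destination. The edge tuples and distances here are hypothetical round numbers, not values from the air-routes data.

```python
# Hypothetical (origin, dist, destination, destination country) edges; distances are made up.
edges = [
    ("ICN", 700, "HND", "JP"),
    ("ICN", 900, "CTS", "JP"),
    ("GMP", 500, "KIX", "JP"),
    ("ICN", 2300, "BKK", "TH"),
]

# Sort edges by distance, longest first, like order().by('dist', desc).
ordered = sorted(edges, key=lambda e: e[1], reverse=True)

# Keep edges whose destination is in Japan, like inV().has('country', 'JP').
to_japan = [e for e in ordered if e[3] == "JP"]

# The first remaining edge is the longest route; keep it as an (origin, dist, destination) path.
longest = to_japan[0][:3]

print(longest)
```

Sorting before filtering works, but it sorts edges that are later discarded; on a large graph, filtering first would usually mean less work.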
Strengths and challenges
Graph databases make connected data easy to traverse, but performance still declines when a traversal touches a very large number of nodes or edges, for example the millions of followers of a celebrity. Design the graph around the queries you run most often, write traversals that filter early, and keep data that does not benefit from graph storage in other database systems.