OrientDB document and graph database
Linked Worlds
Relational databases have been popular for many years, but they force users to squeeze their data models into table designs, which has increasingly proved to be too rigid. For example, tables do not offer an elegant way to store the relationships among objects. The document and graph data structures found in NoSQL (Not only SQL) databases, of which OrientDB is a representative, have solved some of the storage and retrieval problems of relational databases.
In this article, I demonstrate a few of the OrientDB features that cannot be implemented with classical relational databases. Figure 1 shows a comparison: Relational databases keep all data in tables, with a column for each attribute. The tables impose a rigid schema at run time, and additional attributes either require the existing tables to be adapted or an extra table to be defined. Both solutions involve intervention by the database administrator, and possibly even migration of the entire database.
With document databases, the data for each object is available in a document (XML, JSON, etc.). Each document has a unique ID that the database uses to access the record. Object links refer to other documents by ID. Additional attributes can easily be added to the document. This structure makes the document database more flexible than the relational database. One disadvantage, however, is that the database always has to load the documents themselves when it searches or follows object relationships.
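The link-by-ID idea can be pictured with a small in-memory model. The following Python sketch is not OrientDB code; the dictionary, the IDs (`p1`, `p2`), and the `spouse` and `rank` fields are assumptions chosen purely for illustration.

```python
# In-memory stand-in for a document store; IDs and field names are illustrative.
documents = {
    "p1": {"type": "Person", "last": "Vimes", "first": "Samuel", "spouse": "p2"},
    "p2": {"type": "Person", "last": "Ramkin", "first": "Sybil", "spouse": "p1"},
}


def load(doc_id):
    """Fetch one document by its unique ID."""
    return documents[doc_id]


def spouse_of(doc_id):
    """Follow a link: load the source document, then the document it refers to."""
    return load(load(doc_id)["spouse"])


# A new attribute needs no schema change; it is simply added to one document.
documents["p1"]["rank"] = "Commander"

print(spouse_of("p1")["last"])
```

Following a single link already costs two document loads here, which is the disadvantage mentioned above.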
Graph databases are based on two basic objects: nodes and the edges that connect them. Both can store any number of attributes; a special declaration is not usually necessary. This design results in a flexible data model that enables quick traversal from object to object.
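As a rough illustration of nodes, edges, and traversal, the following Python sketch keeps both in plain data structures and walks the graph breadth first up to a maximum depth. It is a conceptual model only, not how OrientDB stores or traverses data; the node keys and the `traverse` helper are hypothetical.

```python
from collections import deque

# Nodes and edges both carry free-form attribute dictionaries.
nodes = {
    "vimes": {"last": "Vimes", "first": "Samuel"},
    "sybil": {"last": "Ramkin", "first": "Sybil"},
    "guards": {"name": "Guards! Guards!"},
}
edges = [
    {"src": "vimes", "dst": "sybil", "label": "Relation", "type": "Married"},
    {"src": "vimes", "dst": "guards", "label": "Appear", "chapter": [1, 2, 3, 4, 5, 6]},
]


def traverse(start, max_depth):
    """Walk edges in either direction, breadth first; return node -> depth."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # do not expand nodes at the depth limit
        for edge in edges:
            if edge["src"] == current:
                neighbor = edge["dst"]
            elif edge["dst"] == current:
                neighbor = edge["src"]
            else:
                continue
            if neighbor not in seen:
                seen[neighbor] = seen[current] + 1
                queue.append(neighbor)
    return seen


print(traverse("sybil", 2))
```

Starting from `sybil`, a depth of 1 reaches only `vimes`; a depth of 2 also reaches the book.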
Typical applications include all kinds of social networks (who with whom, when, and where). The storing of groups, documents, or project structures also benefits from flexible mapping of complex dependencies and cross references.
OrientDB is a document database overlain by a graph database. The document database provides the advantages of one-direction link relationships, key/value pairs, and object-oriented models. The graph database adds vertex and bi-directional edge relationships and speed benefits.
Installation
The software is available as a precompiled tarball from the website [1]. After downloading, unpack the archive in any directory. The server can be started without further configuration using the script bin/server.sh. When first launched, the database just prompts for a password for the database root user.
Although this completes the installation, the database administrator (DBA) will want to configure the database to use SSL for a production installation, because the data and passwords are otherwise sent over the network in plain text. A web application and the somewhat terse, but powerful, console are available for direct use. You can start the console and then log in to the database with:
bin/console.sh connect remote:localhost root <Password>
Some NoSQL databases dissociate themselves from the query language of relational databases, but OrientDB uses SQL wherever possible. Users thus do not need to learn a new language; new commands are only needed for unique features. In the console, the help command provides an overview of available commands; more extensive documentation is available in the OrientDB wiki [2].
Getting Started
The first example is based on characters and books from the Discworld series (Figure 2) written by the brilliant Terry Pratchett, who sadly passed away in early 2015. It uses two types of node, Person and Book, which are connected via the two edge types Relation and Appear. Nodes and edges have different attributes that can hold a single value, a list, or a map.
Listing 1 contains an excerpt of the console commands the user needs to create the database and relationships shown in Figure 2. In the first two lines, the user connects the console to the server and then creates a new database called discworld. After creating the database, the console automatically connects to it.
Listing 1
Create Database (Excerpt)
01 connect remote:localhost root <password>
02 create database remote:localhost/discworld root <password> plocal
03
04 create class Person extends V;
05 create property Person.birthday date;
06
07 create class Book extends V;
08 create property Book.translation embeddedmap;
09
10 create class Relation extends E;
11 create property Relation.from date;
12
13 create class Appear extends E;
14 create property Appear.chapter embeddedlist integer;
15
16 insert into Person (last, first, birthday) values ('Vimes', 'Samuel', '1962-04-03');
17 insert into Person (last, first, birthday) values ('Ramkin', 'Sybil', '1969-09-06');
18
19 insert into Book (name, translation) values ('Guards! Guards!', {'de':'Wachen! Wachen!', 'fr':'Au Guet!'});
20
21 create edge Relation from #11:0 to #11:1 set type='Married', from='1993-01-01';
22
23 create edge Appear from #11:0 to #13:0 set chapter=[1,2,3,4,5,6];
Graph databases have the base types V (node, or vertex) and E (edge). Lines 4-8 create the classes Person and Book as extensions of the base node type and define their attributes.
The usual primitives such as integer, string, and date are available in OrientDB; lists, sets, and maps can also be used, as you can see from line 8 on. The translation attribute in the Book class contains the map <Language> : <Title>, and the chapter attribute of the Appear edge records the chapters in which a character appears. Relational databases would need to define an additional table for this purpose.
The edge classes Relation and Appear are defined in the same way as the node classes, the difference being that the base class is now E instead of V. Unlike relational databases, you do not need to define edge relationships as 1:1, 1:n, or m:n. Each edge represents a 1:1 relationship between two nodes. However, any number of edges can originate from a node or point to a node.
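The following Python sketch illustrates why no cardinality declaration is needed: each edge record joins exactly two nodes, and 1:n or m:n relationships simply emerge from several such records. The `Edge` dataclass, the helper functions, and the placeholder node `book_b` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """One edge joins exactly two nodes and may carry arbitrary attributes."""
    label: str
    src: str
    dst: str
    attrs: dict = field(default_factory=dict)


# Sample edges; "book_b" is a placeholder for a second book.
edges = [
    Edge("Relation", "vimes", "sybil", {"type": "Married"}),
    Edge("Appear", "vimes", "guards"),
    Edge("Appear", "vimes", "book_b"),
    Edge("Appear", "sybil", "guards"),
]


def targets(label, src):
    """All nodes that edges of the given class point to from src."""
    return sorted(e.dst for e in edges if e.label == label and e.src == src)


def sources(label, dst):
    """All nodes from which edges of the given class point to dst."""
    return sorted(e.src for e in edges if e.label == label and e.dst == dst)


print(targets("Appear", "vimes"))
print(sources("Appear", "guards"))
```

One character appears in several books and one book contains several characters, so Appear behaves like an m:n relationship without anyone having declared it as such.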
Nodes are created with the classic insert statement, for which the target class and the attribute values need to be specified. Although the attributes last and first were not declared when Person was defined, using them does not yield an error message; instead, the database creates these attributes dynamically when the record is inserted.
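This mix of declared and undeclared attributes can be pictured roughly as follows. The Python sketch assumes that declared properties are converted to their declared type while all other attributes are stored as given; the `declared` table and the `insert` helper are hypothetical and do not reflect OrientDB internals.

```python
from datetime import date

# Declared properties, analogous to "create property Person.birthday date".
declared = {"birthday": date.fromisoformat}


def insert(record):
    """Convert declared properties to their type; accept all others as-is."""
    stored = {}
    for key, value in record.items():
        convert = declared.get(key)
        stored[key] = convert(value) if convert else value
    return stored


person = insert({"last": "Vimes", "first": "Samuel", "birthday": "1962-04-03"})
print(person)
```

In this model, `last` and `first` are accepted without any declaration, whereas `birthday` ends up as a real date value.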
To populate lists, sets, or maps, OrientDB accepts a JSON-like notation (lines 19 and 23). OrientDB assigns a unique object ID to each node (e.g., #11:3), which can be used later to find the node or to create edges. Edges must be created using the new create edge statement. In addition to the IDs of the start and target objects, edges can have arbitrary attributes, just like nodes.
Question Time
Once the data model has been defined and the first data added, it is time to start searching in the database. The normal select statement is ideal for querying edges and nodes directly; the user can use the typical comparison operators (e.g., =, >, like), which also work with embedded lists, sets, and maps.
Line 3 in Listing 2 finds a book by its German title. The additional value of OrientDB over a relational database is the elegant evaluation of object relationships across one or more edges. To store the object relationships, the out and in attributes are defined automatically for each node and edge, as you can see from the output in Figure 3.
Listing 2
Example Selects
01 select from Person where last='Vimes';
02 select from Person where birthday > '1965-01-01';
03 select from Book where translation["de"] = "Ab die Post";
04 select expand( both() ) from Person where first='Sam';
05 select expand( in().out('Appear') ) from #11:2;
06 traverse any() from #11:0;
07 traverse out('Relation') from #11:1 while $depth < 10;
08 traverse any() from #11:0 while
09   ( @class='Relation' and from < '2000-01-01')
10   or from is null;
11 select name from (traverse any() from #11:0) where @class='Residence';
If you use the out() function in the select statement, you follow all edges emanating from the node; accordingly, you follow the edges pointing to a node with the in() function and the edges in both directions with the both() function. In combination with the expand() function, you receive the nodes connected via these edges as individual records.
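Conceptually, these functions are lookups over the edge records. The following Python sketch models them on a plain list in which each edge stores its start node under `out` and its target node under `in`; it returns the nodes reached, which corresponds to the expanded result. The function names mirror the OrientDB functions, but the implementation is only an assumption-laden illustration (`in_` carries a trailing underscore because `in` is a Python keyword).

```python
# Each edge stores its start node under "out" and its target node under "in".
edges = [
    {"label": "Relation", "out": "vimes", "in": "sybil"},
    {"label": "Appear", "out": "vimes", "in": "guards"},
]


def out(node, label=None):
    """Nodes reached by following edges that start at the given node."""
    return [e["in"] for e in edges
            if e["out"] == node and label in (None, e["label"])]


def in_(node, label=None):
    """Nodes reached by following edges that point to the given node."""
    return [e["out"] for e in edges
            if e["in"] == node and label in (None, e["label"])]


def both(node, label=None):
    """Nodes reached in either direction."""
    return out(node, label) + in_(node, label)


print(out("vimes"))
print(in_("guards"))
print(both("sybil"))
```

Passing an edge class, as in `out("vimes", "Appear")`, restricts the lookup to edges of that class, similar to out('Appear') in line 5 of Listing 2.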