Lead Image © Victoria, Fotolia.com

OrientDB document and graph database

Linked Worlds

Article from ADMIN 28/2015
By
OrientDB is a NoSQL document and graph database with a flexible data model and an elegant approach to querying.

Relational databases have been popular for many years, but they force users to squeeze their data models into table designs, which has increasingly proved to be too rigid. For example, tables offer no elegant way to store hierarchical relationships between objects. The document and graph data structures found in NoSQL (Not only SQL) databases, of which OrientDB is a representative, have solved some of the storage and retrieval problems of relational databases.

In this article, I demonstrate a few of the OrientDB features that cannot be implemented with classical relational databases. Figure 1 shows a comparison: Relational databases keep all data in tables, with a column for each attribute. The tables impose a rigid schema at run time, and additional attributes either require the existing tables to be adapted or an extra table to be defined. Both solutions involve intervention by the database administrator, and possibly even migration of the entire database.

Figure 1: Three kinds of databases.

With document databases, the data for each object is stored in a document (XML, JSON, etc.). Each document has a unique ID that the database uses to access the record. Object links refer to other documents by ID. Additional attributes can easily be added to a document. This structure makes the document database more flexible than the relational database. One disadvantage, however, is that the database always has to load the documents themselves to search them or to follow object relationships.
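To make the document model concrete, here is a minimal Python sketch. It is not OrientDB code: an in-memory dictionary stands in for the database, and the IDs, field names, and the resolve helper are assumptions made up for illustration.

```python
# Toy document store: each document is a dict, keyed by a unique ID.
# The IDs and field names are illustrative assumptions, not OrientDB output.
store = {
    "p1": {"first": "Samuel", "last": "Vimes", "spouse": "p2"},
    "p2": {"first": "Sybil", "last": "Ramkin"},
}

# Adding an attribute touches only this one document; no schema change is needed.
store["p2"]["title"] = "Lady"


def resolve(store, doc_id, link_field):
    """Follow a link by loading the referenced document by its ID."""
    target_id = store[doc_id].get(link_field)
    return store.get(target_id) if target_id is not None else None


spouse = resolve(store, "p1", "spouse")
print(spouse["first"], spouse["last"])
```

Note that resolve has to fetch the whole target document just to follow one link, which is the cost mentioned above.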

Graph databases are based on two basic objects: nodes and the edges that connect them. Both can store any number of attributes; a special declaration is not usually necessary. This design results in a flexible data model that enables quick traversal from object to object.
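The following Python sketch illustrates the idea with plain dictionaries. It is a toy model under stated assumptions, not how OrientDB stores data: the node IDs, attribute names, and the reachable function are hypothetical.

```python
from collections import deque

# Nodes and edges both carry free-form attributes (all names are illustrative).
nodes = {
    "vimes": {"first": "Samuel", "last": "Vimes"},
    "sybil": {"first": "Sybil", "last": "Ramkin"},
    "guards": {"name": "Guards! Guards!"},
}
edges = [
    {"from": "vimes", "to": "sybil", "label": "Relation", "type": "Married"},
    {"from": "sybil", "to": "guards", "label": "Appear"},
]


def reachable(start):
    """Breadth-first traversal along outgoing edges; returns node IDs in visit order."""
    seen, queue = [start], deque([start])
    while queue:
        current = queue.popleft()
        for edge in edges:
            if edge["from"] == current and edge["to"] not in seen:
                seen.append(edge["to"])
                queue.append(edge["to"])
    return seen


print(reachable("vimes"))
```

A real graph database keeps the edge references with each node, so it does not scan the full edge list at every step as this sketch does.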

Typical applications include all kinds of social networks (who with whom, when, and where). The storing of groups, documents, or project structures also benefits from flexible mapping of complex dependencies and cross references.

OrientDB is a document database overlain by a graph database. The document database provides the advantages of one-direction link relationships, key/value pairs, and object-oriented models. The graph database adds vertex and bi-directional edge relationships and speed benefits.

Installation

The software is available as a precompiled tarball from the website [1]. After downloading, unpack the archive to any location. The server starts without further configuration using the bin/server.sh script. When first launched, it simply prompts for a password for the database root user.

Although this completes the installation, the database administrator (DBA) will want to enable SSL for a production installation, because data and passwords otherwise cross the network in plain text. A web application and the somewhat terse, but powerful, console are available for direct use. You can start the console and then log in to the database with:

bin/console.sh
connect remote:localhost root <Password>

Some NoSQL databases dissociate themselves from the query language of relational databases, but OrientDB uses SQL wherever possible. Users thus do not need to learn a new language; new commands are only needed for unique features. In the console, the help command provides an overview of available commands; more extensive documentation is available in the OrientDB wiki [2].

Getting Started

The first example is based on characters and books from the Discworld series (Figure 2) written by the brilliant Terry Pratchett, who sadly passed away in early 2015. It supports two types of node, Person and Book, which are connected via the two edges (connections) Relation and Appear. Nodes and edges have different attributes that can be used directly or as a list or map.

Figure 2: This example database details the relationships between the characters of the Discworld series.

Listing 1 contains an excerpt of the console commands needed to create the database and relationships shown in Figure 2. The user first connects the console to the server in line 1 and then creates a new database called discworld in line 2. After creating the database, the console automatically connects to it.

Listing 1

Create Database (Excerpt)

01 connect remote:localhost root <password>
02 create database remote:localhost/discworld root <password> plocal
03
04 create class Person extends V;
05 create property Person.birthday date;
06
07 create class Book extends V;
08 create property Book.translation embeddedmap;
09
10 create class Relation extends E;
11 create property Relation.from date;
12
13 create class Appear extends E;
14 create property Appear.chapter embeddedlist integer;
15
16 insert into Person (last, first, birthday) values ('Vimes', 'Samuel', '1962-04-03');
17 insert into Person (last, first, birthday) values ('Ramkin', 'Sybil', '1969-09-06');
18
19 insert into Book (name, translation) values ('Guards! Guards!', {'de':'Wachen! Wachen!', 'fr' : 'Au Guet!'} );
20
21 create edge Relation from #11:0 to #11:1 set type='Married', from='1993-01-01';
22
23 create edge Appear from #11:0 to #13:0 set chapter=[1,2,3,4,5,6];

Graph databases have the base types V (node) and E (edge). Lines 4-8 create the classes Person and Book as extensions of the base node and define their attributes.

The usual primitives such as integer, string, and date are available in OrientDB; lists, sets, and maps can also be used, as lines 8 and 14 show. The translation attribute in the Book class contains the map <Language> : <Title>, and the chapter attribute records the chapters in which a character appears. Relational databases would need to define an additional table for this purpose.

The edge classes Relation and Appear are defined in the same way as the node classes, the difference being that the base class is now E instead of V. Unlike relational databases, you do not need to define edge relationships as 1:1, 1:n, or m:n. Each edge represents a 1:1 relationship between two nodes. However, any number of edges can originate from a node or point to a node.

Nodes are created with the classic insert statement, for which the target class and the attribute values need to be specified. Although the attributes last and first were never declared when Person was defined, using them does not produce an error message; instead, the database creates these attributes dynamically when the record is inserted.
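The mix of declared and undeclared attributes can be pictured with a small Python sketch. It is a toy model, not OrientDB behavior verified in detail: the declared table mirrors Person.birthday from Listing 1, and the type check on declared attributes is an assumption made for this illustration.

```python
from datetime import date

# Declared properties of the toy "Person" class (mirrors Person.birthday in Listing 1).
declared = {"birthday": date}


def insert(record):
    """Accept any attributes; only declared ones are checked against their type.

    The type check itself is an assumption made for this sketch.
    """
    for name, expected in declared.items():
        if name in record and not isinstance(record[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return dict(record)


# "last" and "first" were never declared, yet the insert succeeds.
person = insert({"last": "Vimes", "first": "Samuel", "birthday": date(1962, 4, 3)})
print(sorted(person))
```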

To populate lists, sets, or maps, OrientDB accepts a JSON-like notation (line 19). OrientDB assigns a unique object ID to each node (e.g., #11:3), which can then be used later to find the node or to create edges. Edges must be created using the new create edge statement. In addition to the IDs of the start and target objects, edges, like nodes, can have arbitrary attributes.

Question Time

Once the data model has been defined and the first data added, it is time to start searching in the database. The normal select statement is ideal for querying edges and nodes directly; the typical comparison operators (e.g., =, >, like) are available and also work with embedded lists, sets, and maps.

Line 3 in Listing 2 finds a book by its German title, using the translation map. The additional value of OrientDB over a relational database is the elegant evaluation of object relationships across one or more edges. The database automatically defines out and in attributes on nodes and edges to store the object relationships, as you can see from the output in Figure 3.

Listing 2

Example Selects

01 select from Person where last='Vimes';
02 select from Person where birthday > '1965-01-01';
03 select from Book where translation["de"] = "Ab die Post";
04 select expand( both() ) from Person where first='Sam';
05 select expand( in().out('Appear') ) from #11:2;
06 traverse any() from #11:0;
07 traverse out('Relation') from #11:1 while $depth < 10;
08 traverse any() from #11:0 while
09    ( @class='Relation' and from < '2000-01-01')
10    or from is null;
11 select name from (traverse any() from #11:0) where @class='Residence';

Figure 3: OrientDB automatically defines directional properties.

If you use the out() function in the select statement, you follow all edges emanating from the node; accordingly, the in() function follows the edges pointing to a node, and the both() function follows edges in both directions. In combination with the expand() function, you receive the nodes at the other end of those edges as individual records.
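The direction functions can be mimicked in a few lines of Python. This is only a rough analogy under stated assumptions: the edge triples and record names are made up, the functions out, in_, and both are hypothetical stand-ins, and the toy version returns neighbor nodes directly and leaves out the edge records as a simplification.

```python
# Edges as (source, label, target) triples; the record names are illustrative.
edges = [
    ("vimes", "Relation", "sybil"),
    ("vimes", "Appear", "guards"),
    ("sybil", "Appear", "guards"),
]


def out(node, label=None):
    """Nodes at the far end of edges leaving `node`."""
    return [dst for src, lbl, dst in edges if src == node and label in (None, lbl)]


def in_(node, label=None):
    """Nodes at the near end of edges arriving at `node`."""
    return [src for src, lbl, dst in edges if dst == node and label in (None, lbl)]


def both(node, label=None):
    """Neighbors in either direction: outgoing first, then incoming."""
    return out(node, label) + in_(node, label)


# Loose analog of line 5 in Listing 2: who points at a node, and where do they appear?
# Flattening the nested result into one flat list is the part expand() plays.
appearances = [book for person in in_("sybil") for book in out(person, "Appear")]
print(both("sybil"), appearances)
```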
