Graph database Neo4j discovers fake reviews on Amazon
Digital Detective
YAML in the Go Universe
Listing 3 uses Unmarshal() (line 24) from the official YAML module in the Go universe to convert the YAML data, which arrives as a byte array after being read with io/ioutil (line 8), into a Go data structure.
Go's strict typing accommodates the fairly casual YAML here by defining a string-indexed hash table whose entries are arrays of strings. The Config type structure starting in line 12 defines the hash map with the nested string arrays in the Reviews entry. The field name must be capitalized so that the YAML module can access it.
Starting in line 35, two for
loops iterate over all products in the hash map and then over the array of reviewers for each entry. Before line 50 compiles the command for adding the relation, the if
conditions in lines 38 and 44 check whether the two endpoints of the relation already exist as nodes in the database.
If the created map variable indicates that a node is still missing, the code adds to the cmd string a command that creates the node with a MERGE instruction. It terminates all commands with line breaks. It is important not to send semicolon-separated Neo4j commands here, because doing so causes problems if some of them define variables (e.g., reviewer1) that are reused later (when the relation is created). A semicolon terminates a command (line 56), and Neo4j then forgets all variables defined previously.
Contacting the Server
The toNeo4j() function starting in line 60 contacts the browser port of the server in the container. It transmits the command string cmd, which it has assembled from the map data, and precedes the instructions with a command to first delete all previously existing data.
The open source package used here, cq
from GitHub, is a bit outdated. Although it does not use the API module's Bolt connection supported by Neo4j on port 7687, it works fine. It's also easier to install than the default, which forces you to download some obscure Bolt binaries.
In typical SQL style, line 61 contacts the server in the Docker container. Line 68 uses Exec()
to send the command present in cmd
over the port, which the server acknowledges with an error message if something went wrong.
With the command sequence
$ go mod init rimport
$ go build
Go fetches the libraries needed to create the binary from GitHub and creates an executable program named rimport. When called, the executable first reads the reviews.yaml file from disk and then pumps the necessary commands into the container port to the Neo4j server. The user can then send queries to the data model for fraud detection, as shown in Figure 2.
Installation Troubles
The current Docker image neo4j:latest drags in the latest Neo4j version 4.0.3, which does not yet support any graph algorithms. To add them, you have to download a .jar file from the Neo4j site [4] and drop it into the ~/neo4j/plugins/ directory. There, the Docker container will pick it up when the Neo4j server starts, because the docker run command in Figure 3 mounts the plugin directory with the -v option.
Hold on, not so fast: The graph algorithms plugin is only available as version 3.5.9. If you think you can simply use it with a Neo4j database of version 4.0.3, think again. Right after the restart, you'll see the container quickly giving up the ghost with a long, but completely meaningless, stack trace. If you install neo4j:3.5.9 instead of neo4j:latest, you will have more luck. The server starts up properly, and the database query for algorithms in the algo.* namespace reveals a long list (Figure 4).
Unfortunately, you will encounter more obstacles. When you try to use one of the algorithms, an error message on the screen explains that this is not possible in a "sandbox" for safety reasons. Instead, you will need to exempt the imported algorithms from the routinely imposed restrictions. To do this, set the environment variable NEO4J_dbms_security_procedures_unrestricted to a regular expression specifying that everything below the algo namespace enjoys free rein.
The Docker command in Figure 3 already defines the variable correctly. It also sets the NEO4J_AUTH variable to neo4j/test, which tells the server to omit the otherwise mandatory password reset. Let the fun begin!
Infos
[1] "Saving and Evaluating Network Paths in Neo4j" by Mike Schilli, Linux Magazine, issue 164, June 2014, pg. 66: https://www.linuxpromagazine.com/Issues/2014/164/Perl-Neo4j/
[2] Node similarity algorithm: https://neo4j.com/docs/graph-algorithms/current/algorithms/node-similarity/
[3] Jaccard index: https://en.wikipedia.org/wiki/Jaccard_index
[4] Retroactively installing the Algo plugin for Neo4j: https://neo4j.com/docs/graph-algorithms/current/introduction/#_installation
[5] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/admin-magazine.com/58/