Graph database Neo4j discovers fake reviews on Amazon

Digital Detective

YAML in the Go Universe

Listing 3 uses Unmarshal() (line 24) from the official YAML module in the Go universe to convert the YAML data, which it first reads from disk as a byte array with io/ioutil (line 8), into a Go data structure.

Go's strict typing gets along surprisingly well with the rather casual YAML format here by defining a string-indexed hash table whose entries are arrays of strings. The Config type structure starting in line 12 defines the hash map with the nested string arrays in the Reviews field. The field name must be capitalized so that the YAML module can access it, because Go only exports identifiers that start with an uppercase letter.
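The following minimal sketch illustrates the data structure and the capitalization rule. To stay runnable with the standard library alone, it swaps in encoding/json for the YAML module; yaml.Unmarshal() from the go-yaml package (gopkg.in/yaml.v2, assumed here to be the module the article means) takes the same arguments, a byte slice and a pointer to the target structure. The sample data, the tag names, and the notes field are hypothetical. Since Go 1.16, os.ReadFile() is the recommended replacement for io/ioutil's ReadFile().

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Config maps product names to the reviewers of each product.
// Reviews is exported (uppercase), so the decoder can fill it in.
// The tag names are assumptions, not taken from the article's listing.
type Config struct {
	Reviews map[string][]string `yaml:"reviews" json:"reviews"`
	notes   string              // unexported: the decoder never sets it
}

// sampleData is hypothetical input standing in for reviews.yaml.
var sampleData = []byte(`{
  "reviews": {
    "product1": ["reviewer1", "reviewer2"],
    "product2": ["reviewer2", "reviewer3"]
  },
  "notes": "silently ignored"
}`)

// parse decodes the raw bytes into a Config; yaml.Unmarshal would be
// called the same way with YAML input.
func parse(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parse(sampleData)
	if err != nil {
		panic(err)
	}
	// Sort the product names, because Go randomizes map iteration order.
	products := make([]string, 0, len(cfg.Reviews))
	for p := range cfg.Reviews {
		products = append(products, p)
	}
	sort.Strings(products)
	for _, p := range products {
		fmt.Println(p, cfg.Reviews[p])
	}
	fmt.Printf("notes: %q\n", cfg.notes)
}
```

The notes field stays empty even though the input contains a matching key, which is exactly the effect the capitalization rule guards against.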

Starting in line 35, two for loops iterate over all products in the hash map and then over the array of reviewers for each entry. Before line 50 assembles the command for adding the relation, the if conditions in lines 38 and 44 check whether the two endpoints of the relation already exist as nodes.

If the created map indicates that a node is still missing, the code appends a command to the cmd string that creates the node with a MERGE instruction. It terminates every command with a line break rather than a semicolon. This matters, because semicolon-separated Neo4j commands cause problems if some of them define variables (e.g., reviewer1) that are reused later when the relation is created. A semicolon ends a statement (line 56), and Neo4j then forgets all previously defined variables.

Contacting the Server

The toNeo4j() function starting in line 60 contacts the server's browser port (7474 by default) in the container. It transmits the command string cmd previously assembled from the map data and precedes these instructions with a command that first deletes all previously existing data.

The open source package used here, cq from GitHub, is a bit outdated. Although it does not use the Bolt protocol that Neo4j supports on port 7687, it works fine. It is also easier to install than the official Neo4j driver, which forces you to download some obscure Bolt binaries.

In typical Go SQL style, line 61 opens the connection to the server in the Docker container. Line 68 uses Exec() to send the command stored in cmd over the port; if something goes wrong, the server responds with an error message.

With the command sequence

$ go mod init rimport
$ go build

Go fetches the libraries needed to build the binary from GitHub and creates an executable program named rimport. (With Go 1.16 or later, go build no longer updates go.mod on its own, so run go mod tidy before go build.) When called, the executable first reads the reviews.yaml file from disk and then pumps the necessary commands through the container port to the Neo4j server. The user can then send queries to the data model for fraud detection, as shown in Figure 2.

Installation Troubles

The current Docker image neo4j:latest pulls in Neo4j version 4.0.3, which does not include any graph algorithms out of the box. To add them, you have to download the plugin as a .jar file from the Neo4j site [4] and drop it into the ~/neo4j/plugins/ directory. The Docker container picks it up from there when the Neo4j server starts, because the docker run command in Figure 3 mounts the plugin directory into the container with the -v option.

Hold on, not so fast: The Graph Algorithms plugin is only available as version 3.5.9. If you think you can simply use it with a Neo4j 4.0.3 database, think again. Right after the restart, the container quickly gives up the ghost with a long, but completely meaningless, stack trace. If you use the neo4j:3.5.9 image instead of neo4j:latest, you will have more luck: The server starts up properly, and a database query for algorithms in the algo.* namespace reveals a long list (Figure 4).

Figure 4: After installing the graph plugins, Neo4j shows the retroactively loaded algorithms.

Unfortunately, more obstacles lie ahead. When you try to use one of the algorithms, an error message explains that this is not possible in a "sandbox" for security reasons. To get around this, you need to exempt the imported algorithms from the restrictions Neo4j imposes by default. To do so, set the environment variable NEO4J_dbms_security_procedures_unrestricted to a regular expression specifying that everything in the algo namespace enjoys free rein.
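The variable name is not arbitrary: The official Neo4j Docker image derives environment variable names from neo4j.conf settings by prefixing NEO4J_, writing every underscore twice, and turning every dot into an underscore. The following sketch uses a hypothetical helper, envName(), to show how the setting dbms.security.procedures.unrestricted turns into the variable named above; the second setting only illustrates the underscore rule.

```go
package main

import (
	"fmt"
	"strings"
)

// envName converts a neo4j.conf setting into the environment variable
// name the Neo4j Docker image expects. Underscores must be doubled
// before dots become underscores, or the two would be indistinguishable.
func envName(setting string) string {
	s := strings.ReplaceAll(setting, "_", "__")
	s = strings.ReplaceAll(s, ".", "_")
	return "NEO4J_" + s
}

func main() {
	for _, setting := range []string{
		"dbms.security.procedures.unrestricted",
		"dbms.tx_log.rotation.size",
	} {
		fmt.Printf("%s -> %s\n", setting, envName(setting))
	}
}
```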

The Docker command in Figure 3 already defines this variable correctly. It also sets the NEO4J_AUTH variable to neo4j/test, which gives the neo4j user the password test and thus skips the otherwise mandatory password change on first login. Let the fun begin!

Infos

  1. "Saving and Evaluating Network Paths in Neo4j" by Mike Schilli, Linux Magazine, issue 164, June 2014, pg. 66, https://www.linuxpromagazine.com/Issues/2014/164/Perl-Neo4j/
  2. Node similarity algorithm: https://neo4j.com/docs/graph-algorithms/current/algorithms/node-similarity/
  3. Jaccard index: https://en.wikipedia.org/wiki/Jaccard_index
  4. Retroactively installing the Algo plugin for Neo4j: https://neo4j.com/docs/graph-algorithms/current/introduction/#_installation
  5. Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/admin-magazine.com/58/

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area, California. He writes a monthly column for Linux Magazine, in which he researches practical applications of various programming languages. He will gladly answer any questions sent to mschilli@perlmeister.com.
