Image by Gerd Altmann from Pixabay

Graph database Neo4j discovers fake reviews on Amazon

Digital Detective

Article from ADMIN 58/2020
By
A Neo4j graph database example shows how to uncover fraudulent reviews on Amazon.

Graph databases do not rely on the tables and join commands of traditional relational databases. Instead, they follow the relations between nodes and support queries that would be slow or impractical to process in their relational counterparts. In this article, I take advantage of the graph database structure to create an algorithm that detects fake product reviews on Amazon with a Neo4j instance in a Docker container.

Fraudulent Reviews

On closer inspection of a product on Amazon that has consistently earned five-star ratings, it often turns out that many of the reviewers are professional lackeys. The text betrays that the author did not even use the product ("Great product, fast delivery!"). If you then search for further reviews from the same customer, you will often find other five-star reviews that look very similar. The problem is so evident on Amazon that customers rub their eyes in amazement, wondering why the online giant doesn't intervene.

Graph databases can help identify such shenanigans. Several criteria can help detect patterns in the typical behavior of fraudsters and expose them. Does a single customer write hundreds of five-star ratings? Suspicious. Does a product have many of these boilerplate reviews? There could be something wrong with that. Do the members of a gang of fraudsters all review the same products? That is a warning sign, too.

If the alarm bells go off for only one of these criteria, you might not necessarily suspect misuse, but two or more increase the likelihood of fraud. Further investigation would then be worthwhile to see whether the intent is to rip off customers.
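The rule of thumb above can be sketched in a few lines of Go. This is only an illustration: the type signals, its field names, and the method names are hypothetical, and how each flag would be computed (thresholds, text matching) is deliberately left open because the text does not specify it.

```go
package main

import "fmt"

// signals holds the three warning signs named in the text.
// The type and field names are hypothetical; how each flag is
// determined is not covered by this sketch.
type signals struct {
	ManyFiveStarReviews bool // one customer with very many five-star ratings
	BoilerplateReviews  bool // many generic, near-identical reviews
	SharedReviewGroup   bool // a group of reviewers rating the same products
}

// count returns how many of the criteria fired.
func (s signals) count() int {
	n := 0
	for _, hit := range []bool{s.ManyFiveStarReviews, s.BoilerplateReviews, s.SharedReviewGroup} {
		if hit {
			n++
		}
	}
	return n
}

// worthInvestigating applies the "two or more criteria" rule.
func (s signals) worthInvestigating() bool {
	return s.count() >= 2
}

func main() {
	one := signals{ManyFiveStarReviews: true}
	two := signals{ManyFiveStarReviews: true, SharedReviewGroup: true}
	fmt.Println(one.worthInvestigating(), two.worthInvestigating())
}
```

A single hit leaves the case unflagged, whereas two hits mark it for a closer look.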

Detection Algorithm

The last of the previously mentioned criteria seems interesting from a programming point of view. How does an algorithm find groups of users who all rate the same products without having any clues as to which users they are?

Listing 1 shows a fictitious YAML list of products with the names of evaluators. A similar list could be obtained with real data from the Amazon website with the official API or a scraper.

Listing 1

reviews.yaml

reviews:
  product1:
    - reviewer1
    - reviewer2
    - reviewer3
    - reviewer7
  product2:
    - reviewer1
    - reviewer2
    - reviewer4
    - reviewer8
  product3:
    - reviewer3
  product4:
    - reviewer4
    - reviewer7
  product5:
    - reviewer5
    - reviewer8
  product6:
    - reviewer6

The human eye immediately recognizes the dubious duo of reviewer1 and reviewer2, who both reviewed product1 and product2. If the data were only available in a relational data model, discovering this connection in a very large database would take an impractically long time.
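To make the pattern concrete, the following Go sketch counts, for the data in Listing 1, how many products each pair of reviewers has in common. It is a brute-force, in-memory illustration and not the graph database approach; the names pair, sharedProducts, and suspiciousPairs, as well as the threshold of two shared products, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// reviews mirrors Listing 1: product name -> reviewers.
var reviews = map[string][]string{
	"product1": {"reviewer1", "reviewer2", "reviewer3", "reviewer7"},
	"product2": {"reviewer1", "reviewer2", "reviewer4", "reviewer8"},
	"product3": {"reviewer3"},
	"product4": {"reviewer4", "reviewer7"},
	"product5": {"reviewer5", "reviewer8"},
	"product6": {"reviewer6"},
}

// pair is an unordered pair of reviewers, stored with a < b.
type pair struct{ a, b string }

// sharedProducts counts, for every pair of reviewers, the number of
// products that both of them reviewed.
func sharedProducts(reviews map[string][]string) map[pair]int {
	counts := make(map[pair]int)
	for _, reviewers := range reviews {
		rs := append([]string(nil), reviewers...) // copy before sorting
		sort.Strings(rs)
		for i := 0; i < len(rs); i++ {
			for j := i + 1; j < len(rs); j++ {
				counts[pair{rs[i], rs[j]}]++
			}
		}
	}
	return counts
}

// suspiciousPairs returns all pairs with at least minShared products
// in common, sorted by name for stable output.
func suspiciousPairs(reviews map[string][]string, minShared int) []pair {
	var out []pair
	for p, n := range sharedProducts(reviews) {
		if n >= minShared {
			out = append(out, p)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].a != out[j].a {
			return out[i].a < out[j].a
		}
		return out[i].b < out[j].b
	})
	return out
}

func main() {
	for _, p := range suspiciousPairs(reviews, 2) {
		fmt.Printf("%s and %s share at least 2 products\n", p.a, p.b)
	}
}
```

For the sample data, the only pair reported is reviewer1 and reviewer2. The nested loops grow quadratically with the number of reviewers per product, which hints at why this approach becomes expensive on large datasets.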

With graph databases that simply traverse along the relations between nodes instead of juggling relational tables and computationally expensive join commands, it is relatively easy to program smart algorithms. I discovered graph databases six years ago and featured them in an article [1]; however, the development of the genre has not stood still, which calls for a new look.

Prettified

The Go program presented in this issue converts the YAML list from Listing 1 into a graph that shows which products were evaluated by which persons.

To do this, it sends commands to a locally installed Neo4j database, which, when the program has run, displays the graph shown in Figure 1 with the relations between products and reviewers. The screenshot is taken from the window of a web browser, which uses http://localhost:7474 to point to a Neo4j installation that conveniently provides not only the server in a container, but also a web interface for graphically enhancing the data.

Figure 1: The Neo4j relation graph is accessible in the browser at http://localhost:7474.
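The article's Go program is not reproduced in this excerpt, so the following is only a sketch of how the review data might be turned into Cypher statements. The node labels Product and Reviewer, the relationship type REVIEWED, the property name, and all Go identifiers are assumptions chosen for illustration. The sketch only builds the statements and their parameters; actually sending them to the server would require a Neo4j driver, which is left out here.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeReview creates both nodes and the relation if they do not exist yet.
// Labels, relationship type, and property name are assumed, not taken
// from the article's program.
const mergeReview = `MERGE (p:Product {name: $product})
MERGE (r:Reviewer {name: $reviewer})
MERGE (r)-[:REVIEWED]->(p)`

// findPairs is one possible query for reviewer pairs that share
// at least two products.
const findPairs = `MATCH (a:Reviewer)-[:REVIEWED]->(p:Product)<-[:REVIEWED]-(b:Reviewer)
WHERE a.name < b.name
WITH a, b, count(p) AS shared
WHERE shared >= 2
RETURN a.name, b.name, shared`

// statement pairs a Cypher query with its parameters.
type statement struct {
	Query  string
	Params map[string]string
}

// buildStatements emits one parameterized MERGE statement per review,
// with products visited in sorted order.
func buildStatements(reviews map[string][]string) []statement {
	products := make([]string, 0, len(reviews))
	for p := range reviews {
		products = append(products, p)
	}
	sort.Strings(products)

	var out []statement
	for _, p := range products {
		for _, r := range reviews[p] {
			out = append(out, statement{
				Query:  mergeReview,
				Params: map[string]string{"product": p, "reviewer": r},
			})
		}
	}
	return out
}

func main() {
	// A subset of Listing 1, to keep the example short.
	reviews := map[string][]string{
		"product1": {"reviewer1", "reviewer2"},
		"product2": {"reviewer1", "reviewer2"},
	}
	for _, s := range buildStatements(reviews) {
		fmt.Println(s.Params["reviewer"], "->", s.Params["product"])
	}
	fmt.Println(findPairs)
}
```

Passing names as parameters instead of pasting them into the query text avoids quoting problems, and MERGE keeps repeated runs from creating duplicate nodes.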
