Jerry Nixon @Work: Triple Edge Constraints in Graph DBs

Jerry Nixon on Windows

Monday, January 31, 2022

Triple Edge Constraints in Graph DBs

What is an Edge? Good question. An Edge defines the relationship between two Nodes. A Node is a thing, noun, or concept – like Store. In my diagram below, the Wants Edge in between the Store and Product Nodes. You would write it as Store–(Wants)->Product and it means to define which Store Wants which Product.

The puzzle here is the Must_Use Edge. In my diagram below, this Edge is between Store and Warehouse and it means to define which Warehouse a Store can use when ordering certain Products. This is special because it references three Nodes: Store, Warehouse and Product. To make this work, the Must_Use Edge must include a field referencing the relevant Product Id.

Adding this value is easy, but it means the Edge rows multiply by the number of Products in the catalog – and with some businesses that can mean hundreds of thousands of products. It also introduces a kind of inferred relationship that doesn’t have a standard representation when drawing a Graph. In the diagram below, I indicate the special relationship in red.

image

How to maintain high performance with this pattern?

This is a worthwhile question because with so many more rows there is a potential impact. Here are the three things you can do to help:

  1. Add a Unique Clustered Index on Edge($from_id, $to_id)
  2. Add an Unclustered Index on Edge(ProductId)
  3. Add an Unclustered Index on Node(ProductId)

I want to say one quick thing about #1 in the list above, that is to add a Unique Clustered Index on the To/From columns of the Edge. Why do that? Developers might intuit that SQL Graph automatically adds this index on every Edge but it does not. It is an important Index to add to every single Edge, not just Edges in this unique scenario. Good tip!

---

CODE SAMPLE. Here’s the Gist.

---

Scenario

  1. Every Store (wants) every Product
  2. Every Warehouse (Has) every Product
  3. Stores (Must_Use) certain Warehouses for certain Products.

Problem

  1. A graph Edge only references two Nodes: from & to
  2. How can we constrain an Edge to a third Node (Product)

Solution

  1. Add ProductId to Product Node
  2. Add ProductId to (Must_Use) Edge

Considerations

What if there are thousands of Stores, Hundreds of Warehouses, and Hundreds of Thousands of Products? The number of (Must_Use) Edge rows would be like # of Stores x # of Warehouses x # of Products. This could make the edge have millions of rows. With performance as a consideration, is this the only way? Answer: Yes, I think it is.

image