This page introduces the cluster queries.

Currently, Graql supports two algorithms for identifying clusters in the graph.

Connected Component

The connected components algorithm finds clusters of instances that are connected to each other. It identifies all instances (entities, relationships and resources) that are linked via relationships in the knowledge graph and gives each cluster a unique label. The knowledge graph below contains three connected components, each corresponding to a group of people related through marriage, so three unique labels will be created, one for each set of connected instances.

Three connected components representing groups of people related through marriage.

You can call the connected component algorithm to find the clusters above using:

compute cluster in [person, marriage], using connected-component;
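To make the labelling step concrete, here is a minimal Python sketch of connected-component labelling with breadth-first search. It is not Graql's implementation. It assumes the graph is given as a hypothetical list of undirected links in which each marriage instance is linked to the two people it relates, so relationship instances count as cluster members alongside entities. The function names and data are illustrative only.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Give every node the integer label of its connected component (BFS)."""
    adjacency = defaultdict(set)
    for a, b in edges:  # treat every link as undirected
        adjacency[a].add(b)
        adjacency[b].add(a)

    labels = {}
    next_label = 0
    for start in adjacency:
        if start in labels:
            continue  # already reached from an earlier start node
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in labels:
                    labels[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1
    return labels


def group_by_label(labels):
    """Turn {node: label} into a list of clusters (sets of nodes)."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())


# Hypothetical data: each marriage instance is linked to the two people it relates.
edges = [
    ("marriage-1", "alice"), ("marriage-1", "bob"),
    ("marriage-2", "carol"), ("marriage-2", "dave"),
    ("marriage-3", "dave"), ("marriage-3", "erin"),
]
clusters = group_by_label(connected_components(edges))
print(sorted(len(c) for c in clusters))  # [3, 5]
```

Every instance reached from a start instance receives that start's label, so each cluster ends up with exactly one unique label. Here dave links the two later marriages, which is why carol, dave, erin and both marriage instances share a cluster of size 5.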

On the example knowledge graph, this query returns three clusters, with sizes 3, 5 and 3. If you only want to find the cluster that contains a given instance, use the contains modifier:

compute cluster in [person, marriage], using connected-component, where [contains="V123"];

Here, assuming V123 is the id of John Newman in the example above, the cluster query will return the cluster in the middle, which contains John Newman.
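The idea behind the contains modifier can be sketched in Python: to find a single cluster you only need to search outward from the given instance, without labelling the rest of the graph. This is a minimal sketch under that assumption, not a description of how Graql evaluates the query. The ids other than V123 are hypothetical, and returning an empty set for an unknown id is a choice made here, not documented Graql behaviour.

```python
from collections import deque


def cluster_containing(edges, target):
    """Return the connected cluster that contains target, or an empty set if target is absent."""
    adjacency = {}
    for a, b in edges:  # treat every link as undirected
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if target not in adjacency:
        return set()

    # Breadth-first search outward from target only; other clusters are never visited.
    cluster = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in cluster:
                cluster.add(neighbour)
                queue.append(neighbour)
    return cluster


# Hypothetical ids; "V123" plays the role of John Newman from the example above.
edges = [
    ("m1", "alice"), ("m1", "bob"),
    ("m2", "V123"), ("m2", "mary"),
    ("m3", "mary"), ("m3", "sam"),
]
print(sorted(cluster_containing(edges, "V123")))  # ['V123', 'm2', 'm3', 'mary', 'sam']
```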

K-Core

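K-Core can also be used to find clusters of instances that are tightly interlinked within a network. A k-core is a maximal subgraph in which every instance is connected to at least k other instances of that same subgraph, so larger values of k keep only more densely connected instances.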
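A k-core can be found by repeatedly deleting every instance that has fewer than k remaining neighbours until none are left to delete. The Python sketch below does this on a plain undirected graph of hypothetical nodes and then groups the survivors into connected clusters. Treating each connected piece of the k-core as one cluster is an assumption about how the results are grouped, and the sketch does not model how Graql maps relationships onto graph edges.

```python
def k_core_clusters(edges, k):
    """Peel the graph down to its k-core, then return the connected clusters of what remains."""
    adjacency = {}
    for a, b in edges:
        if a != b:  # a self-loop does not count as a neighbour
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    # Repeatedly delete every node with fewer than k remaining neighbours.
    low = [n for n, nbrs in adjacency.items() if len(nbrs) < k]
    while low:
        for node in low:
            for neighbour in adjacency.pop(node):
                adjacency[neighbour].discard(node)
        low = [n for n, nbrs in adjacency.items() if len(nbrs) < k]

    # Group the surviving nodes into connected clusters (iterative DFS).
    clusters, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            cluster.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(cluster)
    return clusters


# Hypothetical graph: triangle a-b-c, d hanging off a, and a separate pair e-f.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("e", "f")]
print([sorted(c) for c in k_core_clusters(edges, 2)])  # [['a', 'b', 'c']]
```

With k = 2, d, e and f each have only one neighbour and are peeled away, leaving the triangle as the only cluster.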

As with connected components, you can compute these clusters using:

compute cluster in [person, marriage], using k-core;

By default, k = 2. You can set a different value of k (k > 2) with a where clause:

compute cluster in [person, marriage], using k-core, where k=10;
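Because the k-cores for increasing k are nested, raising k can only shrink the result. One way to see this is through each instance's core number, the largest k for which it still belongs to the k-core. The Python sketch below computes core numbers by peeling, on a hypothetical graph; it illustrates the concept and is not Graql's implementation.

```python
def core_numbers(edges):
    """Return {node: core number}, the largest k such that the node lies in the k-core."""
    adjacency = {}
    for a, b in edges:
        if a != b:  # ignore self-loops
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    degree = {n: len(nbrs) for n, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Move k up to the smallest degree left, then peel every node that cannot reach k + 1.
        k = max(k, min(degree[n] for n in remaining))
        stack = [n for n in remaining if degree[n] <= k]
        while stack:
            node = stack.pop()
            if node not in remaining:
                continue  # already peeled via another neighbour
            remaining.discard(node)
            core[node] = k
            for neighbour in adjacency[node]:
                if neighbour in remaining:
                    degree[neighbour] -= 1
                    if degree[neighbour] <= k:
                        stack.append(neighbour)
    return core


def nodes_in_k_core(core, k):
    """Members of the k-core: nodes whose core number is at least k."""
    return sorted(n for n, c in core.items() if c >= k)


# Hypothetical graph: triangle a-b-c with d hanging off a, plus a fully connected group e-f-g-h.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"),
         ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h")]
core = core_numbers(edges)
print(nodes_in_k_core(core, 2))  # ['a', 'b', 'c', 'e', 'f', 'g', 'h']
print(nodes_in_k_core(core, 3))  # ['e', 'f', 'g', 'h']
print(nodes_in_k_core(core, 4))  # []
```

In this small graph no instance has more than three neighbours, so any k above 3 (such as the k=10 in the query) leaves nothing.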