It is frequently desirable to represent bidirectional relationships in web applications, and simple databases are not immediately suitable for this information.

Let’s start with a simple unidirectional example that a database like MySQL handles well.

User 1 wants to be friends with User 2.

A row is inserted into the befriendings table with User 1 as the initiator and User 2 as the recipient.

Now, if your friendships are unidirectional the job is pretty much done. If User 2 wants to be friends with User 1, they can initiate a befriending of their own. On Kongregate, users you have befriended are called “Friends”, and users who have befriended you are called “Fans”

 

However, sometimes you want to follow a friendship model more like the one used by Facebook or MySpace, where all friendships are bidirectional.

Both of these sites implement a relationship-confirmation step before any relationship is finalized. For simplicity’s sake, I’ll skip over modeling that step, but suffice it to say, pending relationships are best stored separately from confirmed relationships. Attempting to mix the two needlessly mingles data and complicates application logic.

Let’s assume any user has the ability to initiate the bidirectional relationship. Now, in the befriendings table, we technically have all the information necessary to list all friends of User 1, and all friends of User 2.

 

 

In the first case, only one index can be used (on either initiatior_id, or on recipient_id), and in both cases, the ORDER clause necessitates a filesort on the results.

So the simplest approach here would be to simply duplicate the relationship data in another row, swapping the initiator and recipient ids. This is okay, but starts to fall apart as store more information about the relationship itself. For example, imagine we store a compatibility score between the two users, as last.fm does.

This information would need to be duplicated, as well as keeping the the various timestamps in-sync. Indices against these fields would bloat unnecessarily.

Instead, we can move the duplicated information to a dedicated table, eliminating indices on the relationships table itself. The column by which results are ordered is moved from the relationships table to the dedicated table.

 

Users then access their list of friends through a :friends association and can access the rich information about the relationship itself through the :befriendings association. The befriending_edges table’s indices are simple and very efficient, and looking up befriendings uses a single primary key, and looking up friends uses two primary keys.

 

5 Comments

  1. Cameron

    So you can have a belongs_to model that goes through another belongs_to model?
    I am kinda confused as a new rails guy. Could you explain a bit more about these edges please?

  2. duncanbeevers

    I think it’s simplest to think of this solution in terms of the associations on the User model, and the kinds of questions we want to ask about Users. For example, one question is “Who are my friends?” Another question is “When did we become friends?”

    In a system with strictly bi-directional befriending (if User A is friends with User B then User B is necessarily friends with User A) asking “Who are my friends?” can be an expensive operation for the reasons outlined above.

    The BefriendingEdges are never addressed directly. That is, we never (or rarely) ask the question “What befriendings have I initiated?” or “What befriendings am I a participant in but was not the initiator?” However, despite the fact that we never ask these questions directly, modeling the graph edges separately allows for faster answers to the other more general questions.

    All queries from the User through to either another User or to a Befriending use the BefriendingEdge as a through model. It’s not a belongs_to going through another belongs_to. Rather, it’s two has_manys throughs that utilize a common has_many.

  3. Ben

    Your modeling is really interesting, but I’m not sure to understand correctly. Could you indicate the different tables involved and the fields in those tables, I’m especially confused about the befriending_edges table. Thanks a lot!

  4. duncanbeevers

    The proposed table structure looks like this.

    Users
    id

    Befriendings
    initiator_id
    recipient_id
    Unique index on initiator_id, recipient_id

    BefriendingEdges
    user_id
    befriending_id
    Unique index on user_id, befriending_id

    All graph information is accessed through associations on User. For example, accessing information about the relationship between User 1 and User 2 might look like this.

     

  5. Maarten

    Hi

    Are you sure your friends association works?

     

    The :through just follows the user association on the befriending_edge, pointing back to the user itself, no?

Leave a Comment

Enclose code in <code lang="ruby"></code> if you care.
Preview your comment using the button below.