Blog Archives

Using CanCan to implement an “Agree to Terms” workflow

Posted in Code, Inside TFG, Ruby on Rails

Terms & Conditions
Recently in a Rails application I was tasked with adding in a basic “terms and conditions” page.

There was nothing special about the feature, but I was really happy with my solution so I decided to write about it briefly.

Thinking about a solution

So, my initial plan was:

  1. Add an agreed_to_terms_and_conditions_at datetime field to the User model. Use a datetime here so that if we change the conditions later we can check against the time.
  2. Perform checks in the app so that users who haven’t agreed to the terms and conditions are redirected to the T&C workflow.
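For step 1, the predicate used later in this post could be sketched roughly as below. This is a dependency-free illustration, not the real model: the Struct stands in for the ActiveRecord User, and TERMS_LAST_UPDATED_AT is a hypothetical constant recording when the terms last changed.

```ruby
# Hypothetical: when the current version of the terms was published.
TERMS_LAST_UPDATED_AT = Time.utc(2015, 1, 1)

# Stand-in for the ActiveRecord User model; only the relevant column is modelled.
User = Struct.new(:agreed_to_terms_and_conditions_at) do
  # True only if the user agreed at or after the time the terms last changed.
  def has_agreed_to_terms_and_conditions?
    !agreed_to_terms_and_conditions_at.nil? &&
      agreed_to_terms_and_conditions_at >= TERMS_LAST_UPDATED_AT
  end
end
```

Storing a timestamp rather than a boolean is what makes the second comparison possible: bumping the constant sends everyone back through the T&C workflow.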

Step 1 was straightforward, but step 2 got me thinking.

Initially I had considered implementing a before_filter in ApplicationController that would check whether the User had agreed to the T&Cs and redirect them to the T&Cs page if they hadn’t.

After thinking for a moment I decided that it was really a question of authorization, and as a result should be managed by an Ability file.

My reasoning was that a user should not be able to access the site until they have agreed to the terms. That sounds suspiciously like a cannot statement in CanCanCan.

Implementing a solution with CanCanCan

Once I had decided to use CanCanCan to implement the solution, it was just a matter of getting all the parts together.

Firstly, I had my abilities split into separate files in the way I have suggested in this post. I had an Ability::Factory that would take a User (or nil) and return the appropriate ability file. It looks something like:

class Ability::Factory

  def self.build_ability_for(user)
    return Ability::Anonymous.new if user.nil?

    case user.role
    when :admin
      Ability::Admin.new(user)
    when :supervisor
      Ability::Supervisor.new(user)
    when :doctor
      Ability::Doctor.new(user)
    when :patient
      Ability::Patient.new(user)
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{user.role}")
    end
  end

end

My initial idea was to do some checks for each role and basically say something like:

if user.has_agreed_to_terms_and_conditions?
  # Implement abilities as per usual
else
  cannot :manage, :all
end

But that would lead to a lot of duplication in both implementation and tests. Plus, just a lot of code in general, which I despise.

Thinking further, I decided that a User who hadn’t agreed to the terms and conditions had a set of abilities of their own, independent of their role. I created a new Ability for such a condition: Ability::PendingAgreementToTermsAndConditions. The class was implemented like:

class Ability::PendingAgreementToTermsAndConditions < Ability

  def initialize(user)
    cannot :manage, :all
    can :agree_to_terms_and_conditions, User, id: user.id
  end

end

I amended my Ability::Factory so that it would return the pending ability in the right conditions:

class Ability::Factory

  def self.build_ability_for(user)
    return Ability::Anonymous.new if user.nil?

    if user.has_agreed_to_terms_and_conditions?
      ability_class_for(user.role).new(user)
    else
      Ability::PendingAgreementToTermsAndConditions.new(user)
    end
  end

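  # `private` has no effect on methods defined with `def self.`, so the
  # helper is a class method hidden with private_class_method.
  def self.ability_class_for(role)
    case role
    when :admin
      Ability::Admin
    when :supervisor
      Ability::Supervisor
    when :doctor
      Ability::Doctor
    when :patient
      Ability::Patient
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{role}")
    end
  end
  private_class_method :ability_class_for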

end
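As a variation on the case statement, the role lookup could also be a frozen hash. The sketch below is not the code from the app: it runs without CanCanCan, so Ability is a plain module with empty stand-in classes (the real ones would include CanCan::Ability), only two roles are listed, and StubUser is a hypothetical substitute for the User model.

```ruby
# Stand-ins for the real ability classes; they carry no rules here.
module Ability
  class UnknownRoleError < StandardError; end

  class Base
    attr_reader :user

    def initialize(user = nil)
      @user = user
    end
  end

  class Anonymous < Base; end
  class Admin < Base; end
  class Patient < Base; end
  class PendingAgreementToTermsAndConditions < Base; end

  class Factory
    # Role => ability class; adding a role means adding one entry.
    ROLE_ABILITIES = { admin: Admin, patient: Patient }.freeze

    def self.build_ability_for(user)
      return Anonymous.new if user.nil?
      return PendingAgreementToTermsAndConditions.new(user) unless user.has_agreed_to_terms_and_conditions?

      ability_class_for(user.role).new(user)
    end

    def self.ability_class_for(role)
      ROLE_ABILITIES.fetch(role) do
        raise UnknownRoleError, "Unknown role passed through: #{role}"
      end
    end
    private_class_method :ability_class_for
  end
end

# Hypothetical user double exposing only what the factory needs.
StubUser = Struct.new(:role, :agreed) do
  def has_agreed_to_terms_and_conditions?
    agreed
  end
end
```

With this in place, Ability::Factory.build_ability_for(StubUser.new(:admin, false)) returns the pending ability regardless of role, which is the whole point of the T&C check living in the factory.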

Great, so now I had all the abilities I needed. It was time to incorporate the logic into my controller so the application would handle users who hadn’t agreed to the T&Cs.

I had to ensure two things:

  1. Users can’t access other pages in the app that aren’t the T&Cs. If they try, they are redirected to the T&Cs page.
  2. When a user who hasn’t agreed to the T&Cs signs in, they are redirected to the T&Cs page.

Handling access to pages when terms and conditions aren’t agreed to

Regarding the first objective: Anyone who has used CanCan or CanCanCan will know that since the ability file prohibits users from accessing other pages (cannot :manage, :all), a CanCan::AccessDenied exception will be raised if those pages are hit.

That means that I just had to handle that exception, and redirect the user to the T&Cs page. The CanCanCan README explains how to catch this exception in detail, but I’ll post the code I used anyway:

class ApplicationController < ActionController::Base

  rescue_from CanCan::AccessDenied do |exception|
    if current_user.present?
      # You could also do: current_ability.can?(:agree_to_terms_and_conditions, current_user)
      # but I think the following reads better
      if current_user.has_agreed_to_terms_and_conditions?
        # Redirect as usual
      else
        # Redirect to the terms page
      end
    else
      # Do whatever for unauthed users
    end
  end

end
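The branching inside that handler can be pulled out into a plain method, which makes it easy to test without a controller. This is only a sketch: the three path constants are hypothetical placeholders for whatever route helpers your app uses, and the method name is made up for illustration.

```ruby
# Hypothetical paths; a real app would call its own route helpers instead.
TERMS_PATH   = "/terms_and_conditions"
ROOT_PATH    = "/"
SIGN_IN_PATH = "/users/sign_in"

# Decides where an AccessDenied error should send the given user.
def access_denied_redirect_for(user)
  return SIGN_IN_PATH if user.nil?

  user.has_agreed_to_terms_and_conditions? ? ROOT_PATH : TERMS_PATH
end
```

The rescue_from block would then only need to call redirect_to with the result.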

Moving on, let’s ensure the user isn’t sent straight to another redirect when they sign in.

Redirecting users to the terms and conditions page when they sign in

This is a problem for your authentication system. I use Devise, so I was able to override the after_sign_in_path_for method in my ApplicationController as outlined in the documentation. The code looks like:

class ApplicationController < ActionController::Base

  # Override: Devise method
  def after_sign_in_path_for(user)
    # You could also do: current_ability.can?(:agree_to_terms_and_conditions, current_user)
    # but I think the following reads better
    if user.agreed_to_terms_and_conditions_at.present?
      # Redirect as usual
    else
      # Redirect to the terms page
    end
  end

end

Now the user will get one redirect, instead of being redirected to a page they can’t access.

Conclusion

So that’s my solution. About 20 extra lines of code (plus tests) and now you’ve got all the logic for implementing a terms and conditions workflow.

I really enjoyed implementing that solution. It was easy to write and has had no maintenance cost.

Separating abilities in CanCan

Posted in Code, Inside TFG, Ruby on Rails

Users and Authorization
Authorization is simple to implement in Rails thanks to gems like CanCan and its chubby cheeked offspring CanCanCan. When getting started with CanCanCan, the documentation suggests that you use the generator to create a single Ability file to store your abilities in. This is a great starting point, but in my experience few projects using CanCanCan ever evolve past the use of a single Ability file – much to their detriment.

In this post I’ll have a look at an example Ability file and enumerate some flaws in such a system. Following that I’ll discuss a way to improve it by breaking the Ability file out into multiple Ability files.

Defining a real-world example

Let’s imagine a somewhat complicated application that has:

4 different roles, that may have the ability to perform up to 50 different actions.

For some context, let’s say the application stores medical imaging scans (ultrasounds and so forth) and has the following roles:

  1. Patients that sign in and view information about their scans, and can grant access to their scans to doctors.
  2. Doctors that sign in and add notes and attach dictations to these scans, and can look over all their patients’ scans.
  3. Supervisors that sign in and manage the doctors, assigning them to hospitals and medical practices, and assigning patients to doctors.
  4. Admins that sign in and manage all the above user accounts, and perform basic CRUD for hospitals and medical practices.

An Ability file for such an application might look like:

class Ability
  include CanCan::Ability

  def initialize(user)
    # Anonymous users don't have access to anything
    return if user.nil?

    case user.role
    when :admin
      can :manage, :all
    when :supervisor
      # Between 1-50 can/cannot statements
    when :doctor
      # Between 1-50 can/cannot statements
    when :patient
      # Between 1-50 can/cannot statements
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{user.role}")
    end
  end

end

 

The problems with a single Ability file

As you can see, if we have many different abilities per user this file will get quite large.

Let’s say that there are very few shared abilities and that for each of the supervisor, doctor, and patient roles we have 50 lines of ability declarations. That equates to roughly 170 lines of code in the file. Historically, I’ve found that spec to implementation ratio is about 2:1, so let’s imagine there’s a 340 line spec file that corresponds to this implementation.
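The arithmetic behind those figures, as a quick sketch. The 20 lines of scaffolding (class definition, case statement, admin rule, error handling) is an assumption based on the example file above, not a measured number.

```ruby
ROLES_WITH_RULES     = 3   # supervisor, doctor, patient
LINES_PER_ROLE       = 50
SCAFFOLDING_LINES    = 20  # assumption: everything that is not a role's rules
SPEC_TO_CODE_RATIO   = 2   # the 2:1 ratio mentioned above

IMPLEMENTATION_LINES = ROLES_WITH_RULES * LINES_PER_ROLE + SCAFFOLDING_LINES
SPEC_LINES           = IMPLEMENTATION_LINES * SPEC_TO_CODE_RATIO

puts "implementation: #{IMPLEMENTATION_LINES}, spec: #{SPEC_LINES}"
```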

There are many problems with an Ability file of this size.

  1. Larger files are harder to understand, maintain, and debug. There are just too many different concepts for a developer to deal with in one location, many of which will be irrelevant for whatever their task at hand is.
  2. Spec files become even harder to maintain for larger implementations. Since the spec:code ratio can bloat in excess of 2:1, it will be even harder to maintain the specs.
  3. The Ability file will have far too many responsibilities, violating the Single Responsibility Principle and suffering from the drawbacks of such behaviour. This boils down to a higher maintenance cost and defect rate. If you don’t like OO sophistry, let me put it another way: The file tries to do too much. The class answers the question “what can every user in the system conceivably do?”. Whereas, we are far more likely to be interested in what just one of those users can do at any given time.
  4. In a similar vein, the Ability class becomes a god class for authorization. I feel like CanCan and CanCanCan only encourage this behaviour by having an opaque default for determining which Ability to use. By default, CanCanCan will assume that you have an Ability class in its current_ability function. There is a section in the GitHub wiki on changing this, though.
  5. Large classes and files suffer from the broken windows theory – that is: since the class is already so massive and bloated, new developers on the project will just pile more functionality on top of the existing mess – despite the cost in readability and maintainability. Further, the scope of the class starts to spread as more and more code is tacked on. You might hear such excuses from developers as “I’m just following the convention of the existing app” or “look, it’s already screwed so it’s not like I can make it any worse”. You may also hear my favourite “we don’t have budget to refactor”. Yeah, if you don’t have budget to refactor imagine how much budget you DON’T have for fixing defects all the time because you keep piling shit on more shit.

Deconstructing the monolithic Ability file

In order to resolve the issues above, we must break the Ability file apart. Usually, my first tactic is to identify the responsibilities of the class and break the code out into classes that represent each of these responsibilities.

Let’s review the responsibilities of the Ability class above:

  1. It defines what an anonymous user can do (EG: handling user.nil?)
  2. It defines what an admin can do
  3. It defines what a supervisor can do
  4. It defines what a doctor can do
  5. It defines what a patient can do
  6. It handles unknown roles

Now we create a class for each of those responsibilities. Which would leave us with classes like:

  1. AnonymousUserAbility – basically a null object for abilities.
  2. AdminAbility
  3. SupervisorAbility
  4. DoctorAbility
  5. PatientAbility, and
  6. UserAbilityFactory (or Ability::Factory if using namespaces), which takes a User (or nil) and returns the corresponding Ability class above. This class also handles roles without defined abilities by raising an exception.

You may also like to keep an Ability file that includes the CanCan::Ability module and contains some shared functions that will be used in the other ability files.

You should store these files in app/abilities. They are not models as defined by MVC, so they don’t belong in app/models, which is where CanCan and CanCanCan stash the Ability file by default.

You may also like to namespace these classes (EG: Ability::AnonymousUser), since namespaces can also improve the organisation of an application.

An example of one of the Ability files is:

class Ability::Patient < Ability

  def initialize(user)
    [:show, :invite_doctor].each do |ability|
      can ability, Result, patient_id: user.id
    end
    # etc
  end

end

Now I can have private methods that are specific to just the abilities of the Patient, rather than private methods being for all the different roles. I have a single function that can tell me the sum total of a Patient’s abilities. I can include some additional documentation in the class explaining the role of the Patient in our application for future developers.

Let’s have a look at the Ability::Factory now. I named this class after the factory pattern since its job is to take a User (or nil) and build us a corresponding Ability file. If you wanted, you could just put the function in Ability. I prefer the new class implementation, which would look like:

class Ability::Factory

  def self.build_ability_for(user)
    return Ability::Anonymous.new if user.nil?

    case user.role
    when :admin
      Ability::Admin.new(user)
    when :supervisor
      Ability::Supervisor.new(user)
    when :doctor
      Ability::Doctor.new(user)
    when :patient
      Ability::Patient.new(user)
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{user.role}")
    end
  end

end

The corresponding controller change to get CanCan or CanCanCan to play nice with your new Abilities would be:

class ApplicationController < ActionController::Base

  # Override CanCan method to provide custom Ability files
  def current_ability
    @current_ability ||= Ability::Factory.build_ability_for(current_user)
  end

end

Please note: If you are using an additional engine like RailsAdmin or ActiveAdmin, some more work might need to be done in order to get the engine to play nice. You will have to do some spelunking in the engine’s codebase to determine how CanCan or CanCanCan is integrated.

Conclusion

Now our large Ability file is broken into smaller, more manageable files. Each file now has a single responsibility and is easier to test. If we need to add a new role it won’t be a nightmare to patch the Ability file. We just build a new file and ensure it is in the Ability::Factory. Luckily, since our factory handles unknown roles by raising an exception, we’ll find out pretty quickly if there’s no corresponding Ability file.

Having a single file per role increases the ease with which we can verify the responsibilities of that role. We can read a single file and determine exactly what the Patient does, for example. Before, it was hidden in the guts of the Ability file.

When it comes to authorization, you want as high a level of visibility as you can on roles, so you don’t have anyone doing things they shouldn’t be able to.

Using FactoryGirl to easily create complex data sets in Rails

Posted in Code, Inside TFG, Ruby on Rails

I use FactoryGirl for setting up data in my application. FactoryGirl gives you all the tools you need to quickly and easily create data for models in your application. Leveraging ffaker you can make realistic looking, randomized data.

Often, you will have complex associations between objects in your system that can be a pain to factory up. I’ve frequently seen people use individual factories to build up these relationships. The amount of work required to set these associations up quickly gets tedious and turns your code into an unreadable mess.

In this article, I will run through some features of FactoryGirl that you can leverage to easily create complex associations.

Transient Attributes

One of the features I use most in FactoryGirl is the transient attributes. Transient attributes allow you to pass in data that isn’t an attribute on the model. I frequently use transient attributes to allow me to use a single FactoryGirl call to create multiple objects.

For example, say you have two models, User and Role. A User has one Role. You might do something like:

role = FactoryGirl.create(:role, name: "Head Buster")
user = FactoryGirl.create(:user, role: role)

Using transient attributes you could define the following factory:

factory :user do
  transient do
    role_name "admin"
  end

  role do
    Role.find_by(name: role_name) || FactoryGirl.create(:role, name: role_name)
  end
end

which would then allow you to do:

user = FactoryGirl.create(:user, role_name: "Head Buster")

Traits

Another of my favourite features is traits. You could solve the scenario above using traits by doing something like:

factory :user do
  trait :head_buster do
    role do
      Role.find_by(name: "Head Buster") || FactoryGirl.create(:role, name: "Head Buster")
    end
  end
end

which would then allow you to do:

user = FactoryGirl.create(:user, :head_buster)

I’ve found that the power of traits expands exponentially with the complexity of the model they are trying to map. The more states your model can be in, and the more data it has attached to it, the more you’ll be able to use traits to simplify data creation. Try to abstract any state that an object can be in into a trait to simplify usage.

Callbacks

Callbacks in FactoryGirl are also extremely useful. They work hand in hand with transient attributes and traits to allow you to perform any non-obvious setup in your factories.

Let’s imagine an app which has the following models:

  • User
  • Book
  • UserReadBook (Join between User and Book, indicating the user has read this book)
  • WishlistBook (Join between User and Book, indicating the user added this book to their Wishlist)

Out of the box, if you wanted to create one of each type of object, you might have some FactoryGirl calls like:

user = FactoryGirl.create(:user)
book = FactoryGirl.create(:book)
FactoryGirl.create(:user_read_book, user: user, book: book)
FactoryGirl.create(:wishlist_book, user: user, book: book)

Let’s say we have a function on User: #related_books, which returns all Books that the User has read or added to their wishlist. Our RSpec tests for such a function might look like:

describe '#related_books' do
  subject(:related_books) { user.related_books }
  let(:user) { FactoryGirl.create(:user) }

  it "includes books this user has read" do
    expect(related_books).to include(FactoryGirl.create(:user_read_book, user: user).book)
  end
  it "includes books this user has added to their wishlist" do
    expect(related_books).to include(FactoryGirl.create(:wishlist_book, user: user).book)
  end
  it "doesn't include books read by other users" do
    expect(related_books).not_to include(FactoryGirl.create(:user_read_book).book)
  end
  it "doesn't include books other users have added to their wishlist" do
    expect(related_books).not_to include(FactoryGirl.create(:wishlist_book).book)
  end
end

Doesn’t look TOO bad. I REALLY don’t like having to tack the .book on the end there. I also don’t like that I’m not directly creating the type of object I want returned in my test. Personally, I think it makes the tests harder to understand. The bigger problem is when we need to refactor.

What happens when requirements change and we have to add in a VideoGame model? Now we change the UserReadBook and WishlistBook models to be polymorphic so they can also hold VideoGames. As a result, we rename the models to UserCompletedItem and WishlistItem.

It’s extremely likely we’ll have used the original join table factories in multiple places to test other scopes, searching functions, and more. As a consequence, we have to update all our specs to use the updated join table name. Doesn’t this last step seem like an unnecessary pain in the ass?

What we should have done is used our factories to abstract the concept of wishlisting or reading a Book. Our tests generally want to ensure that there is a specific type of relationship between a Book and a User, but they shouldn’t really need to care about the specifics of it. Let’s look at how factories can help us.

The first thing I do when trying to abstract these concepts is work out the interface I want in my factories. In the case above, I’d like to be able to write:

FactoryGirl.create(:book, read_by: user) # and
FactoryGirl.create(:book, wishlisted_by: user)

I can support this interface using transient attributes and factory callbacks. I can update my Book factory to look like:

FactoryGirl.define do
  factory :book do
    transient do
      read_by nil
      wishlisted_by nil
      # nil is a sensible default, we don't want our factories creating
      # extra data unnecessarily. It slows your test suite down
    end

    after(:create) do |book, factory|
      if factory.read_by
        FactoryGirl.create(:user_read_book, book: book, user: factory.read_by)
      end
      if factory.wishlisted_by
        FactoryGirl.create(:wishlist_book, book: book, user: factory.wishlisted_by)
      end
    end
  end
end
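Stripped of FactoryGirl and ActiveRecord, the same idea can be sketched in plain Ruby. Everything here is a stand-in: Structs play the models, two arrays play the join tables, and create_book is a hypothetical helper (not a FactoryGirl API) that mirrors the read_by / wishlisted_by interface.

```ruby
# In-memory stand-ins for the models; arrays play the role of tables.
User         = Struct.new(:name)
Book         = Struct.new(:title)
UserReadBook = Struct.new(:user, :book)
WishlistBook = Struct.new(:user, :book)

READ_JOINS     = []
WISHLIST_JOINS = []

# Callers ask for a book in a given state and never touch the join
# records directly, just like the factory with transient attributes.
def create_book(title: "Untitled", read_by: nil, wishlisted_by: nil)
  book = Book.new(title)
  READ_JOINS << UserReadBook.new(read_by, book) if read_by
  WISHLIST_JOINS << WishlistBook.new(wishlisted_by, book) if wishlisted_by
  book
end

# Plain-Ruby version of User#related_books: books the user has read or
# wishlisted, each listed once (compared by object identity).
def related_books(user)
  (READ_JOINS + WISHLIST_JOINS)
    .select { |join| join.user.equal?(user) }
    .map(&:book)
    .uniq(&:object_id)
end
```

If the join models are later renamed, only create_book changes; callers that wrote create_book(read_by: user) stay as they are.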

Here’s what I like about abstracting the concept of reading or wishlisting a Book using factories:

Simpler Tests

Our tests are no longer loaded with implementation details of joining the Book and User. This is especially useful in even more complex relationships. Basically, if my test is checking that a book is returned, I only ever want to create a Book. I don’t want to have to create multiple other models.

Reduced cost of refactoring

When we have to update the join between Book and User, we only need to update one factory instead of every test that had instantiated one of the renamed join tables.

More concise tests

Although in my example I used a one liner for getting a read or wishlisted Book, in reality the syntax you’d probably see is:

user = FactoryGirl.create(:user)
book = FactoryGirl.create(:book)
FactoryGirl.create(:user_read_book, user: user, book: book)
FactoryGirl.create(:wishlist_book, user: user, book: book)

Which with the factory above could be reduced to:

user = FactoryGirl.create(:user)
book = FactoryGirl.create(:book, read_by: user, wishlisted_by: user)

This may not seem like much, but it can build up. Recently I made a similar refactoring in a spec file that contained 10 such blocks. That amounted to 20 fewer LoC or around 5% fewer LoC in the spec file. Also, had I written my factory that way originally I would have had to type a hell of a lot less, too.

Here’s what my specs would look like with my updated factories:

describe '#related_books' do
  subject(:related_books) { user.related_books }

  let(:user)       { FactoryGirl.create(:user) }
  let(:other_user) { FactoryGirl.create(:user) }
  it "includes books this user has read" do
    expect(related_books).to include(FactoryGirl.create(:book, read_by: user))
  end
  it "includes books this user has added to their wishlist" do
    expect(related_books).to include(FactoryGirl.create(:book, wishlisted_by: user))
  end
  it "doesn't include books read by other users" do
    expect(related_books).not_to include(FactoryGirl.create(:book, read_by: other_user))
  end
  it "doesn't include books other users have added to their wishlist" do
    expect(related_books).not_to include(FactoryGirl.create(:book, wishlisted_by: other_user))
  end
end

Wrapping up

So using traits, transient attributes, and callbacks we can make our FactoryGirl factories do a lot more of the heavy lifting for us.

We can also abstract complex associations to reduce the cost of refactoring and increase the readability of our tests.

Although those are my favourite features, they don’t cover everything FactoryGirl offers. I’d recommend going through the FactoryGirl documentation and thinking about what you can do to get more out of factories in your code.

 

A quick tip with .Net generics

Posted in Code, Inside TFG

Generic constraints in .Net can be recursive. This means that you can use a type in its own generic constraints. Let’s look at an example of where this can be useful.

Let’s say you have some kind of persistent object IEntity. To avoid primitive obsession we are going to create a type safe Reference<T> object to act as a pointer to our entities, rather than just having a property of int called Id.

public interface IReference<TEntity>
    where TEntity : IEntity
{
    // Actual interface doesn't matter
}

We want a base entity to inherit from, which among other things exposes an IReference<T> to itself.  We can’t be much more specific than returning an IReference<EntityBase>, since we can’t know the subclass type at compile time. Unless we hail to the generic recursion gods.

public abstract class EntityBase<TSelf> : IEntity
    where TSelf : EntityBase<TSelf>
{
    public IReference<TSelf> Reference { get { ... } }
}

Now we just supply the type when we declare our subclass:

public class MyEntity : EntityBase<MyEntity>
{
}

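You can do much the same thing in Java. Be aware that in both languages the constraint only requires the type argument to be some EntityBase subclass that satisfies it, not the declaring class itself, so MyEntity extends EntityBase<OtherEntity> (or MyEntity : EntityBase<OtherEntity> in C#) will compile just fine.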

As an exercise for the reader, consider the visitor pattern, where we implement a virtual Accept method in order to have compile time type knowledge of this. Can you now write a non virtual Accept method?

A look at Cayley

Posted in Code, Inside TFG

Recently I took the time to check out Cayley, a graph database written in Go that’s been getting some good attention.


https://github.com/google/cayley

From the GitHub README:

Cayley is an open-source graph inspired by the graph database behind Freebase and Google’s Knowledge Graph.

Also, to get the project owner’s disclaimer out of the way:

Not a Google project, but created and maintained by a Googler, with permission from and assignment to Google, under the Apache License, version 2.0.

As a personal disclaimer, I’m not a trained mathematician and my interest comes from a love of exploring data. Feel free to correct me if something should be better said.

I’ve seen Neo4j.. I know GraphDB’s

Many people exploring graph databases start with Neo4j and conceptually it’s similar but in usage terms there is a bit of a gap.

Neo4j has the Cypher query language, which I find very expressive but also more like SQL in how it works. Cayley uses a Gremlin-inspired query language wrapped in JavaScript. The more you use it, the more it feels like writing code-based interactions with chained method calls. The docs for this interface take some rereading, and it was only through some experimentation that I started to see how it all worked. They can be accessed via the GitHub docs folder. I worked my way through the test cases for some further ideas.

Another major difference is that Neo4j offers a gentler transition from relational databases. With Neo4j you can group properties on nodes and edges, so pulling back a node feels a little more like hitting a row in a table. Cayley, however, is a triple / quad store, so everything is treated as a node. You store only single pieces of related data (only strings, in fact), and the collection of properties that would traditionally make up a row or object is built through relationships. This feels extreme at first, as you need multiple traversals to get one row-like object, but over time it changed how I looked at data.


As an example (ignoring the major power of graph databases for starters) we might have the question “What is user 123’s height”. In Neo4j we can find a person with id 123, pulling back a node with that person’s name and height. We can then extract the height value. In Cayley you could find the person’s id node and then move via the height relationship to the value 184. So in the first case we are plucking property data from a returned node. In the second we collect the information we want to return. This is more a conceptual difference than a pro or a con but it becomes a very clear difference when you start to import data via quad files.

What is an  n-quad?

As mentioned Cayley works on quads / triples which are a simple line of content describing a start, relationship and finish. This can be imagined as two nodes joined by an edge line. What those nodes and relationships are can be many things. Some people have schemas or conventions for how things are named. Some people are using URLs to link web based data. There is a standard that can be read about at www.w3.org:

http://www.w3.org/TR/n-quads/

A simple example might be from the above:

"/user/123" "named" "john" .
"/user/124" "named" "kelly" .
"/user/124" "follows" "/user/123" .

When is a database many databases?

One of the tricky parts of a graph database is how to store things. Many of the graph dbs out there don’t actually store the data but rather sit on an existing database infrastructure and work with information in memory. Cayley is no different as you can layer it upon a few different database types – LevelDB, Bolt, MongoDB and an in memory version.

An interesting part of this is the vague promise of scaling. Most graph databases start off the conversation with node traversal, performance, syntax but they almost all end in scaling. I think Cayley is now entering this territory. As it moves from a proof of concept to something that gets used more heavily, it’s acquiring backends that can scale and the concept of layering more than one Cayley instance in front of that storage layer.

One thing to keep in mind is that performance is a combination of how the information is stored and how it is accessed, so put a fast graph db in front of a slow database and you’ll average out a little in speed. For my testing I used the LevelDB store, as it is built in and easy to get started with.

Show me the graph!

One of the first issues I had with Cayley was not knowing exactly how to get a graph onto the page. Neo4j’s spin-up was a little clearer and its error handling is quite visual. With Cayley you have to get syntax and capitalisation just right for things to play nicely.

Let’s assume you have the following graph:

[Image: graph with node A connected out to B, C and D]

Node A is connected out to B, C and D. This can be described in an n-quads file as:

"a" "follows" "b" .
"a" "follows" "c" .
"a" "follows" "d" .

If we bring up the web view using a file with that content we can query:

g.V('a').As('source').Out('follows').As('target').All()

Running it as a query should give you some json:

{
  "result": [
    {
      "id": "b",
      "source": "a",
      "target": "b"
    },
    {
      "id": "c",
      "source": "a",
      "target": "c"
    },
    {
      "id": "d",
      "source": "a",
      "target": "d"
    }
  ]
}

Swap to the graph view, run it again and you should see a graph. Not all that pretty but it’s a start.


So what’s happening here? Starting at ‘A’ and calling it “source” we traverse joins named “follows” that go out from A and take note of the end node calling it “target”. Be aware that the source / target is case sensitive and if you get it wrong you won’t see anything. When I say “calling” what I mean is that as the nodes are being traversed it will “emit” the value found with the name provided as the key. This is building up the JSON objects with each traversal as a new object in the returned list.
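To make the "emit" behaviour concrete, here is a small Ruby sketch that parses the three-line quads file and mimics the query. It is an illustration only, not how Cayley works internally: parse_quads handles just the simple all-quoted form used in this post rather than the full N-Quads grammar, and both method names are made up.

```ruby
# The three-line quads file from above, inlined so the sketch is self-contained.
QUADS_TEXT = <<~NQ
  "a" "follows" "b" .
  "a" "follows" "c" .
  "a" "follows" "d" .
NQ

# Turns each line into an array of its quoted terms: [subject, predicate, object].
def parse_quads(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    terms = line.scan(/"([^"]*)"/).flatten
    raise ArgumentError, "cannot parse: #{line}" unless [3, 4].include?(terms.size)

    terms
  end
end

# Rough equivalent of g.V(start).As('source').Out(predicate).As('target').All():
# each matching edge yields one hash, tagged with the names given to As().
def out_with_tags(quads, start, predicate)
  quads
    .select { |subject, pred, _object| subject == start && pred == predicate }
    .map { |subject, _pred, object| { "id" => object, "source" => subject, "target" => object } }
end

RESULT = out_with_tags(parse_quads(QUADS_TEXT), "a", "follows")
```

Each traversal contributes one hash to the list, which is the same shape as the JSON returned by the web view.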

Doing more

So now we have the basics, and that’s as far as a lot of the examples go. Let’s take things a little further.

I recently read an article, 56 Experts reveal 3 beloved front-end development tools, and in doing so I came across entry after entry of tools and experts. My first reflex was to ask where the intersections are and which are the outliers. So I decided to use this as a data source. I pulled each entry into a spreadsheet and then ran a little script over it to produce the quads file with:

"<person>" "website" "<url>" .
"<person>" "uses" "<tool name>" .

and for each first mention of a tool:

"<tool>" "website" "<url>" .

The result was a 272 line quads file with people, the software they used and the URLs for the software.

From there I started Cayley with the usual command:

cayley http --dbpath=userreviews.nq

So what next? We can find a product and see who is using it:

g.Emit(g.V('sublime text').In('uses').ToArray())

Which results in:

{
 "result": [
  [
   "stevan Živadinovic",
   "bradley neuberg",
   "sindre sorus",
   "matthew lein",
   "jeff geerling",
   "nathan smith",
   "adham dannaway",
   "cody lindley",
   "josh emerson",
   "remy sharp",
   "daniel howells",
   "wes bos",
   "christian heilmann",
   "rey bango",
   "joe casabona",
   "jenna gengler",
   "ryan olson",
   "rachel nabors",
   "rembrand le compte"
  ]
 ]
}

Note I used the specific emit of the array values to avoid a lengthy hash output.

Sure that’s interesting but how about we build a recommendation engine?

Say you are a user that is a fan of SASS and Sublime Text. What other tools do the experts who use these tools like?

// paths that lead to users of the tools
var a = g.V('sass').In('uses')
var b = g.V('sublime text').In('uses')

// Who uses both tools
var c = a.Intersect(b).ToArray()

// What tools are used by all of those people
var software = g.V.apply(this, c).Out('uses').ToArray()

// Convert an array to a hash with counts
var results = {}
_.each(software, function(s){
  if(results[s]==null){ results[s]=0; }
  results[s] +=1;
})

// Remove search terms
delete results['sass']
delete results['sublime text']

// Emit results
g.Emit({tools: results, users: c})

Here we are:

  1. finding the people that use sass and sublime text
  2. finding all the tools they use
  3. counting the number of times a tool appears
  4. removing our search tools
  5. emitting the results as the response

This gives us:

{
 "result": [
  {
   "tools": {
    "angularjs": 1,
    "chrome dev tools": 5,
    "jekyll": 1,
    "jquery": 1
   },
   "users": [
    "bradley neuberg",
    "nathan smith",
    "adham dannaway",
    "wes bos",
    "joe casabona",
    "jenna gengler",
    "ryan olson",
    "rachel nabors"
   ]
  }
 ]
}

Note how Cayley is pretty happy for us to move in and out of JavaScript and that underscore.js is available by default. Handy. Also I returned a custom result construction with both the results hash and the users it was derived from.

So this isn’t necessarily the most efficient way of doing things but it’s pretty easy to follow.
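For comparison, the same five steps can be written in plain Ruby over an in-memory list of triples. This is a sketch outside Cayley, with made-up people (ann, ben, cat) standing in for the real data set, and hypothetical method names; it needs at least one liked tool to be passed in.

```ruby
# Hypothetical sample data in the same shape as the quads file.
USES = [
  ["ann", "uses", "sass"],
  ["ann", "uses", "sublime text"],
  ["ann", "uses", "chrome dev tools"],
  ["ben", "uses", "sass"],
  ["ben", "uses", "sublime text"],
  ["ben", "uses", "jquery"],
  ["ben", "uses", "chrome dev tools"],
  ["cat", "uses", "sass"],
  ["cat", "uses", "vim"]
].freeze

# People with a "uses" edge pointing at the given tool.
def users_of(triples, tool)
  triples.select { |_person, predicate, object| predicate == "uses" && object == tool }
         .map(&:first)
end

# Counts the other tools used by people who use every tool in `liked`.
def recommend(triples, *liked)
  users = liked.map { |tool| users_of(triples, tool) }.reduce(:&)
  counts = Hash.new(0)
  triples.each do |person, predicate, tool|
    next unless predicate == "uses" && users.include?(person)

    counts[tool] += 1 unless liked.include?(tool)
  end
  { tools: counts, users: users }
end

RECOMMENDATION = recommend(USES, "sass", "sublime text")
```

Here cat is excluded because she only uses one of the two tools, so vim never shows up in the counts.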

I think for many, the fact that Cayley uses a JavaScript based environment will make it quite accessible compared to the other platforms. I hope to keep exploring Cayley in future articles.
