Blog Archives

Blockchain Analytics with Cayley DB

Posted in Code, Data Analytics, Inside TFG

Bitcoin (and consequently, the Blockchain) has been making waves in the media over the past few years.

In this blog post I will be covering the process of building relationships between blocks, transactions and addresses using Google’s Cayley DB. With this information I may be able to pinpoint important transfers of value and also build an ownership database on top of it to track high value individuals.

I’m going to call this project Bayley, an amalgamation of “Bitcoin” and “Cayley”. I never was that creative.

The Blockchain

So what are the advantages of the Blockchain? What’s so special about it? For starters, for the first time in history we have a database that:

  • Can’t easily be rewritten
  • Eliminates the need for trust
  • Resists censorship
  • Is widely distributed.

A lot of people like to call it a distributed ledger, which is true if you use Bitcoin as a currency. The Blockchain, however, is capable of much more. As a new and possibly disruptive technology, I figured it would be a good idea to learn more about it. In the process we might also learn enough about its workings to build unique services on top of the Blockchain.

The Database

I originally tried to build this project on MongoDB, but ended up shelving the idea as MongoDB is not suitable for this task: the schema is consistent across blocks, and I need to be able to easily find relationships between data points.

I had a look at LevelGraph and Neo4j but in the end decided to go with Cayley. Cayley has been explored previously by The Frontier Group; it’s also a very new technology, and I wanted to learn how to use it.

Setup Considerations

The first step will be to synchronise a copy of the blockchain locally for your use. I used the testnet instead of mainnet for testing purposes.

Originally I used BTCD as I wanted a server-side, headless daemon. Bitcoin Core can do this, but not on OSX. I constantly ran into bugs and inconsistencies such as:

  • RPC setup using different variable names, making existing libraries that hook into Bitcoin Core useless
  • JSON batch requests not being supported

In the end I just opted to run an instance (with GUI and all) of Bitcoin Core on my machine. Get it here!

Before starting to synchronise the Blockchain, it’s useful to note that a full transaction index is not maintained by default, to conserve disk space. Transaction indexing can be turned on with the command-line switch -txindex, or by adding the line txindex=1 to your bitcoin.conf.

RPC needs to be enabled; the server=1 line in the conf file below takes care of this. Using RPC calls to the Bitcoin daemon will allow you to pull out the block data.

Spinup instructions

Overview of Process

From a high level, the process will look like this:

  • Get block hashes from block heights
  • Get blocks from those block hashes
  • Send the above data to Cayley DB in an HTTP POST request

This does not take transaction data into account; that will be a topic for a future blog post. So let’s get started!

Setting up Bitcoin Core

The standard Bitcoin Core conf file contains a lot of options, but in general you’ll need to make sure the following lines are set:

txindex=1
testnet=1
server=1
rpcuser=bitcoinrpc
rpcpassword=BHaVEDoMkVr1xKudcLpVbGi2ctNJsseYrsuDufZxwEXb
rpcport=8332

The rpcpassword is autogenerated by Bitcoin Core. You can use an environment variable if you’re concerned about security and such. Since this project is just for testing purposes and the password is randomised, I’m not too bothered that it’s sitting there in plaintext.

Block Extraction

We’ll be using Peter Todd’s python-bitcoinlib library. The pprint module is also used for printing to the console for quick and dirty debugging. Install python-bitcoinlib using PyCharm or pip (pprint ships with the Python standard library), then add to the top of your bayley.py file:

import bitcoin
import bitcoin.rpc
from pprint import pprint as pp

The next step will be to write some simple code to extract some blocks.

def main():
    # This will create a batch of commands that requests the first 100 blocks of the blockchain
    commands = [{"method": "getblockhash", "params": [height]} for height in range(0, 100)]
    # Connect to the RPC server, send the commands and assign to the results variable
    conn = bitcoin.rpc.RawProxy()
    results = conn._batch(commands)
    # Extract the hashes out of the results
    block_hashes = [res['result'] for res in results]
    # Pull the full data for each block
    blocks = []
    for block_hash in block_hashes:
        blocks.append(conn.getblock(block_hash))
    # Call the function to make the triples to prepare for importing to CayleyDB
    block_triples = make_triples_for_block(blocks)
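
One gotcha worth flagging: RawProxy() reads its connection details from bitcoin.conf, but python-bitcoinlib assumes mainnet unless told otherwise. A minimal sketch of matching it to our testnet setup, with a standard entry point:

import bitcoin

# python-bitcoinlib defaults to mainnet; select testnet to match the
# testnet=1 line in bitcoin.conf before creating the RawProxy
bitcoin.SelectParams('testnet')

if __name__ == "__main__":
    main()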

Block Structure

Here is an example of a single block’s data:

{'bits': '1d00ffff',
 'chainwork': '0000000000000000000000000000000000000000000000041720ccb238ec2d24',
 'confirmations': 1,
 'difficulty': Decimal('1.00000000'),
 'hash': '0000000084ee00066214772c973896dcb65946d390f64e5d14a1d38dfa2e4d90',
 'height': 445610,
 'merkleroot': 'eaf042fa845ea92aba661632bc6b8e78e8e64c2917a92f1a7da0800ed793b819',
 'nonce': 1413010373,
 'previousblockhash': '0000000087a272f48c3785de422e232c0771e2120c8fdd741a19ea98d122132b',
 'size': 315,
 'time': 1432705094,
 'tx': ['eaf042fa845ea92aba661632bc6b8e78e8e64c2917a92f1a7da0800ed793b819'],
 'version': 3}

With this in mind we can begin working on pulling the data from the blockchain and parsing the specific blocks.

Making Triples

Cayley uses the subject, predicate, object system known as a triplestore. We need to parse the block data from the previous section into this format.

One of the limitations of a triplestore is that you cannot attach much metadata to each node; array indexing and the like are a problem in this regard. In this case we will use the block hash as the subject for all of a block’s data, the key name as the predicate, and the value (excluding the block hash itself) as the object.

Let’s create a function that does this. At the top of my bayley.py file I will create a global variable which specifies the keys I want to turn into triples:

DESIRED_BLOCK_KEYS = ("height", "nextblockhash", "previousblockhash", "size", "time", "difficulty")

Next I wish to declare the function:

def make_triples_for_block(blocks):
    triples = []

We will next need to iterate through the blocks and their respective keys to start pulling the relevant data. The first thing to do is to ignore the blockhash key:

def make_triples_for_block(blocks):
    triples = []
    for block in blocks:
        for key in block:
            # Ignore self reference
            if (key == "hash"):
                continue

The transactions value is an array, so it’s best to iterate through these separately.

def make_triples_for_block(blocks):
    triples = []
    for block in blocks:
        for key in block:
            # Ignore self reference
            if (key == "hash"):
                continue
            # Iterate through transactions
            if (key == "tx"):
                for t in block[key]:
                    triples.append({
                        "subject": block['hash'],
                        "predicate": key,
                        "object": t
                    })

And finally we can append our block data to the triples array we declared at the beginning. Note how I cast the values to strings; this prevents an issue later on when importing into CayleyDB. Cayley is happiest when you give her JSON files that are all strings.

def make_triples_for_block(blocks):
    triples = []
    for block in blocks:
        for key in block:
            # Ignore self reference
            if (key == "hash"):
                continue
            # Iterate through transactions
            if (key == "tx"):
                for t in block[key]:
                    triples.append({
                        "subject": block['hash'],
                        "predicate": key,
                        "object": t
                    })
            # Iterate through first level block data
            if (key in DESIRED_BLOCK_KEYS):
                triples.append({
                    "subject": str(block['hash']),
                    "predicate": key,
                    "object": str(block[key])
                })
    return triples

The function now returns a triples list containing all of our triples ready for importing!

Here is an example of the triples for a single block for your reference:

[{'object': '1',
  'predicate': 'height',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '190',
  'predicate': 'size',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': 'f0315ffc38709d70ad5647e22048358dd3745f3ce3874223c80a7c92fab0c8ba',
  'predicate': 'tx',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '000000006c02c8ea6e4ff69651f7fcde348fb9d557a06e6957b65552002a7820',
  'predicate': 'nextblockhash',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '1.00000000',
  'predicate': 'difficulty',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943',
  'predicate': 'previousblockhash',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '1296688928',
  'predicate': 'time',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'}]

Setting up Cayley

Cayley is written in golang. A packaged binary is available (so you shouldn’t need to set up golang separately) from here.

This is my Cayley config file:

{
  "database": "bolt",
  "db_path": "./blockchain",
  "read_only": false,
  "replication_options": {
    "ignore_duplicate": true,
    "ignore_missing": true
  }
}

I’m using Bolt over LevelDB because Bolt performs slightly better under read-heavy workloads. You can read more here.

After making the cayley.cfg file, initialise the database by running the init command like so (from the Cayley folder):

./cayley init -config cayley.cfg

This will create a blockchain file and prep the backend database for Cayley goodness. The next step will be to run the HTTP server:

./cayley http -config cayley.cfg

Now we’re ready to send all the data in!

Sending to Cayley

Cayley’s HTTP documentation will help with this section. It receives JSON triples in the form of the following:

[{
    "subject": "Subject Node",
    "predicate": "Predicate Node",
    "object": "Object node",
    "label": "Label node"  // Optional
}]   // More than one quad allowed.

We’ll need to POST this data to our Cayley server’s write API via http://localhost:64210/api/v1/write.

Now we need to make use of the excellent requests Python library. Install it in PyCharm, then add the following to the top of the bayley.py file. Cayley expects JSON, so we’ll also import the standard library’s json module.

You’ll also want to add a global variable for Cayley’s write URL, and a headers constant telling Cayley that we’re sending JSON.

import requests
import json
DB_WRITE_URL = "http://127.0.0.1:64210/api/v1/write"
DB_WRITE_HEADERS = {'Content-type': 'application/json'}

We’re going to create a function to send the data over to Cayley. Note how the data is converted to json in the data= argument.

def send_data(data):
    r = requests.post(DB_WRITE_URL, data=json.dumps(data), headers=DB_WRITE_HEADERS)
    pp(r)
    pp(r.text)

If pp(r) prints out a response of 200 then we’re good! If not, we’ll need to look at what went wrong, which is usually explained well in r.text. This is the result I got:

<Response [200]>
'{"result": "Successfully wrote 693 quads."}'

Go back to your main function and call the send_data function:

def main():
    ...
    send_data(block_triples)
    ...

And that should do it.

Graphing the result

By now we should have 100 blocks in Cayley! Head over to http://localhost:64210 and let’s start graphing!

In the query page we can test out our queries. I wrote a simple one that loops through the first 5 blocks, gets all objects that are one edge away (Out()) and returns the results:

for(var i=0; i<5; i++){
    g.V().Has("height", String(i)).Tag("source").Out().Tag("target").All();
}
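
The same query can be run programmatically instead of through the web UI. Here’s a minimal sketch, assuming Cayley’s default Gremlin query endpoint (/api/v1/query/gremlin) on the same port:

import requests
from pprint import pprint as pp

DB_QUERY_URL = "http://127.0.0.1:64210/api/v1/query/gremlin"

def run_query(query):
    # Cayley's query endpoint takes the raw query text as the request body
    r = requests.post(DB_QUERY_URL, data=query)
    return r.json()

pp(run_query('g.V().Has("height", "0").Tag("source").Out().Tag("target").All()'))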

Here is the result of the first block:

{
 "result": [
  {
   "id": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
  },
  {
   "id": "1296688602",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "1296688602"
  },
  {
   "id": "0",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "0"
  },
  {
   "id": "285",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "285"
  },
  {
   "id": "1.00000000",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "1.00000000"
  },
  {
   "id": "00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206"
  }
 ]
}

The query shape looks like this:

[Image: query shape]

The visualisation itself looks like the following images.

Single block:
[Image: single block]

Five blocks:
[Image: five blocks]

As you can see there are shared nodes. This is because those nodes have the same predicate and object but different subjects (block hashes). This is a good example of how Cayley helps in visualising relationships.

One hundred blocks:

[Image: one hundred blocks]

The shared nodes here are due to the common block size and block difficulty (the latter changes every 2 weeks). You can see a close up below:

[Image: close-up of shared nodes]

Conclusion

This is just the early stage. The next step will be to parse the transactions for Bitcoin addresses and start drawing all the relationships between them. Once a strong system is in place for parsing the blockchain, you might want to begin parsing the other 400,000 blocks or so, and also switch to the mainnet. Scraping the web for usernames associated with addresses, and estimating relationships based on round-number transfers of value, are also in the pipeline.

Using CanCan to implement an “Agree to Terms” workflow

Posted in Code, Inside TFG, Ruby on Rails

Terms & Conditions
Recently in a Rails application I was tasked with adding in a basic “terms and conditions” page.

There was nothing special about the feature, but I was really happy with my solution so I decided to write about it briefly.

Thinking about a solution

So, my initial plan was:

  1. Add an agreed_to_terms_and_conditions_at datetime field to the User model (a sketch of the migration is below). Use a datetime here so that if we change the conditions later we can check against the time.
  2. Perform checks in the app so that users who haven’t agreed to the terms and conditions are redirected to the T&C workflow
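
For reference, step 1 boils down to a one-line migration. A sketch (the migration class name is my own, not from the original code):

class AddAgreedToTermsAndConditionsAtToUsers < ActiveRecord::Migration
  def change
    # Nullable datetime: nil means the user hasn't agreed yet
    add_column :users, :agreed_to_terms_and_conditions_at, :datetime
  end
end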

Step 1 was straightforward, but step 2 got me thinking.

Initially I had considered implementing a before_filter in ApplicationController that would check whether the User had agreed to the T&Cs and redirect them to the T&Cs page if they hadn’t.

After thinking for a moment I decided that it was really a question of authorization, and as a result should be managed by an Ability file.

The reasoning I used is that I would say a user should not be able to access the site until they had agreed to the terms. That sounds suspiciously like a cannot statement in CanCanCan.

Implementing a solution with CanCanCan

Once I had decided to use CanCanCan to implement the solution, it was just a matter of getting all the parts together.

Firstly, I had my abilities split into separate files in the way I have suggested in this post. I had an Ability::Factory that would take a User (or nil) and return the appropriate ability file. It looks something like:

class Ability::Factory

  def self.build_ability_for(user)
    return Ability::Anonymous.new if user.nil?

    case user.role
    when :admin
      Ability::Admin.new(user)
    when :supervisor
      Ability::Supervisor.new(user)
    when :doctor
      Ability::Doctor.new(user)
    when :patient
      Ability::Patient.new(user)
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{user.role}")
    end
  end

end

My initial idea was to do some checks for each role and basically say something like:

if user.has_agreed_to_terms_and_conditions?
  # Implement abilities as per usual
else
  cannot :manage, :all
end

But that would lead to a lot of duplication in both implementation and tests. Plus, just a lot of code in general, which I despise.

Thinking further, I decided that a User who hadn’t agreed to the terms and conditions had a set of abilities of their own, independent of their role. I created a new Ability for such a condition: Ability::PendingAgreementToTermsAndConditions. The class was implemented like:

class Ability::PendingAgreementToTermsAndConditions < Ability

  def initialize(user)
    cannot :manage, :all
    can :agree_to_terms_and_conditions, User, id: user.id
  end

end
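
For completeness, the controller action that this ability guards might look something like the following sketch (the controller name, route, and update call are my assumptions, not from the original app):

class TermsAndConditionsController < ApplicationController

  def update
    # Raises CanCan::AccessDenied unless the current ability allows it
    authorize! :agree_to_terms_and_conditions, current_user
    current_user.update!(agreed_to_terms_and_conditions_at: Time.current)
    redirect_to root_path
  end

end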

I amended my Ability::Factory so that it would return the pending ability in the right conditions:

class Ability::Factory

  def self.build_ability_for(user)
    return Ability::Anonymous.new if user.nil?

    if user.has_agreed_to_terms_and_conditions?
      ability_class_for(user.role).new(user)
    else
      Ability::PendingAgreementToTermsAndConditions.new(user)
    end
  end

private

  def ability_class_for(role)
    case role
    when :admin
      Ability::Admin
    when :supervisor
      Ability::Supervisor
    when :doctor
      Ability::Doctor
    when :patient
      Ability::Patient
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{role}")
    end
  end

end

Great, so now I had all the abilities I needed. It was time to incorporate the logic into my controller so the application would handle users who hadn’t agreed to the T&Cs.

I had to ensure two things:

  1. Users can’t access pages in the app other than the T&Cs. When they try, they will be redirected to the T&Cs page.
  2. When a user who hasn’t agreed to the T&Cs signs in, they are redirected to the T&Cs page.

Handling access to pages when terms and conditions aren’t agreed to

Regarding the first objective: Anyone who has used CanCan or CanCanCan will know that since the ability file prohibits users from accessing other pages (cannot :manage, :all), a CanCan::AccessDenied exception will be raised if those pages are hit.

That means that I just had to handle that exception, and redirect the user to the T&Cs page. The CanCanCan README explains how to catch this exception in detail, but I’ll post the code I used anyway:

class ApplicationController < ActionController::Base

  rescue_from CanCan::AccessDenied do |exception|
    if current_user.present?
      # You could also do: current_ability.can?(:agree_to_terms_and_conditions, current_user)
      # but I think the following reads better
      if current_user.has_agreed_to_terms_and_conditions?
        # Redirect as usual
      else
        # Redirect to the terms page
      end
    else
      # Do whatever for unauthed users
    end
  end

end

Moving on, let’s ensure the user isn’t sent straight to another redirect when they sign in.

Redirecting users to the terms and conditions page when they sign in

This is a problem for your authentication system. I use Devise, so I was able to override the after_sign_in_path_for method in my ApplicationController as outlined in the documentation. The code looks like:

class ApplicationController < ActionController::Base

  # Override: Devise method
  def after_sign_in_path_for(user)
    # You could also do: current_ability.can?(:agree_to_terms_and_conditions, current_user)
    # but I think the following reads better
    if user.agreed_to_terms_and_conditions_at.present?
      # Redirect as usual
    else
      # Redirect to the terms page
    end
  end

end

Now the user will get one redirect, instead of being redirected to a page they can’t access.

Conclusion

So that’s my solution. About 20 extra lines of code (plus tests) and now you’ve got all the logic for implementing a terms and conditions workflow.

I really enjoyed implementing that solution. It was easy to write and has had no maintenance cost.

Separating abilities in CanCan

Posted in Code, Inside TFG, Ruby on Rails

Users and Authorization
Authorization is simple to implement in Rails thanks to gems like CanCan and its chubby cheeked offspring CanCanCan. When getting started with CanCanCan, the documentation suggests that you use the generator to create a single Ability file to store your abilities in. This is a great starting point, but in my experience few projects using CanCanCan ever evolve past the use of a single Ability file – much to their detriment.

In this post I’ll have a look at an example Ability file and enumerate some flaws in such a system. Following that I’ll discuss a way to improve it by breaking the Ability file out into multiple Ability files.

Defining a real-world example

Let’s imagine a somewhat complicated application that has 4 different roles, each of which may have the ability to perform up to 50 different actions.

For some context, let’s say the application stores medical imaging scans (ultrasounds and so forth) and has the following roles:

  1. Patients that sign in and view information about their scans, and can grant access to their scans to doctors.
  2. Doctors that sign in and add notes and attach dictations to these scans, and can look over all their patients’ scans.
  3. Supervisors that sign in and manage the doctors, assigning them to hospitals and medical practices. Assigning patients to doctors.
  4. Admins that sign in and manage all the above user accounts, and perform basic CRUD for hospitals and medical practices.

An Ability file for such an application might look like:

class Ability
  include CanCan::Ability

  def initialize(user)
    # Anonymous users don't have access to anything
    return if user.nil?

    case user.role
    when :admin
      can :manage, :all
    when :supervisor
      # Between 1-50 can/cannot statements
    when :doctor
      # Between 1-50 can/cannot statements
    when :patient
      # Between 1-50 can/cannot statements
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{user.role}")
    end
  end

end

The problems with a single Ability file

As you can see, if we have many different abilities per user this file will get quite large.

Let’s say that there are very few shared abilities and that for each of the supervisor, doctor, and patient roles we have 50 lines of ability declarations. That equates to roughly 170 lines of code in the file. Historically, I’ve found that spec to implementation ratio is about 2:1, so let’s imagine there’s a 340 line spec file that corresponds to this implementation.

There are many problems with an Ability file of this size.

  1. Larger files are harder to understand, maintain, and debug. There are just too many different concepts for a developer to deal with in one location, many of which will be irrelevant for whatever their task at hand is.
  2. Spec files become even harder to maintain for larger implementations. Since the spec:code ratio can bloat in excess of 2:1, it will be even harder to maintain the specs.
  3. The Ability file will have far too many responsibilities, violating the Single Responsibility Principle and suffering from the drawbacks of such behaviour. This boils down to a higher maintenance cost and defect rate. If you don’t like OO sophistry, let me put it another way: The file tries to do too much. The class answers the question “what can every user in the system conceivably do?”. Whereas, we are far more likely to be interested in what just one of those users can do at any given time.
  4. In a similar vein, the Ability class becomes a god class for authorization. I feel like CanCan and CanCanCan only encourage this behaviour by having an opaque default for determining which Ability to use. By default, CanCanCan will assume that you have an Ability class in its current_ability function. There is a section in the GitHub wiki on changing this, though.
  5. Large classes and files suffer from the broken windows theory – that is: since the class is already so massive and bloated, new developers on the project will just pile more functionality on top of the existing mess – despite the cost in readability and maintainability. Further, the scope of the class starts to spread as more and more code is tacked on. You might hear such excuses from developers as “I’m just following the convention of the existing app” or “look, it’s already screwed so it’s not like I can make it any worse”. You may also hear my favourite “we don’t have budget to refactor”. Yeah, if you don’t have budget to refactor imagine how much budget you DON’T have for fixing defects all the time because you keep piling shit on more shit.

Deconstructing the monolithic Ability file

In order to resolve the issues above, we must break the Ability file apart. Usually, my first tactic is to identify the responsibilities of the class and break the code out into classes that represent each of these responsibilities.

Let’s review the responsibilities of the Ability class above:

  1. It defines what an anonymous user can do (EG: handling user.nil?)
  2. It defines what an admin can do
  3. It defines what a supervisor can do
  4. It defines what a doctor can do
  5. It defines what a patient can do
  6. It handles unknown roles

Now we create a class for each of those responsibilities. Which would leave us with classes like:

  1. AnonymousUserAbility – basically a null object for abilities.
  2. AdminAbility
  3. SupervisorAbility
  4. DoctorAbility
  5. PatientAbility, and
  6. UserAbilityFactory (or Ability::Factory if using namespaces), which takes a User (or nil) and returns the corresponding Ability class above. This class also handles roles without defined abilities by raising an exception.

You may also like to keep an Ability file that includes the CanCan::Ability module and contains some shared functions that will be used in the other ability files.
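
A minimal sketch of that shared base class, which the role-specific classes in this post inherit from:

class Ability
  include CanCan::Ability

private

  # Shared helper methods used across the role-specific abilities go here

end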

You should store these files in app/abilities. They are not models as defined by MVC, so they don’t belong in app/models, which is where CanCan and CanCanCan stash the Ability file by default.

You may also like to namespace these classes (EG: Ability::AnonymousUser), since namespaces can also improve the organisation of an application.

An example of one of the Ability files is:

class Ability::Patient < Ability

  def initialize(user)
    [:show, :invite_doctor].each do |ability|
      can ability, Result, patient_id: user.id
    end
    # etc
  end

end
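
As a side benefit, the corresponding spec can be just as focused. A sketch using the matchers bundled with CanCan/CanCanCan (the factory names here are assumptions):

require 'cancan/matchers'

describe Ability::Patient do
  subject(:ability) { Ability::Patient.new(user) }

  let(:user)   { FactoryGirl.create(:user, role: :patient) }
  let(:result) { FactoryGirl.create(:result, patient_id: user.id) }

  it "allows the patient to view their own results" do
    expect(ability).to be_able_to(:show, result)
  end
end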

Now I can have private methods that are specific to just the abilities of the Patient, rather than private methods being for all the different roles. I have a single function that can tell me the sum total of a Patient’s abilities. I can include some additional documentation in the class explaining the role of the Patient in our application for future developers.

Let’s have a look at the Ability::Factory now. I named this class after the factory pattern since its job is to take a User (or nil) and build us a corresponding Ability file. If you wanted, you could just put the function in Ability. I prefer the new class implementation, which would look like:

class Ability::Factory

  def self.build_ability_for(user)
    return Ability::Anonymous.new if user.nil?

    case user.role
    when :admin
      Ability::Admin.new(user)
    when :supervisor
      Ability::Supervisor.new(user)
    when :doctor
      Ability::Doctor.new(user)
    when :patient
      Ability::Patient.new(user)
    else
      raise(Ability::UnknownRoleError, "Unknown role passed through: #{user.role}")
    end
  end

end

The corresponding controller change to get CanCan or CanCanCan to play nice with your new Abilities would be:

class ApplicationController

  # Override CanCan method to provide custom Ability files
  def current_ability
    @current_ability ||= Ability::Factory.build_ability_for(current_user)
  end

end

Please note: If you are using an additional engine like RailsAdmin or ActiveAdmin, some more work might need to be done in order to get the engine to play nice. You will have to do some spelunking in the engine’s codebase to determine how CanCan or CanCanCan is integrated.

Conclusion

Now our large Ability file is broken into smaller, more manageable files. Each file now has a single responsibility and is easier to test. If we need to add a new role it won’t be a nightmare to patch the Ability file. We just build a new file and ensure it is in the Ability::Factory. Luckily, since our factory handles unknown roles by raising an exception, we’ll find out pretty quickly if there’s no corresponding Ability file.

Having a single file per role increases the ease with which we can verify the responsibilities of that role. We can read a single file and determine exactly what the Patient does, for example. Before, it was hidden in the guts of the Ability file.

When it comes to authorization, you want as high a level of visibility as you can on roles, so you don’t have anyone doing things they shouldn’t be able to.

Using FactoryGirl to easily create complex data sets in Rails

Posted in Code, Inside TFG, Ruby on Rails

I use FactoryGirl for setting up data in my application. FactoryGirl gives you all the tools you need to quickly and easily create data for models in your application. Leveraging ffaker, you can make realistic-looking, randomized data.

Often, you will have complex associations between objects in your system that can be a pain to factory up. I’ve frequently seen people use individual factories to build up these relationships. The amount of work required to set these associations up quickly gets tedious and turns your code into an unreadable mess.

In this article, I will run through some features of FactoryGirl that you can leverage to easily create complex associations.

Transient Attributes

One of the features I use most in FactoryGirl is transient attributes. Transient attributes allow you to pass in data that isn’t an attribute on the model. I frequently use them to create multiple objects with a single FactoryGirl call.

For example, say you have two models, User and Role. A User has one Role. You might do something like:

role = FactoryGirl.create(:role, name: "Head Buster")
user = FactoryGirl.create(:user, role: role)

Using transient attributes you could define the following factory:

factory :user do
  transient do
    role_name "admin"
  end

  role do
    Role.find_by(name: role_name) || FactoryGirl.create(:role, name: role_name)
  end
end

which would then allow you to do:

user = FactoryGirl.create(:user, role_name: "Head Buster")

Traits

Another of my favourite features is traits. You could solve the scenario above using traits by doing something like:

factory :user do
  trait :head_buster do
    role do
      Role.find_by(name: "Head Buster") || FactoryGirl.create(:role, name: "Head Buster")
    end
  end
end

which would then allow you to do:

user = FactoryGirl.create(:user, :head_buster)

I’ve found that the power of traits grows with the complexity of the model they map. The more states your model can be in, and the more data it has attached to it, the more you’ll be able to use traits to simplify data creation. Try to abstract any state that an object can be in into a trait to simplify usage (see the sketch below).
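
As an illustration, here’s a sketch with hypothetical published/archived states on a Book; each state gets its own trait, and traits compose at the call site:

factory :book do
  trait :published do
    published_at { 1.week.ago }
  end

  trait :archived do
    archived_at { 1.day.ago }
  end
end

# Traits compose, keeping call sites terse:
book = FactoryGirl.create(:book, :published, :archived)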

Callbacks

Callbacks in FactoryGirl are also extremely useful. They work hand in hand with transient attributes and traits to allow you to perform any non-obvious setup in your factories.

Let’s imagine an app which has the following models:

  • User
  • Book
  • UserReadBook (Join between User and Book, indicating the user has read this book)
  • WishlistBook (Join between User and Book, indicating the user added this book to their Wishlist)

Out of the box, if you wanted to create one of each type of object, you might have some FactoryGirl calls like:

user = FactoryGirl.create(:user)
book = FactoryGirl.create(:book)
FactoryGirl.create(:user_read_book, user: user, book: book)
FactoryGirl.create(:wishlist_book, user: user, book: book)

Let’s say we have a function on User: #related_books, which returns all Books that the User has read or added to their wishlist. Our RSpec tests for such a function might look like:

describe '#related_books' do
  subject(:related_books) { user.related_books }
  let(:user) { FactoryGirl.create(:user) }

  it "includes books this user has read" do
    expect(related_books).to include(FactoryGirl.create(:user_read_book, user: user).book)
  end
  it "includes books this user has added to their wishlist" do
    expect(related_books).to include(FactoryGirl.create(:wishlist_book, user: user).book)
  end
  it "doesn't include books read by other users" do
    expect(related_books).not_to include(FactoryGirl.create(:user_read_book).book)
  end
  it "doesn't include books other users have added to their wishlist" do
    expect(related_books).not_to include(FactoryGirl.create(:wishlist_book).book)
  end
end

Doesn’t look TOO bad. I REALLY don’t like having to tack the .book on the end there. I also don’t like that I’m not directly creating the type of object I want returned in my test. Personally, I think it makes the tests harder to understand. The bigger problem is when we need to refactor.

What happens when requirements change and we have to add in a VideoGame model? Now we change the UserReadBook and WishlistBook models to be polymorphic so they can also hold VideoGames. As a result, we rename the models to UserCompletedItem and WishlistItem.

It’s extremely likely we’ve used the original join table factories in multiple places to test other scopes, searching functions, and more. As a consequence, we have to update all our specs to use the updated join table names. Doesn’t this last step seem like an unnecessary pain in the ass?

What we should have done is used our factories to abstract the concept of wishlisting or reading a Book. Our tests generally want to ensure that there is a specific type of relationship between a Book and a User, but they shouldn’t really need to care about the specifics of it. Let’s look at how factories can help us.

The first thing I do when trying to abstract these concepts is work out the interface I want in my factories. In the case above, I’d like to be able to write:

FactoryGirl.create(:book, read_by: user) # and
FactoryGirl.create(:book, wishlisted_by: user)

I can support this interface using transient attributes and factory callbacks. I can update my Book factory to look like:

FactoryGirl.define do
  factory :book do
    transient do
      read_by nil
      wishlisted_by nil
      # nil is a sensible default, we don't want our factories creating
      # extra data unnecessarily. It slows your test suite down
    end

    after(:create) do |book, evaluator|
      # The evaluator exposes the transient attributes defined above
      if evaluator.read_by
        FactoryGirl.create(:user_read_book, book: book, user: evaluator.read_by)
      end
      if evaluator.wishlisted_by
        FactoryGirl.create(:wishlist_book, book: book, user: evaluator.wishlisted_by)
      end
    end
  end
end

Here’s what I like about abstracting the concept of reading or wishlisting a Book using factories:

Simpler Tests

Our tests are no longer loaded with implementation details of joining the Book and User. This is especially useful in even more complex relationships. Basically, if my test is checking that a book is returned, I only ever want to create a Book. I don’t want to have to create multiple other models.

Reduced cost of refactoring

When we have to update the join between Book and User, we only need to update one factory instead of every test that had instantiated one of the renamed join tables.

More concise tests

Although in my example I used a one liner for getting a read or wishlisted Book, in reality the syntax you’d probably see is:

user = FactoryGirl.create(:user)
book = FactoryGirl.create(:book)
FactoryGirl.create(:user_read_book, user: user, book: book)
FactoryGirl.create(:wishlist_book, user: user, book: book)

Which with the factory above could be reduced to:

user = FactoryGirl.create(:user)
book = FactoryGirl.create(:book, read_by: user, wishlisted_by: user)

This may not seem like much, but it can build up. Recently I made a similar refactoring in a spec file that contained 10 such blocks. That amounted to 20 fewer LoC, or around 5% of the spec file. Also, had I written my factory that way originally, I would have had to type a hell of a lot less, too.

Here’s what my specs would look like with my updated factories:

describe '#related_books' do
  subject(:related_books) { user.related_books }

  let(:user)       { FactoryGirl.create(:user) }
  let(:other_user) { FactoryGirl.create(:user) }
  it "includes books this user has read" do
    expect(related_books).to include(FactoryGirl.create(:book, read_by: user))
  end
  it "includes books this user has added to their wishlist" do
    expect(related_books).to include(FactoryGirl.create(:book, wishlisted_by: user))
  end
  it "doesn't include books read by other users" do
    expect(related_books).not_to include(FactoryGirl.create(:book, read_by: other_user))
  end
  it "doesn't include books other users have added to their wishlist" do
    expect(related_books).not_to include(FactoryGirl.create(:book, wishlisted_by: other_user))
  end
end

Wrapping up

So using traits, transient attributes, and callbacks we can make our FactoryGirl factories do a lot more of the heavy lifting for us.

We can also abstract complex associations to reduce the cost of refactoring and increase the readability of our tests.

Although those are my favourite features, they don’t cover everything FactoryGirl offers. I’d recommend going through the FactoryGirl documentation and thinking about what you can do to get more out of factories in your code.

A quick tip with .Net generics

Posted in Code, Inside TFG

Generic constraints in .Net can be recursive. This means that you can use a type in its own generic constraints. Let’s look at an example of where this can be useful.

Let’s say you have some kind of persistent object IEntity. To avoid primitive obsession we are going to create a type-safe Reference<T> object to act as a pointer to our entities, rather than just having an int property called Id.

interface IReference<TEntity>
    where TEntity : IEntity
{
    // Actual interface doesn't matter
}

We want a base entity to inherit from, which among other things exposes an IReference<T> to itself.  We can’t be much more specific than returning an IReference<EntityBase>, since we can’t know the subclass type at compile time. Unless we hail to the generic recursion gods.

abstract class EntityBase<TSelf> : IEntity
    where TSelf : EntityBase<TSelf>
{
    public IReference<TSelf> Reference { get { ... } }
}

Now we just supply the type when we declare our subclass:

class MyEntity : EntityBase<MyEntity>
{
}

You can do much the same thing in Java, but it’s not quite as safe, since MyEntity extends EntityBase<OtherEntity> will compile just fine.

As an exercise for the reader: consider the visitor pattern, where we implement a virtual Accept method in order to have compile-time type knowledge of this. Can you now write a non-virtual Accept method?
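
If you want a nudge, here is one possible sketch (my own take, not necessarily the intended answer): the recursive constraint lets the base class statically recover the subclass type of this, so Accept no longer needs to be virtual.

interface IVisitor
{
    void Visit<TNode>(TNode node) where TNode : NodeBase<TNode>;
}

abstract class NodeBase<TSelf>
    where TSelf : NodeBase<TSelf>
{
    // Non-virtual: TSelf is the compile-time type of `this`
    public void Accept(IVisitor visitor)
    {
        visitor.Visit((TSelf)this);
    }
}

class MyNode : NodeBase<MyNode>
{
}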
