Clean up your rails_helper file


The code that supports tests is often neglected when it's time to refactor Rails code. I suspect this is because developers have limited time for refactoring and feel it's better spent on implementation code.

Although those priorities aren’t wrong, I think it’s valuable at least once in the lifespan of a project to invest some time in cleaning up your test environment. That way you can accumulate optimizations and good practices for each new application you work with.

In Rails applications that use RSpec, the rails_helper file (née spec_helper) provides configuration for the application's specs.

In a moderately complex application, this might include configuration for Sidekiq, DatabaseCleaner, the JavaScript driver (e.g. Poltergeist), and many other utilities.

Usually, developers just dump the configuration for these utilities into the various before, after, and around hooks that litter the rails_helper file.

In this post I’ll outline a simple way that you can organise your rails_helper file so that it is easier to read and maintain.

Organising rails_helper

Let’s have a look at what a rails_helper might look like in a Rails codebase:

require 'spec_helper'

# Some crufty requires omitted here

RSpec.configure do |config|

  # If you're not using ActiveRecord, or you'd prefer not to run each of your
  # examples within a transaction, remove the following line or assign false
  # instead of true.
  config.use_transactional_fixtures = false

  # If true, the base class of anonymous controllers will be inferred
  # automatically. This will be the default behavior in future versions of
  # rspec-rails.
  config.infer_base_class_for_anonymous_controllers = false

  # rspec-rails 3 will no longer automatically infer an example group's spec type
  # from the file location. You can explicitly opt-in to the feature using this
  # config option.
  config.infer_spec_type_from_file_location!

  config.include ActiveSupport::Testing::TimeHelpers
  config.include(EmailSpec::Helpers)
  config.include(EmailSpec::Matchers)

  config.include ArchivableSpecHelper, type: :model
  config.include IssueSharedGroups,    type: :model

  config.include Devise::TestHelpers,    type: :controller
  config.include ControllerSharedGroups, type: :controller
  config.extend AuthenticationHelper,    type: :controller

  config.include Helpers::CurrentUserSupport, type: :view
  config.include ViewSharedGroups,            type: :view
  config.extend ViewMacros,                   type: :view

  config.include FeatureHelper,                        type: :feature
  config.extend Features::AuthenticationClassSupport,  type: :feature
  config.include Features::AuthenticationSupport,      type: :feature
  config.include Features::MembershipSelectionSupport, type: :feature
  config.include Features::RailsAdminArchiveSupport,   type: :feature
  config.include Features::RailsAdminDeleteSupport,    type: :feature
  config.include Features::UserSupport,                type: :feature

  config.before(:suite) do
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.clean_with(:truncation)
  end

  config.after(:each) do |example|
    if example.exception && example.metadata[:js]
      meta = example.metadata
      filename = File.basename(meta[:file_path])
      line_number = meta[:line_number]
      screenshot_name = "screenshot-#{filename}-#{line_number}.png"
      screenshot_path = "#{Rails.root.join("tmp")}/#{screenshot_name}"
      html_name = "page-#{filename}-#{line_number}.html"
      html_path = "#{Rails.root.join("tmp")}/#{html_name}"
      page.save_page(html_path)
      page.save_screenshot(screenshot_path)
      puts meta[:full_description] + "\n  HTML: #{html_path}"
      puts meta[:full_description] + "\n  Screenshot: #{screenshot_path}"
    end
  end

  config.around(:each, js: true) do |ex|
    DatabaseCleaner.strategy = :truncation
    ex.run
    DatabaseCleaner.strategy = :transaction
  end

  config.around(:each, truncation: true) do |ex|
    DatabaseCleaner.strategy = :truncation
    ex.run
    DatabaseCleaner.strategy = :transaction
  end

  config.before(:each) do
    DatabaseCleaner.start
  end

  config.after(:each) do |example|
    DatabaseCleaner.clean
  end

### START - Sidekiq configuration for tests
  config.before(:each) do
    # Clears out the jobs for tests using the fake testing
    Sidekiq::Worker.clear_all
  end

  config.around(:each) do |example|
    if example.metadata[:sidekiq] == :fake
      Sidekiq::Testing.fake!(&example)
    elsif example.metadata[:sidekiq] == :inline
      Sidekiq::Testing.inline!(&example)
    elsif example.metadata[:type] == :feature
      Sidekiq::Testing.inline!(&example)
    else
      Sidekiq::Testing.fake!(&example)
    end
  end
### END - Sidekiq configuration for tests

  # Carrierwave settings for testing
  config.after(:all) do
    # See initializers/carrierwave.rb
    uploaded_files_path = Rails.root.join("public", "test_assets")
    FileUtils.rm_r(uploaded_files_path) if File.exist?(uploaded_files_path)
  end
end

Oh boy. This configuration file suffers from a poor signal-to-noise ratio. It's really hard to tell what is being configured, particularly if I'm looking for the configuration of a single utility.

In the case of the rails_helper above, the solution is to break each set of configuration items out into its own file. That would give us the following support files:

# spec/capybara_screenshot_support.rb
RSpec.configure do |config|
  config.after(:each) do |example|
    if example.exception && example.metadata[:js]
      meta = example.metadata
      filename = File.basename(meta[:file_path])
      line_number = meta[:line_number]
      screenshot_name = "screenshot-#{filename}-#{line_number}.png"
      screenshot_path = "#{Rails.root.join("tmp")}/#{screenshot_name}"
      html_name = "page-#{filename}-#{line_number}.html"
      html_path = "#{Rails.root.join("tmp")}/#{html_name}"
      page.save_page(html_path)
      page.save_screenshot(screenshot_path)
      puts meta[:full_description] + "\n  HTML: #{html_path}"
      puts meta[:full_description] + "\n  Screenshot: #{screenshot_path}"
    end
  end
end

# spec/carrierwave_support.rb
RSpec.configure do |config|
  config.after(:all) do
    # See initializers/carrierwave.rb
    uploaded_files_path = Rails.root.join("public", "test_assets")
    FileUtils.rm_r(uploaded_files_path) if File.exist?(uploaded_files_path)
  end
end

# spec/database_cleaner_support.rb
RSpec.configure do |config|
  config.before(:suite) do
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.clean_with(:truncation)
  end

  config.around(:each, js: true) do |ex|
    DatabaseCleaner.strategy = :truncation
    ex.run
    DatabaseCleaner.strategy = :transaction
  end

  config.around(:each, truncation: true) do |ex|
    DatabaseCleaner.strategy = :truncation
    ex.run
    DatabaseCleaner.strategy = :transaction
  end

  config.before(:each) do
    DatabaseCleaner.start
  end

  config.after(:each) do
    DatabaseCleaner.clean
  end
end

# spec/sidekiq_support.rb
RSpec.configure do |config|
  config.before(:each) do
    # Clears out the jobs for tests using the fake testing
    Sidekiq::Worker.clear_all
  end

  config.around(:each) do |example|
    if example.metadata[:sidekiq] == :fake
      Sidekiq::Testing.fake!(&example)
    elsif example.metadata[:sidekiq] == :inline
      Sidekiq::Testing.inline!(&example)
    elsif example.metadata[:type] == :feature
      Sidekiq::Testing.inline!(&example)
    else
      Sidekiq::Testing.fake!(&example)
    end
  end
end

…and an updated rails_helper file that looks like:

require 'spec_helper'
# Some crufty requires omitted here

# Utility specific configuration
require 'capybara_screenshot_support'
require 'carrierwave_support'
require 'database_cleaner_support'
require 'sidekiq_support'

RSpec.configure do |config|

  # If you're not using ActiveRecord, or you'd prefer not to run each of your
  # examples within a transaction, remove the following line or assign false
  # instead of true.
  config.use_transactional_fixtures = false

  # If true, the base class of anonymous controllers will be inferred
  # automatically. This will be the default behavior in future versions of
  # rspec-rails.
  config.infer_base_class_for_anonymous_controllers = false

  # rspec-rails 3 will no longer automatically infer an example group's spec type
  # from the file location. You can explicitly opt-in to the feature using this
  # config option.
  config.infer_spec_type_from_file_location!

  config.include ActiveSupport::Testing::TimeHelpers
  config.include(EmailSpec::Helpers)
  config.include(EmailSpec::Matchers)

  config.include ArchivableSpecHelper, type: :model
  config.include IssueSharedGroups,    type: :model

  config.include Devise::TestHelpers,    type: :controller
  config.include ControllerSharedGroups, type: :controller
  config.extend AuthenticationHelper,    type: :controller

  config.include Helpers::CurrentUserSupport, type: :view
  config.include ViewSharedGroups,            type: :view
  config.extend ViewMacros,                   type: :view

  config.include FeatureHelper,                        type: :feature
  config.extend Features::AuthenticationClassSupport,  type: :feature
  config.include Features::AuthenticationSupport,      type: :feature
  config.include Features::MembershipSelectionSupport, type: :feature
  config.include Features::RailsAdminArchiveSupport,   type: :feature
  config.include Features::RailsAdminDeleteSupport,    type: :feature
  config.include Features::UserSupport,                type: :feature
end

Now that each configuration has been encapsulated in its own file, it’s easy to configure individual utilities. It’s also easy to enable and disable a configuration by toggling whether the file is required.
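
If you accumulate many of these support files, a common alternative (a sketch; the explicit require lines above work just as well) is to place them under spec/support and glob them in from rails_helper, so each new file is picked up automatically:

# In rails_helper.rb, assuming the support files live in spec/support
Dir[Rails.root.join("spec", "support", "**", "*.rb")].sort.each { |f| require f }

The trade-off is that you lose the ability to toggle an individual configuration by commenting out its require line.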

Although it seems like an obvious change to make, a lot of projects still have cluttered rails_helper files. If you have one, I’d suggest you go ahead and take the five minutes it costs to split it up. It’ll make your life easier.

Sharing code in FactoryGirl


Sharing Code

Sharing code between factories in FactoryGirl takes a little bit of effort. Here's one way of accomplishing it, which I patched together from reading a GitHub issue.

First, we create a module with the shared code. Then we include it into the FactoryGirl::SyntaxRunner.

You need to put it in the spec/factories folder so that it is loaded when FactoryGirl loads your factory definitions. I put it in a subfolder, spec/factories/support/, so that it is easy to differentiate between the factories and the supporting files.

So you might have a file like below:

# spec/factories/support/random_code_support.rb

# I use a namespace here so there's no chance of overriding any other class or module.
module FactoryGirlSupport
  module RandomCodeSupport
    def generate_random_code(i, min_length=6)
      rand(1000).to_s.rjust(min_length, "0") + i.to_s
    end
  end
end

# This makes the methods in FactoryGirlSupport::RandomCodeSupport available in your factories
FactoryGirl::SyntaxRunner.send(:include, FactoryGirlSupport::RandomCodeSupport)

Now you can use the method in the module above in your factories like this:

FactoryGirl.define do
  factory :user do
    sequence(:remote_id) {|i| generate_random_code(i) }
  end
end
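
As a quick sanity check (assuming a User model with a remote_id attribute, as the factory above implies), you can confirm the helper is available wherever factories run:

user = FactoryGirl.create(:user)
user.remote_id
# => "0005421" -- rand(1000) zero-padded to six digits, with the sequence index appended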

That’s all there is to it.

Refactoring Rails controllers the right way


Rails developers will often live by the mantra “Skinny Controller, Fat Model” when writing Rails controller code.

At first it can seem like a fairly reasonable practice. The examples in Darcy Laycock’s SitePoint article “10 Ruby on Rails Best Practices” are a great example of this. Another good example is Jamis Buck’s 2006 article “Skinny Controller, Fat Model”.

However, the name of the pattern is misleading and reinforces the noxious idea that any class should be “fat”.

In this article I will discuss ways to refactor Rails controllers without bloating any models.

Skinny Controller, Fat Model

The “Skinny Controller, Fat Model” practice often results in code in controllers being moved into a model. This reduces the complexity of the controller and moves the responsibility of managing instances of the model back into the model itself. For example, if we have the following controller:

class EventsController < ApplicationController
  before_action :authenticate_user!
  after_action :verify_authorized

  def approve
    @event = Event.find(params[:id])
    @event.state = :approved
    @event.approved_at = Time.now
    @event.approved_by = current_user
    @event.save!

    redirect_to(root_path)
  end
end

The specs for the controller might look a bit like:

require 'rails_helper'

describe EventsController do
  describe 'PATCH approve' do
    subject { patch :approve, id: event.id }

    let(:event) { FactoryGirl.create(:event, approved: false) }

    context "when authenticated && authorized to approve the Event" do
      sign_in_as(:administrator)

      describe "approving the Event" do
        let(:current_time) { Time.now }

        before { Timecop.freeze(Time.local(1990)) }
        after  { Timecop.return }

        specify { expect { subject }.to change { event.reload.approved? }.to(true) }
        specify { expect { subject }.to change { event.reload.approved_by }.to(current_user) }
        specify { expect { subject }.to change { event.reload.approved_at }.to(current_time) }
      end

      it { should redirect_to(root_path) }
    end

    context "when not authenticated to approve the Event" do
      it_behaves_like "a controller action that has failed an authentication check"
    end

    context "when not authorized to approve the Event" do
      sign_in_as(:public_user)

      it_behaves_like "a controller action that has failed an authorization check"
    end
  end
end

The “Skinny Controller, Fat Model” practice might be followed by moving the approval code into the model, leaving the controller looking something like:

class EventsController < ApplicationController
  before_action :authenticate_user!
  after_action :verify_authorized

  def approve
    @event = Event.find(params[:id])
    @event.approve!(current_user)

    redirect_to(root_path)
  end
end

The process of approving an Event has been abstracted and defined in the Event model. Not only is our controller easier to understand, but the approval process is now reusable and our tests for the controller are easier to write. For example, we could stub out the approve! method rather than having to check all the side effects of approving an Event. Here's what our controller specs might look like now:

require 'rails_helper'

describe EventsController do
  describe 'PATCH approve' do
    subject(:approve_event) { patch :approve, id: event.id }

    let(:event) { FactoryGirl.create(:event, approved: false) }

    context "when authenticated && authorized to approve the Event" do
      sign_in_as(:administrator)

      it "approves the Event" do
        expect(event).to receive(:approve!).with(current_user)
        approve_event
      end

      it { should redirect_to(root_path) }
    end

    context "when not authenticated to approve the Event" do
      it_behaves_like "a controller action that has failed an authentication check"
    end

    context "when not authorized to approve the Event" do
      sign_in_as(:public_user)

      it_behaves_like "a controller action that has failed an authorization check"
    end
  end
end

However, there is a cost. The Event model has grown larger, and the number of responsibilities it holds has increased. This makes the class harder to understand, which in turn means that:

  • The risk of defects being introduced is increased
  • Refactoring the class is more difficult
  • Tests are harder to read and write. When tests are hard to write, lazy developers sometimes “forget” to write them.
  • Most importantly: The code becomes a pain in the ass to work on

To prevent bloating our model, this workflow needs to be encapsulated in another class.

Encapsulating the workflow in a model

My preference for encapsulating such a workflow is to create another class. I just think of this class as another model, but some folk prefer to call them domain objects. Here’s a decent explanation from StackOverflow.

In the example above, I’d create a model called Event::Approver to encapsulate approving an Event.

I nest Event::Approver under Event because it is a subset of the workflows in the application that are related to the Event model.

Due to the way that Rails loads files, the file will need to be in the folder app/models/event. This is convenient since I can now look in this folder to find any workflows related to the Event model.
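
Concretely, the constant-to-path mapping looks like this:

# app/models/event.rb          defines Event
# app/models/event/approver.rb defines Event::Approver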

The Event::Approver model will look like:

class Event::Approver

  attr_reader :approver, :event

  def initialize(event, approver)
    @approver = approver
    @event = event
  end

  def approve!
    event.state = :approved
    event.approved_at = Time.now
    event.approved_by = approver
    event.save!
  end

end

The specs for this model will be equally concise, looking something like:

describe Event::Approver do
  describe "#approve!" do
    subject { Event::Approver.new(event, user).approve! }

    let(:event) { FactoryGirl.create(:event, approved: false) }
    let(:user)  { FactoryGirl.create(:user) }
    let(:current_time) { Time.now }

    before { Timecop.freeze(Time.local(1990)) }
    after  { Timecop.return }

    specify { expect { subject }.to change { event.reload.approved? }.to(true) }
    specify { expect { subject }.to change { event.reload.approved_by }.to(user) }
    specify { expect { subject }.to change { event.reload.approved_at }.to(current_time) }
  end
end

The implementation of this class is not especially important. For example, some developers will prefer not to use an initializer and will pass the event and approver directly into the approve! method.
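
For illustration, that variant might look something like this (a sketch, not from the original class above):

class Event::Approver
  def approve!(event, approver)
    event.state = :approved
    event.approved_at = Time.now
    event.approved_by = approver
    event.save!
  end
end

# Callers then pass both collaborators at the call site:
Event::Approver.new.approve!(@event, current_user)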

This class is easy to understand as it has a single responsibility. The class is small enough that I can read and understand it in only a few seconds. I don't need to search through some 800-line Event model for this single method.

Similarly, the tests for the class are straightforward and easy to read and understand. The test file will be quite small, as opposed to the colossal test file that exists for the Event model.

One of the less thought-about benefits of composing classes this way is that if I need to look at the git history for this class, 100% of the commits will relate to this single piece of functionality. I recently found out how invaluable that is when you need to work out why a process has changed. Imagine sifting through all the commits on the Event model trying to work out where function X changed (yeah, I know you can use grep, but what if the method name changed? A pain in the ass, for sure).

With our new model, the controller now looks slightly different than before, but it is just as skinny:

class EventsController < ApplicationController
  before_action :authenticate_user!
  after_action :verify_authorized

  def approve
    @event = Event.find(params[:id])
    Event::Approver.new(@event, current_user).approve!

    redirect_to(root_path)
  end
end

In defence of “Skinny Controller, Fat Model”

To be perfectly honest, I doubt the intention of “Skinny Controller, Fat Model” is to just blindly jam all your controller code into ActiveRecord models. I suspect it’s more likely that it means “Skinny Controllers, Fat Model Layer”.

That being said, I think it is all too easy for inexperienced developers to take the phrase literally and do just that. In fact, I have six years of experience working with developers who have zealously bloated ActiveRecord models to keep responsibilities out of controllers, myself included for a time.

In summary…

When it comes to refactoring, cleaning up a controller doesn’t really benefit the health of your application if it causes a model to get dirtier.

Rather than adding more and more responsibilities to ActiveRecord models, we can create new single-purpose classes (domain objects, if you like) to encapsulate these workflows.

By doing this we make smaller classes that are easier to understand, easier to maintain, and easier to test.

Further Reading:

  1. “”Fat model, skinny controller” is a load of rubbish” by Jon Cairns
  2. “Skinny Controllers, Skinny Models” by Joe Ferris

Handling race conditions with Rails’ pessimistic locking


I recently worked on a Rails application where race conditions were causing issues in the background tasks operating on my ActiveRecord objects. In this article I will explain how I used pessimistic locking to get around these issues.

Defining an example

Firstly, consider a sample Rails application with the following features/rules:

  1. Administrator can see a list of clients;
  2. Administrator can visit the client’s profile page and send an SMS to the client;
  3. Administrator cannot send more than one SMS to each client per day;

Let’s keep the implementation as simple as possible. Consider a Client model with an attribute last_sms_sent_on and the following class (which can be used by a controller or a Sidekiq task):

class ClientSMSSender
  def self.perform(client, message)
    client.transaction do
      if client.last_sms_sent_on.blank? || !client.last_sms_sent_on.today?
        client.last_sms_sent_on = Date.today
        client.save
        SMSGateway.send(client.phone_number, message)
      end
    end
  end
end

Analysing the race condition issue

Imagine that there are some external issues and SMSGateway.send hangs for an average of 30 seconds. In the meantime, another administrator makes a new request to send an SMS to the client. Due to this race condition, we'll end up sending more than one message to the client on the same day.

This is what will happen:

  1. Administrator A makes a request (Request A) to send an SMS to the client
  2. The client has not received any messages today
  3. Request A hangs due to the external issues
  4. Administrator B makes a new request (Request B) to send the same SMS to the client
  5. The client has not received any messages today (Request A is still hanging)
  6. Request B hangs due to the external issues as well
  7. Request A finishes and client.last_sms_sent_on is updated
  8. Request B is still holding the previous, stale state of the client object
  9. Request B finishes, sends the SMS again, and re-updates client.last_sms_sent_on

Workaround

To work around this issue, you can make use of the with_lock method from ActiveRecord::Locking::Pessimistic.

Have a look at the code below:

class ClientSMSSender
  def self.perform(client, message)
    client.with_lock do
      if client.last_sms_sent_on.blank? || !client.last_sms_sent_on.today?
        client.last_sms_sent_on = Date.today
        client.save
        SMSGateway.send(client.phone_number, message)
      end
    end
  end
end

Under the hood, with_lock:

  1. opens up a database transaction
  2. reloads the record with an exclusive row lock (a SELECT ... FOR UPDATE), obtaining the latest state of the record
  3. blocks at that reload until any other session holding the lock releases it
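
In other words, with_lock is roughly equivalent to the following (a simplified sketch, not the actual Rails source):

client.transaction do
  client.lock!  # reloads the row via SELECT ... FOR UPDATE, blocking until the lock is free
  # the block you pass to with_lock runs here
end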

When using with_lock: 

  1. Administrator A makes a request (Request A) to send an SMS to the client
  2. Request A locks the client record in the database
  3. The client has not received any messages today
  4. Request A hangs due to the external issues
  5. Administrator B makes a new request (Request B) to send the same SMS to the client
  6. As the client record is currently locked by Request A, Request B hangs until the database releases the lock
  7. Request A finishes and client.last_sms_sent_on is updated
  8. The database releases the client record
  9. Request B (which was waiting on the database) now starts execution
  10. Request B locks the client record
  11. Request B reloads the client record in order to obtain the latest state of the object
  12. The client has already received a message today
  13. Request B finishes without sending a new SMS

If you’re still confused about that, here’s an easy way to see how with_lock works from your Rails console:

  1. Grab any existing Rails project and open up two Rails consoles
  2. In the first console execute the following:
    u = User.first
    u.with_lock do
      u.email = "test@thefrontiergroup.com.au"
      u.save
      sleep 40 # emulates a very slow external process
    end
    
  3. In the second console execute:
    u = User.first
    u.email = "test2@thefrontiergroup.com.au"
    u.save
    

You'll notice that the execution of u.save in the second console hangs until the first console finishes the whole process. You should be careful not to lock records unnecessarily, otherwise you're likely to introduce a new bottleneck.
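
One way to act on that advice in our SMS example (a sketch; whether it suits you depends on how you want gateway failures handled, since the date is updated even if the send later fails) is to keep the lock window small by doing only the check-and-update inside with_lock, and making the slow external call after the lock is released:

class ClientSMSSender
  def self.perform(client, message)
    should_send = false

    # Hold the row lock only long enough to check and update the date
    client.with_lock do
      if client.last_sms_sent_on.blank? || !client.last_sms_sent_on.today?
        client.last_sms_sent_on = Date.today
        client.save
        should_send = true
      end
    end

    # The slow SMSGateway call now happens outside the lock
    SMSGateway.send(client.phone_number, message) if should_send
  end
end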

Conclusion

The with_lock method is handy, but use it sparingly. Sticking with_lock everywhere might give you good business-logic consistency, but it can come at the expense of performance.

Blockchain Analytics with Cayley DB


Bitcoin (and consequently the Blockchain) has been making waves in the media over the past few years.

In this blog post I will be covering the process of building relationships between blocks, transactions and addresses using Google’s Cayley DB. With this information I may be able to pinpoint important transfers of value and also build an ownership database on top of it to track high value individuals.

I’m going to call this project Bayley, an amalgamation of “Bitcoin” and “Cayley”. I never was that creative.

The Blockchain

So what are the advantages of the Blockchain? What’s so special about it? For starters, for the first time in history we have a database that:

  • Can’t easily be rewritten
  • Eliminates the need for trust
  • Resists censorship
  • Is widely distributed.

A lot of people like to call it a distributed ledger. This is true if you use Bitcoin as a currency; the Blockchain, however, is capable of much more. As a new and possibly disruptive technology, I figured it would be a good idea to learn more about it. In the process, we might also glean enough of its workings to build unique services on top of the Blockchain.

The Database

I originally tried to build this project using MongoDB, but ended up shelving the idea as MongoDB is not suitable for the task: the schema is consistent across blocks, and I need to be able to easily find relationships between data points.

I had a look at LevelGraph and Neo4j, but in the end decided to go with Cayley. Cayley has been explored previously by The Frontier Group; it is a very new technology, and I wanted to learn how to use it.

Setup Considerations

The first step will be to synchronise a copy of the blockchain locally for your use. I used the testnet instead of mainnet for testing purposes.

Originally I used BTCD, as I wanted a server-side, headless daemon. Bitcoin Core can do this, but not on OS X. I constantly ran into bugs and inconsistencies, such as:

  • RPC setup using different variable names, making existing libraries that hook into Bitcoin Core useless
  • JSON batch requests not being supported

In the end I just opted to run an instance (with GUI and all) of Bitcoin Core on my machine. Get it here!

Before starting to synchronise the Blockchain, it might be useful to note that a transaction index is not built by default, in order to conserve disk space. Transaction indexing can be turned on with the command-line switch -txindex or by adding the line txindex=1 to your bitcoin.conf.

RPC needs to be enabled. Using RPC calls to the Bitcoin daemon will allow you to pull out the block data.


Overview of Process

From a high level, the process will look like this:

  • Get block hashes from block heights
  • Get blocks from the block hashes
  • Send an HTTP POST request to Cayley with the above data

This does not take transaction data into account; that will be a topic for a future blog post. So let's get started!

Setting up Bitcoin Core

The standard Bitcoin Core conf file has a lot of stuff in it, but in general you'll need to make sure the following lines are set:

txindex=1
testnet=1
server=1
rpcuser=bitcoinrpc
rpcpassword=BHaVEDoMkVr1xKudcLpVbGi2ctNJsseYrsuDufZxwEXb
rpcport=8332

The rpcpassword is autogenerated by Bitcoin Core. You can use an environment variable if you're concerned about security and such. Since this project is just for testing purposes and the password is randomised, I'm not too bothered that it's sitting there in plaintext.

Block Extraction

We'll be using Peter Todd's python-bitcoinlib library, along with the standard library's pprint for quick and dirty debugging output. Install python-bitcoinlib (via PyCharm or pip), then add the following to the top of your bayley.py file:

import bitcoin
import bitcoin.rpc
from pprint import pprint as pp

The next step will be to write some simple code to extract some blocks.

def main():
    # This will create a batch of commands that requests the first 100 blocks of the blockchain
    commands = [{"method": "getblockhash", "params": [height]} for height in range(0, 100)]
    # Connect to the RPC server, send the commands and assign to the results variable
    conn = bitcoin.rpc.RawProxy()
    results = conn._batch(commands)
    # Extract the hashes out of the results
    block_hashes = [res['result'] for res in results]
    # Fetch the full data for each block
    blocks = []
    for hash in block_hashes:
        blocks.append(conn.getblock(hash))
    # Call the function that makes the triples, ready for importing into Cayley
    block_triples = make_triples_for_block(blocks)

Block Structure

Here is an example of a single block’s data:

{'bits': '1d00ffff',
 'chainwork': '0000000000000000000000000000000000000000000000041720ccb238ec2d24',
 'confirmations': 1,
 'difficulty': Decimal('1.00000000'),
 'hash': '0000000084ee00066214772c973896dcb65946d390f64e5d14a1d38dfa2e4d90',
 'height': 445610,
 'merkleroot': 'eaf042fa845ea92aba661632bc6b8e78e8e64c2917a92f1a7da0800ed793b819',
 'nonce': 1413010373,
 'previousblockhash': '0000000087a272f48c3785de422e232c0771e2120c8fdd741a19ea98d122132b',
 'size': 315,
 'time': 1432705094,
 'tx': ['eaf042fa845ea92aba661632bc6b8e78e8e64c2917a92f1a7da0800ed793b819'],
 'version': 3}

With this in mind we can begin working on pulling the data from the blockchain and parsing the specific blocks.

Making Triples

Cayley uses the subject-predicate-object system known as a triplestore. We need to parse the block data from the previous section into this triplestore format.

One of the limitations of a triplestore is that you cannot attach much metadata to each node; array indexing and the like are a problem in this regard. In this case we will use the block hash as the subject for all block data, each key as the predicate, and each value (excluding the block hash) as the object.

Let's create a function that does this. First, at the top of my bayley.py file, I will create a global variable specifying the block keys I want to turn into triples:

DESIRED_BLOCK_KEYS = ("height", "nextblockhash", "previousblockhash", "size", "time", "difficulty")

Next I wish to declare the function:

def make_triples_for_block(blocks):
    triples = []

We will next need to iterate through the blocks and their respective keys to start pulling the relevant data. The first thing to do is to ignore the blockhash key:

def make_triples_for_block(blocks):
    triples = []
    for block in blocks:
        for key in block:
            # Ignore self reference
            if (key == "hash"):
                continue

The transactions value is an array, so it's best to iterate through these separately.

def make_triples_for_block(blocks):
    triples = []
    for block in blocks:
        for key in block:
            # Ignore self reference
            if (key == "hash"):
                continue
            # Iterate through transactions
            if (key == "tx"):
                for t in block[key]:
                    triples.append({
                        "subject": block['hash'],
                        "predicate": key,
                        "object": t
                    })

And finally, we can append our block data to the triples array we declared at the beginning. Note how I cast the values to strings; this prevents an issue later on when importing into Cayley. Cayley is happiest when you give her JSON where everything is a string.

def make_triples_for_block(blocks):
    triples = []
    for block in blocks:
        for key in block:
            # Ignore self reference
            if (key == "hash"):
                continue
            # Iterate through transactions
            if (key == "tx"):
                for t in block[key]:
                    triples.append({
                        "subject": block['hash'],
                        "predicate": key,
                        "object": t
                    })
            # Iterate through first level block data
            if (key in DESIRED_BLOCK_KEYS):
                triples.append({
                    "subject": str(block['hash']),
                    "predicate": key,
                    "object": str(block[key])
                })
    return triples

We now have a function that returns a triples list containing all of our triples, ready for importing!

Here is an example of the triples for a single block for your reference:

[{'object': '1',
  'predicate': 'height',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '190',
  'predicate': 'size',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': 'f0315ffc38709d70ad5647e22048358dd3745f3ce3874223c80a7c92fab0c8ba',
  'predicate': 'tx',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '000000006c02c8ea6e4ff69651f7fcde348fb9d557a06e6957b65552002a7820',
  'predicate': 'nextblockhash',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '1.00000000',
  'predicate': 'difficulty',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943',
  'predicate': 'previousblockhash',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'},
 {'object': '1296688928',
  'predicate': 'time',
  'subject': '00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206'}]

Setting up Cayley

Cayley is written in Go, but a packaged binary is available, so you shouldn't need to set up Go separately.

This is my Cayley config file:

{
  "database": "bolt",
  "db_path": "./blockchain",
  "read_only": false,
  "replication_options": {
    "ignore_duplicate": true,
    "ignore_missing": true
  }
}

I'm using Bolt over LevelDB because Bolt is slightly better for read-heavy workloads. You can read more here.

After making the cayley.cfg file, initialise the database by running the init command like so (from the Cayley folder):

./cayley init -config cayley.cfg

This will create a blockchain file and prep the backend database for Cayley goodness. The next step will be to run the HTTP server:

./cayley http -config cayley.cfg

Now we’re ready to send all the data in!

Sending to Cayley

Cayley’s HTTP documentation will help with this section. It receives JSON triples in the form of the following:

[{
    "subject": "Subject Node",
    "predicate": "Predicate Node",
    "object": "Object node",
    "label": "Label node"  // Optional
}]   // More than one quad allowed.

We’ll need to POST this data to our Cayley server’s write API via http://localhost:64210/api/v1/write.

Now we need to make use of the excellent requests Python library. Install it (via PyCharm or pip), then add the following to the top of the bayley.py file. Cayley expects JSON, so we'll also import the standard library's json module (no installation required).

You'll also want to add a global variable for Cayley's write URL, along with a headers constant telling Cayley that we're sending JSON.

import requests
import json
DB_WRITE_URL = "http://127.0.0.1:64210/api/v1/write"
DB_WRITE_HEADERS = {'Content-type': 'application/json'}

We're going to create a function to send the data over to Cayley. Note how the data is converted to JSON in the data= argument.

def send_data(data):
    r = requests.post(DB_WRITE_URL, data=json.dumps(data), headers=DB_WRITE_HEADERS)
    pp(r)
    pp(r.text)

If pp(r) prints out a 200 response then we're good! If not, we'll need to look at what went wrong, which is usually explained well in the r.text variable. This is the result I got:

<Response [200]>
'{"result": "Successfully wrote 693 quads."}'

Go back to your main function and call the send_data function:

def main():
    ...
    send_data(block_triples)
    ...

And that should do it.

Graphing the result

By now we should have 100 blocks in Cayley! Head over to http://localhost:64210 and let's start graphing!

On the query page we can test out our queries. I wrote a simple one that loops through the first 5 blocks, gets all objects that are one edge away (via Out()), and returns the results:

for(var i=0; i<5; i++){
    g.V().Has("height", String(i)).Tag("source").Out().Tag("target").All();
}

Here is the result of the first block:

{
 "result": [
  {
   "id": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
  },
  {
   "id": "1296688602",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "1296688602"
  },
  {
   "id": "0",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "0"
  },
  {
   "id": "285",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "285"
  },
  {
   "id": "1.00000000",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "1.00000000"
  },
  {
   "id": "00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206",
   "source": "000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943",
   "target": "00000000b873e79784647a6c82962c70d228557d24a747ea4d1b8bbe878e1206"
  }
 ]
}

The query shape looks like this:

[Image: query shape diagram]

The visualisation itself looks like the following images.

Single block:
[Image: graph of a single block]

Five blocks:
[Image: graph of five blocks]

As you can see, there are shared nodes; this is because those nodes have the same predicate and object but a different subject (block hash). This is a good example of how Cayley helps in visualising relationships.

One hundred blocks:

[Image: graph of one hundred blocks]

The shared nodes here are due to common block sizes and the block difficulty (the latter changes roughly every two weeks). You can see a close-up below:

[Image: close-up of shared nodes]

Conclusion

This is just the early stage. The next step will be to parse the transactions for Bitcoin addresses and start drawing all the relationships between them. Once a strong system is in place for parsing the blockchain, you might want to begin parsing the other 400,000 or so blocks, and also switch to the mainnet. Scraping the web for usernames matching addresses, and estimating relationships based on round-number transfers of value, are also in the pipeline.
