The Frontier Group – Official Red Hat partners

Posted in Inside TFG, Insight, TIBCO


The Frontier Group has been a long-time user of Linux environments, and it is an exciting move for us to become official Red Hat partners.

Recently The Frontier Group has been building up relationships with companies that can improve our web, mobile and analytics capabilities. Our relationship with Tibco has increased our offering in the integration and analytics space and this new alliance with Red Hat will bring increased options for deployment and hosting.

Red Hat is the world’s best success story for Open Source software in the enterprise and this story is constantly getting better through diverse community support and growth. Their offering is extensive and we are really excited to see their increasing presence in virtualisation, cloud configurations and next generation storage.

They have demonstrated recently that they will remain a strong contender in the virtualisation space through their support of Docker and the Atomic project, and they are also backing the very popular topic of big data with improvements relating to Hadoop and OpenShift.

If you are interested in finding out more about Red Hat and what they offer beyond the well-known enterprise Linux distribution, get in touch with us.

 

Tibco NOW!

Posted in Insight, TIBCO


The Tibco Now event kicks off on Monday Nov 3rd, and in the words of the event's home page:

TIBCO NOW will provide an opportunity to learn about the seismic technological forces causing this disruption – Big Data, Cloud, Mobile and Social.

The Frontier Group is sending over our CIO and company founder Matt Lambie to check it out and add some Australian views to the hallway discussions. We are really excited to see what the conference brings and if you are interested in meeting Matt or talking over new ideas in The Frontier Group + Tibco space, get in touch and let us know.

 

A quick tip with .NET generics

Posted in Code, Inside TFG

Generic constraints in .NET can be recursive. This means that you can use a type in its own generic constraints. Let's look at an example of where this can be useful.

Let's say you have some kind of persistent object, IEntity. To avoid primitive obsession we are going to create a type-safe Reference<T> object to act as a pointer to our entities, rather than just an int property called Id.

public interface IReference<TEntity>
    where TEntity : IEntity
{
    // Actual interface doesn't matter
}

We want a base entity to inherit from, which among other things exposes an IReference<T> to itself.  We can’t be much more specific than returning an IReference<EntityBase>, since we can’t know the subclass type at compile time. Unless we hail to the generic recursion gods.

public abstract class EntityBase<TSelf> : IEntity
    where TSelf : EntityBase<TSelf>
{
    public IReference<TSelf> Reference { get { ... } }
}

Now we just supply the type when we declare our subclass:

public class MyEntity : EntityBase<MyEntity>
{
}

You can do much the same thing in Java, but it's not quite as safe, since MyEntity extends EntityBase<OtherEntity> will compile just fine.

As an exercise for the reader: consider the visitor pattern, where we implement a virtual Accept method in order to have compile-time type knowledge of this. Can you now write a non-virtual Accept method?

Ada Lovelace

Posted in Inside TFG

Today Oct 14th is Ada Lovelace Day.


Ada Lovelace, an English mathematician born in 1815, is often described as the world's first computer programmer. Through her accomplishments in a male-dominated era, Ada has become a powerful symbol for women and their success in the modern world of STEM fields. This day is a celebration of challenging mindsets that may be resistant to the idea of equal opportunity and capability.

Matt Lambie (co-founder of TFG) raised our awareness of Ada Lovelace Day and mentioned that had his son been a girl, Ada was a name he was keen on, specifically because of Ada Lovelace. Through discussion he also pointed me to a talk by Ashe Dryden on Programming Diversity. This talk touched on many aspects of diversity, including women and their role in STEM fields. A point raised by Ashe that resonates with Matt and me is that diversity can have a positive impact on the workplace in numerous ways. For us, a diversity of cultural backgrounds, skills, interests and genders makes for a great team dynamic and a powerful problem-solving crew. More importantly, we have found that diversity came as a side effect of our search for skilled designers and developers.

Through Ashe’s talk I have been exposed to a more complicated world and one with some concerning numbers. As the father of a young girl and boy I hope that as they reach education and employment, they are driven purely by their interests and abilities, not social pressures.

A look at Cayley

Posted in Code, Inside TFG

Recently I took the time to check out Cayley, a graph database written in Go that’s been getting some good attention.


https://github.com/google/cayley

From the GitHub README:

Cayley is an open-source graph inspired by the graph database behind Freebase and Google’s Knowledge Graph.

Also, to get the project owner's disclaimer out of the way:

Not a Google project, but created and maintained by a Googler, with permission from and assignment to Google, under the Apache License, version 2.0.

As a personal disclaimer, I’m not a trained mathematician and my interest comes from a love of exploring data. Feel free to correct me if something should be better said.

I've seen Neo4j... I know graph DBs

Many people exploring graph databases start with Neo4j; Cayley is conceptually similar, but in usage terms there is a bit of a gap.

Neo4j has the Cypher query language, which I find very expressive but also more like SQL in how it works. Cayley uses a Gremlin-inspired query language wrapped in JavaScript. The more you use it, the more it feels like writing code-based interactions with chained method calls. The docs for this interface take some rereading, and it was only through experimentation that I started to see how it all worked. They can be accessed via the GitHub docs folder; I also worked my way through the test cases for some further ideas.

Another major difference is that Neo4j offers a gentler transition from relational databases. With Neo4j you can group properties on nodes and edges, so that as you pull back nodes it feels a little more like hitting a row in a table. Cayley, however, is a triple/quad store based system, so everything is treated as a node or vertex. You store only single pieces of related data (only strings, in fact), and a collection of properties that would traditionally make up a row or object is built through relationships. This feels extreme at first, as getting one row-like object requires multiple traversals, but over time it changed how I looked at data.


As an example (ignoring the major power of graph databases for starters), we might have the question "What is user 123's height?". In Neo4j we can find a person with id 123, pulling back a node with that person's name and height, and then extract the height value. In Cayley you would find the person's id node and then move via the height relationship to the value 184. So in the first case we are plucking property data from a returned node; in the second we collect the information we want to return. This is more a conceptual difference than a pro or a con, but it becomes very clear when you start to import data via quad files.
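The two styles can be sketched with a toy in-memory triple store in JavaScript. This is illustrative only – the data and function names are made up, and neither database works this way internally:

```javascript
// A toy in-memory triple store: each entry is [subject, predicate, object].
const triples = [
  ["/user/123", "named", "john"],
  ["/user/123", "height", "184"],
];

// Row-style (Neo4j-like): pull back a whole "node" and pluck a property off it.
function nodeFor(id) {
  const node = { id };
  for (const [s, p, o] of triples) {
    if (s === id) node[p] = o;
  }
  return node;
}

// Triple-style (Cayley-like): walk from the id node along one relationship
// and collect whatever values sit at the other end.
function out(id, predicate) {
  return triples
    .filter(([s, p]) => s === id && p === predicate)
    .map(([, , o]) => o);
}

console.log(nodeFor("/user/123").height); // "184"
console.log(out("/user/123", "height")[0]); // "184"
```

Both calls reach the same value; the difference is whether you carry a whole object back or traverse to exactly the piece you want.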

What is an n-quad?

As mentioned, Cayley works on quads/triples, each a simple line of content describing a start, a relationship and a finish. This can be imagined as two nodes joined by an edge. What those nodes and relationships are can be many things: some people have schemas or conventions for how things are named, and some use URLs to link web based data. There is a standard that can be read at www.w3.org:

http://www.w3.org/TR/n-quads/

A simple example might be from the above:

"/user/123" "named" "john" .
"/user/124" "named" "kelly" .
"/user/124" "follows" "/user/123" .

When is a database many databases?

One of the tricky parts of a graph database is how to store things. Many of the graph DBs out there don't actually store the data themselves but rather sit on existing database infrastructure and work with information in memory. Cayley is no different, as you can layer it upon a few different database types – LevelDB, Bolt, MongoDB and an in-memory version.

An interesting part of this is the vague promise of scaling. Most graph database conversations start off with node traversal, performance and syntax, but they almost all end in scaling. I think Cayley is now entering this territory: as it moves from a proof of concept to something that gets used more heavily, it's acquiring backends that can scale, plus the concept of layering more than one Cayley instance in front of that storage layer.

One thing to keep in mind is that performance is a combination of how the information is stored and accessed, so put a fast graph layer in front of a slow database and you'll average out a little in speed. For my testing I used the LevelDB store, as it is bundled and easy to get started with.

Show me the graph!

One of the first issues I had with Cayley was not knowing exactly how to get a graph onto the page. Neo4j's spin-up was a little clearer and its error handling is quite visual. With Cayley you have to get syntax and capitalisation just right for things to play nicely.

Let's assume you have the following graph:


Node A is connected out to B, C and D. This can be described in an n-quads file as:

"a" "follows" "b" .
"a" "follows" "c" .
"a" "follows" "d" .

If we bring up the web view using a file with that content we can query:

g.V('a').As('source').Out('follows').As('target').All()

Running it as a query should give you some json:

{
  "result": [
    {
      "id": "b",
      "source": "a",
      "target": "b"
    },
    {
      "id": "c",
      "source": "a",
      "target": "c"
    },
    {
      "id": "d",
      "source": "a",
      "target": "d"
    }
  ]
}

Swap to the graph view, run it again and you should see a graph. Not all that pretty but it’s a start.


So what's happening here? Starting at 'a' and calling it "source", we traverse edges named "follows" that go out from it and take note of the end node, calling it "target". Be aware that these names are case sensitive, and if you get them wrong you won't see anything. When I say "calling", what I mean is that as the nodes are traversed, each value found is emitted under the name provided as the key. This builds up the JSON objects, with each traversal becoming a new object in the returned list.
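The emit behaviour can be mimicked in plain JavaScript over the three quads above. This is a sketch of the idea, not Cayley's internals:

```javascript
const quads = [
  ["a", "follows", "b"],
  ["a", "follows", "c"],
  ["a", "follows", "d"],
];

// Mimic g.V('a').As('source').Out('follows').As('target').All():
// every traversal that reaches the end of the path emits one object,
// keyed by the names given along the way.
function follows(start) {
  return quads
    .filter(([s, p]) => s === start && p === "follows")
    .map(([s, , o]) => ({ id: o, source: s, target: o }));
}

console.log(JSON.stringify(follows("a"), null, 1));
```

Each matched quad becomes one object in the result list, which is exactly the shape of the JSON Cayley returned above.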

Doing more

So now we have the basics, and that's as far as a lot of the examples go. Let's take things a little further.

I recently read an article, 56 Experts reveal 3 beloved front-end development tools, and in doing so came across entry after entry of tools and experts. My first reflex was to ask: where are the intersections, and which are the outliers? So I decided to use this as a data source. I pulled each entry into a spreadsheet and then ran a little script over it to produce the quads file with:

"<person>" "website" "<url>" .
"<person>" "uses" "<tool name>" .

and for each first mention of a tool:

"<tool>" "website" "<url>" .

The result was a 272 line quads file with people, the software they used and the URLs for the software.
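The little conversion script was roughly this shape. The row format below is an assumption for illustration – the original spreadsheet layout isn't shown:

```javascript
// Turn rows of { person, website, tools: [{ name, url }] } into quad lines.
// The row shape here is hypothetical; only the emitted quad patterns
// ("website" and "uses", plus a tool "website" on first mention) come
// from the article above.
function toQuads(rows) {
  const lines = [];
  const seenTools = new Set();
  for (const row of rows) {
    lines.push(`"${row.person}" "website" "${row.website}" .`);
    for (const tool of row.tools) {
      lines.push(`"${row.person}" "uses" "${tool.name}" .`);
      // Only the first mention of a tool gets its website quad.
      if (!seenTools.has(tool.name)) {
        seenTools.add(tool.name);
        lines.push(`"${tool.name}" "website" "${tool.url}" .`);
      }
    }
  }
  return lines;
}

const rows = [
  { person: "wes bos", website: "http://wesbos.com",
    tools: [{ name: "sublime text", url: "http://sublimetext.com" }] },
];
console.log(toQuads(rows).join("\n"));
```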

From there I started Cayley with the usual command:

cayley http --dbpath=userreviews.nq

So what next? We can find a product and see who is using it:

g.Emit(g.V('sublime text').In('uses').ToArray())

Which results in:

{
 "result": [
  [
   "stevan Živadinovic",
   "bradley neuberg",
   "sindre sorus",
   "matthew lein",
   "jeff geerling",
   "nathan smith",
   "adham dannaway",
   "cody lindley",
   "josh emerson",
   "remy sharp",
   "daniel howells",
   "wes bos",
   "christian heilmann",
   "rey bango",
   "joe casabona",
   "jenna gengler",
   "ryan olson",
   "rachel nabors",
   "rembrand le compte"
  ]
 ]
}

Note that I emitted the array values specifically, to avoid a lengthy hash output.

Sure that’s interesting but how about we build a recommendation engine?

Say you are a user who is a fan of SASS and Sublime Text. What other tools do experts who use these like?

// paths that lead to users of the tools
var a = g.V('sass').In('uses')
var b = g.V('sublime text').In('uses')

// Who uses both tools
var c = a.Intersect(b).ToArray()

// What tools are used by all of those people
var software = g.V.apply(this, c).Out('uses').ToArray()

// Convert an array to a hash with counts
var results = {}
_.each(software, function(s){
  if(results[s]==null){ results[s]=0; }
  results[s] +=1;
})

// Remove search terms
delete results['sass']
delete results['sublime text']

// Emit results
g.Emit({tools: results, users: c})

Here we are:

  1. finding the people that use sass and sublime text
  2. finding all the tools they use
  3. counting the number of times a tool appears
  4. removing our search tools
  5. emitting the results as the response

This gives us:

{
 "result": [
  {
   "tools": {
    "angularjs": 1,
    "chrome dev tools": 5,
    "jekyll": 1,
    "jquery": 1
   },
   "users": [
    "bradley neuberg",
    "nathan smith",
    "adham dannaway",
    "wes bos",
    "joe casabona",
    "jenna gengler",
    "ryan olson",
    "rachel nabors"
   ]
  }
 ]
}

Note how Cayley is pretty happy for us to move in and out of JavaScript, and that underscore.js is available by default. Handy. Also note that I returned a custom result object with both the results hash and the users it was derived from.

So this isn’t necessarily the most efficient way of doing things but it’s pretty easy to follow.
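For anyone who wants to follow the logic without a running Cayley instance, the same five steps can be replayed in plain JavaScript over a toy quad list. The data here is hypothetical, so the counts won't match the real file:

```javascript
// A toy quad list standing in for the real 272-line file.
const quads = [
  ["ann", "uses", "sass"], ["ann", "uses", "sublime text"], ["ann", "uses", "jquery"],
  ["bob", "uses", "sass"], ["bob", "uses", "sublime text"], ["bob", "uses", "jekyll"],
  ["cat", "uses", "sass"],
];

// Everyone who uses a given tool (the In('uses') step).
const usersOf = (tool) =>
  quads.filter(([, p, o]) => p === "uses" && o === tool).map(([s]) => s);

// 1. people that use both search tools (the Intersect step)
const users = usersOf("sass").filter((u) => usersOf("sublime text").includes(u));

// 2-3. count every tool those people use
const tools = {};
for (const [s, p, o] of quads) {
  if (p === "uses" && users.includes(s)) tools[o] = (tools[o] || 0) + 1;
}

// 4. remove the search terms themselves
delete tools["sass"];
delete tools["sublime text"];

// 5. the result
console.log({ tools, users }); // { tools: { jquery: 1, jekyll: 1 }, users: [ 'ann', 'bob' ] }
```

The shape mirrors the Cayley query exactly; the graph database just does the traversal and intersection for you over data that may not fit in a single script.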

I think for many, the fact that Cayley uses a JavaScript based environment will make it quite accessible compared to the other platforms. I hope to keep exploring Cayley in future articles.
