Thursday, 6 March 2014

Reports exaggerated

I've been letting the blog rest recently, and not so recently as well.  The problem is not a lack of subjects, but a lack of time to do them any justice.  However, it is quite sad to see that my last entry was in September 2012, so it is time to post again.

Of late I have been pondering what I have to say about :
  • Distributed MVCC and write-scaling
  • Different approaches to eventual consistency with replicated RDBMS
  • Various MySQL Cluster related topics
  • Various general rambling and unstructured topics
However, these will take some time to percolate and calcify.

In the meantime here are some things I have found interesting recently :
  • Learn You Some Erlang for Great Good
    I actually rediscovered this online book after watching some Joe Armstrong + Erlang videos, after watching some spoof video about bringing Erlang up to date.  All recommended.  Erlang and Ndb Cluster share some Plex heritage, which can still be seen in their architectures today.  Since Plex, Erlang has mated with Prolog, and Ndb Cluster was involved in a car crash with C++.
  • HyperDex + HyperDex Warp
    Something I discovered last year from Emin Gün Sirer's blog and have returned to since.  There are a number of nice ideas combined here (chain replication, value-dependent chaining, hyperspace hashing, subspaces).  My favourites are the concept of 'spurious coordination' and their solution w.r.t. transaction consistency : ordering the route of the optimistic 'distributed commit' based on the affected keys.  I guess we need more independent analysis and evaluation to understand the strengths and weaknesses of these techniques.
  • Kronos
    This is a distributed HA 'event ordering system' from the same Cornell HyperDex team.  Thinking about distributed MVCC led me to thinking about efficiently maintaining a distributed partial ordering of events while avoiding 'spurious coordination'.  Kronos is an attempt to solve part of that problem in a kind of abstracted SOA way.  There is some nice detail in the paper about their dependency graph traversal optimisations, and how dependencies are immutable once discovered, so can be cached, replicated for read scale-out etc.  This could be a great systems building block.
  • Systems Performance book
    I am slowly reading this doorstop book from Brendan Gregg, an ex Solaris kernel engineer at Sun, now at Joyent.  It contains a great amount of recent practical information about Linux + Solaris performance analysis and optimisation.  Unix performance tools have always been a little opaque to me, with very little ever documented about how to approach a performance problem.  This book covers many old and new tools, but also includes rare information on how to analyse problems with these tools, rather than just the syntax and the units of the values returned.  Perhaps even better is his supreme confidence about tackling and solving any performance problem that foolishly catches his eye.  I guess that comes from experience, but maybe a little can be conveyed to his readers by this book.
  • Google Spanner, Galera Cluster, MoSQL, RAMCloud, NuoDB, OpenReplica
    Different approaches, ideas, hidden tradeoffs, strengths and weaknesses!
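
To make the Kronos item above a little more concrete, here is a toy, single-process sketch of the general idea as I understand it : keep a happens-before graph of events, refuse any ordering that would contradict what is already known, and cache positive answers because an established ordering is never retracted.  The class and method names here are made up for illustration and are not the Kronos API, and nothing here is distributed or fault tolerant.

```python
from collections import defaultdict


class EventOrder:
    """Toy happens-before graph. Hypothetical names, not the Kronos API."""

    def __init__(self):
        # event -> set of events recorded as coming directly after it
        self._succ = defaultdict(set)
        # cache of (a, b) pairs already proven to satisfy "a before b"
        self._known_before = set()

    def _reaches(self, a, b):
        """Return True if b is reachable from a, i.e. a happens before b."""
        if (a, b) in self._known_before:
            return True
        stack, seen = [a], {a}
        while stack:
            node = stack.pop()
            for nxt in self._succ[node]:
                if nxt == b:
                    # Orderings are only ever added, never removed,
                    # so a positive answer can be cached indefinitely.
                    self._known_before.add((a, b))
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def assign_order(self, a, b):
        """Record "a before b"; refuse if that would create a cycle."""
        if a == b or self._reaches(b, a):
            return False
        self._succ[a].add(b)
        return True

    def query_order(self, a, b):
        """Return 'before', 'after' or 'concurrent' for the pair (a, b)."""
        if self._reaches(a, b):
            return "before"
        if self._reaches(b, a):
            return "after"
        # Not cached: a later assign_order may still order these two.
        return "concurrent"


order = EventOrder()
order.assign_order("w1", "w2")
order.assign_order("w2", "w3")
print(order.query_order("w1", "w3"))   # before (found transitively via w2)
print(order.query_order("w1", "x"))    # concurrent (no constraint recorded)
print(order.assign_order("w3", "w1"))  # False (would contradict w1 before w3)
```

Only the 'before' results are cached in this sketch; 'concurrent' is just the current state of knowledge, so it has to be re-evaluated each time.  Events that nobody has constrained never need to be compared at all, which is roughly how I read the appeal of avoiding 'spurious coordination'.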
One of my favourite discoveries was this quote attributed to Charles Babbage :
"On two occasions I have been asked 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?'
I am not able rightly to comprehend the kind of confusion of ideas that could provoke such a question." 
Strange how often this response has been on my lips since !