Thursday, 25 March 2010

ACID tradeoffs, modularity, plugins, Drizzle

Most software people are aware of the ACID acronym coined by Jim Gray. With the growth of the web and open source, the scaling and complexity constraints imposed on DBMS implementations supporting ACID are more visible, and new (or at least new terms for known) compromises and tradeoffs are being discussed widely. The better known NoSQL systems are giving insight by example into particular choices of tradeoffs.

Working at MySQL, I have often been surprised at the variety of potential alternatives when implementing a DBMS, and the number of applications which don't need the full set of ACID letters in the strictest form. The original MySQL storage engine, MyISAM is one of the first and most successful examples of an 'ACID remix'. The people drawn to DBMS development work often have a perfectionist streak, which can cause them to tend to prefer 'nothing' over 'imperfect'. MyISAM was and still is a flag-bearer for 'good enough'. Perhaps we should be less modest and call it 'more than good enough'.

One seldom discussed benefit of MySQL's storage engine architecture is that pressure to make 'The One True Storage Engine' is reduced. DBMS products with one fixed database engine need to optimise for all supported use cases. This is a great engineering challenge, but increases design effort, requirements for configuration and auto-tuning, constraints on any design change or reoptimisation etc. With MySQL, there are multiple existing storage engines, each with a (sub)set of target use-cases in mind. A single MySQL server can maintain and access tables in different storage engines, each tuned as closely as possible to the use-case for the data, without adding complexity to unrelated engines. Engines can be wildly optimised for a narrow use case as there are plausible alternative engines available for other use cases.

I understand that one aim of the Drizzle project is to extend the modularity of the MySQL Server on multiple axes, allowing diversity to flourish. As a one-time Java coder, who enjoyed the pleasures of design-by-interface, I can see the attraction. While the effort is guided by an actual need for modularity and real examples of alternative plugins, it can be a great force multiplier. There is always the risk of modularity for its own sake - a branch of Architecture Astronautics. Sure symptoms, which I may have suffered from in the past, include the class names FactoryFactory..., PolicyPolicy, or [Anything]Broker).

Another good vibe from Drizzle is the microkernel concept, although would say that there's some terminological abuse occurring here! Perhaps it could more reasonably be said that MySQL has a TeraKernel and Drizzle has a MegaKernel? In any case the motivations are good. Decoupling the huge chunks of functionality glued together inside MySQLD is great for long term software integrity, understanding dependencies, finding (and introducing) bugs, and might make it easier to start adding functionality again. Replication seems especially ripe ground for alternative plugins. User authentication is another often requested 'chunk'. It will take longer to crystalise interfaces for more deeply embedded areas like the query Optimizer/Executor, but if these interfaces are arising from a real need then that can drive the API design.

One aspect of storage engine modularity that is not often mentioned is that some MySQL storage engines also moonlight with other products. The Berkeley database (BDB) is probably the oldest and most promiscuous, embedded in DNS daemons, LDAP servers and all sorts of other places. Ndb is unusual in that it can be used from separate MySQLD and other NdbApi processes at the same time. InnoDB has also recently added an embedded variant. This trend will accelerate, especially when some of the distributed NoSQL systems start supporting 'pluggable local storage' APIs. I imagine that a NoSQL local storage engine API could be somewhat simpler to implement than the MySQL SE API, at least to start with!