On the page linked above, Shard-Query is described using the following statements:
"Shard-Query is a distributed parallel query engine for MySQL"
"ShardQuery is a PHP class which is intended to make working with a partitioned dataset easier"
"ParallelPipelining - MPP distributed query engines runs fragments of queries in parallel, combining the results at the end. Like map/reduce except it speaks SQL directly."
The things I like from the above description:
If MySQL Proxy could do something along these lines, then the language debate would be moot.
I am likely to fall foul of the lack-of-original-content test if I quote too much from the Shard-Query website, but the How-it-works section seems relevant here.
How it works
- The query is parsed using http://code.google.com/p/php-sql-parser
- A modified version of the query is executed on each shard.
- The queries are executed in parallel using http://gearman.org
- The results from each shard are combined together
- A version of the original query is then executed over the combined results
- All aggregation is done on the slaves (pushed down)
- Queries with inlists can be made into parallel queries.
- A callback can be used for QueryRouting. You provide a partition column, and a callback which returns information pointing to the correct shard. The most convenient way to do this is with Shard-Key-Mapper
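To make the steps above concrete, here is a minimal sketch of the same scatter-gather flow in Python. It is not Shard-Query's own code, and everything in it is an assumption made for illustration: each "shard" is an in-memory SQLite database, a thread pool stands in for Gearman, the `sales` table and its rows are invented, and the query rewrite is written by hand rather than produced by a SQL parser.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: one list of (region, amount) rows per shard.
SHARD_ROWS = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("west", 4)],
]

# Original query the user would write against the whole dataset:
#   SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region
# Hand-written rewrite (assumption): the aggregation is pushed down to each
# shard, and a second query re-aggregates the partial results.
SHARD_SQL = (
    "SELECT region, COUNT(*) AS cnt, SUM(amount) AS total "
    "FROM sales GROUP BY region"
)
FINAL_SQL = (
    "SELECT region, SUM(cnt), SUM(total) "
    "FROM partials GROUP BY region ORDER BY region"
)


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one MySQL shard."""
    # check_same_thread=False lets a worker thread use this connection;
    # each connection is still only used by one thread at a time.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def run_on_shard(conn):
    """Execute the modified (pushed-down) query on a single shard."""
    return conn.execute(SHARD_SQL).fetchall()


def scatter_gather(shards):
    """Run the shard query in parallel, then aggregate the combined rows."""
    # Scatter: one task per shard (a thread pool here, Gearman in Shard-Query).
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(run_on_shard, shards))

    # Gather: load every shard's partial result into one temporary table.
    combined = sqlite3.connect(":memory:")
    combined.execute(
        "CREATE TABLE partials (region TEXT, cnt INTEGER, total INTEGER)"
    )
    for rows in partials:
        combined.executemany("INSERT INTO partials VALUES (?, ?, ?)", rows)

    # Final step: a version of the original query over the combined results.
    result = combined.execute(FINAL_SQL).fetchall()
    combined.close()
    return result


shards = [make_shard(rows) for rows in SHARD_ROWS]
result = scatter_gather(shards)
print(result)
```

Note that the final query sums the per-shard counts rather than counting again, which is why the rewrite has to differ between the shard level and the combining level. An average could not be merged by averaging the per-shard averages; it would need the pushed-down SUM and COUNT shown here.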
Query rewriting rules
The core of Shard-Query is its set of query rewriting rules, which Justin introduces in a blog post entitled
- MySQL SQL APIs (PHP, JDBC, ODBC, Ruby, Python, ...)
- NoSQL access mechanisms
- ShardQuery for SQL reporting / analysis
This combination of scalability, efficiency and SQL query-ability could be a sweet spot in the increasingly confusing multi-dimensional space of high throughput distributed databases.