As always, I am a little late, but I want to jump on the bandwagon and mention the recent MySQL Cluster milestone of passing 1 billion queries per minute. Apart from echoing the arbitrarily large ransom demand of Dr Evil, what does this mean?
Obviously 1 billion is only of interest to us humans as we generally happen to have 10 fingers, and seem to name multiples in steps of 10^3 for some reason. Each processor involved in this benchmark is clocked at several billion cycles per second, so a single billion is not so vast or fast.
Measuring over a minute also feels unnatural for a computer performance benchmark - we are used to lots of things happening every second! A minute is a long time in silicon.
What's more, these reads are served from tables stored entirely in memory - and everyone knows that main memory is infinitely fast and scalable and always getting cheaper, right?
If we convert to seconds we are left with only 17 million reads per second! Hardly worth getting out of bed for?
On the contrary, I think that achieving 17 million independent random reads per second, each read returning 100 bytes across a network, from a database that also supports arbitrary SQL, row locking, transactions, high availability and all sorts of other stuff, is pretty cool. I doubt that (m)any other similar databases can match this raw performance, though I look forward to being proved wrong.
(Also, don't forget to meet + beat 1.9 million random updates/s, synchronously replicated)
Raw performance is good, but not everyone just needs horsepower. The parallel, independent work on improving join performance (also known as SPJ/AQL) and query optimisation helps more applications harness this power, by improving the efficiency of joins.
I wrote a post about SPJ/AQL at the start of last year, when it was still in the early stages. Since then much has improved, to the extent that the performance improvement factors have become embarrassingly high on real user queries. A further post on the technical details of SPJ/AQL is long overdue... Perhaps the most interesting details are on the integration between the parallel, streaming linked operations and the essentially serialised MySQL Nested Loops join executor. A linked scan and lookup operation can be considered to be a form of parallel hash join, which the normal MySQL NLJ executor can invoke as part of executing a query. Who says Nested Loop joins can't scale?
Tuesday, 21 February 2012
One billion
Labels:
cluster,
design,
distributed-systems,
latency-hiding,
mysql,
nosql,
parallel,
rambling
Friday, 17 February 2012
Transactional memory in 2012
I've been observing the appearance of hardware transactional memory (HTM) systems in the wild with interest. Keen readers might recall posts recalling my work on software for the XA-Core HTM system at Nortel.
Two things have seemed rather opaque in most coverage of transactional memory systems. The first is when and why it is better than using explicit locking / atomic operations. The second is the time+space properties of actual implementations. Too many expositions are still bound up in the simplicity of the interface to discuss the real benefits and drawbacks. Everybody likes a simplifying abstraction, but not if it is slower or has unpredictable side-effects.
So it was refreshing to read a blog post by Greg Pfister (formerly of IBM), describing an HTM implementation in relative laymans terms. I first read Greg's book 'In Search of Clusters' around the same time I was working on XA-Core (~2000), so Greg has quite some background context. He does not pretend to fully comprehend the implementation, but he asks the right questions. Searching more widely, I came across a discussion of the forthcoming Intel 'Haswell' chip at LWN. The comments here give some insight into the implementation and implications. Hopefully we'll start to hear more about the physical properties of these mechanisms, their sweet spots and limitations.
The transactional memory concept is unusual in that it seems to have proponents both at the chip level (Azul, Sun's failed Rock SPARC CPU, IBM, Intel) and in the functional language community, most notably Haskell. I suspect the functional language community is motivated by the simplicity of the concurrency abstraction, and the chip community are motivated by the transistor use case. There doesn't seem to be the same demand for it from the vast middle ground of OSs, middleware, applications etc. Does this signify something?
Two things have seemed rather opaque in most coverage of transactional memory systems. The first is when and why it is better than using explicit locking / atomic operations. The second is the time+space properties of actual implementations. Too many expositions are still bound up in the simplicity of the interface to discuss the real benefits and drawbacks. Everybody likes a simplifying abstraction, but not if it is slower or has unpredictable side-effects.
So it was refreshing to read a blog post by Greg Pfister (formerly of IBM), describing an HTM implementation in relative laymans terms. I first read Greg's book 'In Search of Clusters' around the same time I was working on XA-Core (~2000), so Greg has quite some background context. He does not pretend to fully comprehend the implementation, but he asks the right questions. Searching more widely, I came across a discussion of the forthcoming Intel 'Haswell' chip at LWN. The comments here give some insight into the implementation and implications. Hopefully we'll start to hear more about the physical properties of these mechanisms, their sweet spots and limitations.
Subscribe to:
Posts (Atom)