Monitoring: update
2015-03-13
Prometheus appears to actually exist now, and is coming along pretty well.
https://github.com/prometheus/prometheus
Glancing through it, it appears to hit the bulk of my points. It's still lacking a permanent DB for storage, so it's probably not deployable for most folks yet, but in general it looks like they are doing things basically right.
So, there's no answer out there that I think solves the problem *yet*, but at least someone is getting close. In particular, check out the query language, especially the operators:
http://prometheus.io/docs/querying/basics/
A friend of mine works at Splunk, and they showed me some of Splunk's features. As a backing store, it appears to fit much of what I was describing as well. It has reasonable group-by semantics, for example. Fundamentally, Splunk is basically a timeseries DB; it just happens to be used most often the way ES is, for processing logs.
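To make "group-by semantics" concrete, here is a rough Python sketch of the idea: each sample carries a set of labels, and an aggregation keeps only the labels you group on. The sample data, the label names, and the `sum_by` helper are all made up for illustration; this is not the Prometheus or Splunk API, just the general shape of the operation.

```python
from collections import defaultdict

# Hypothetical samples for one timestamp: (labels, value) pairs.
# Label names and values are invented for this example.
samples = [
    ({"job": "api", "instance": "a", "code": "200"}, 120.0),
    ({"job": "api", "instance": "b", "code": "200"}, 80.0),
    ({"job": "api", "instance": "a", "code": "500"}, 3.0),
    ({"job": "db", "instance": "c", "code": "200"}, 40.0),
]


def sum_by(samples, keys):
    """Sum values across series, keeping only the labels named in keys."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The group identity is the ordered tuple of (label, value) pairs.
        group = tuple((k, labels.get(k, "")) for k in keys)
        totals[group] += value
    return dict(totals)


by_job = sum_by(samples, ["job"])
by_job_and_code = sum_by(samples, ["job", "code"])

for group, total in sorted(by_job_and_code.items()):
    print(dict(group), total)
```

Grouping by `job` alone collapses the per-instance and per-status-code series into one total per job, while grouping by `job` and `code` keeps the status-code breakdown. Being able to pick that set of labels at query time is the property I care about in a backing store.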
So, with any luck my old monitoring posts will soon become defunct. Here's hoping!