This talk will focus on the design considerations and architecture of Druid, an open-source, distributed, column-oriented analytical data store. Druid is an open source distributed system in use at Metamarkets (http://www.metamarkets.com) to facilitate rapid exploration of high dimensional spaces. We use Druid to expose impression monetization data to ad tech companies along any arbitrary combination of demographic, content and sales-based dimensions. One Druid cluster currently exposes a data set of >40 billion rows of data representing >2 trillion impressions in hypercubes of varying dimensionality (largest is 30+ dimensions) while allowing for exploration using top lists and timeseries in sub-second latencies. There will be a particular focus on how Druid can be used to ingest data in real-time on the write side and provide real-time access to data on the read side.
The Druid code can be found at http://www.github.com/metamx/druid.
presentation