On December 11, 2012, I gave a talk at Boston.rb about writing distributed realtime computations in Ruby using Storm, by Nathan Marz, and RedStorm, by Colin Surprenant.
There is a video of the talk on the Boston.rb website, and the slides are posted online.
Storm provides a framework for building streaming/realtime computations (log analysis, for example) and distributed RPC for running large ad hoc computations on a cluster. RedStorm is a JRuby-based adapter for writing these computations in Ruby and assembling them into topologies (workflows).
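To make the idea of a topology concrete, here is a minimal conceptual sketch in plain Ruby. It does not use the Storm or RedStorm APIs, and every class and method name in it (`LogSpout`, `StatusBolt`, `CountBolt`, `run_topology`) is hypothetical. It only illustrates the shape of the log analysis example: a source of tuples (a spout) feeding processing steps (bolts). In a real cluster these steps would run in parallel across many machines rather than in a single loop.

```ruby
# Conceptual sketch only: plain Ruby, not the Storm or RedStorm API.
# All class and method names here are hypothetical.

# A "spout" is a source of tuples; this one replays a fixed list of log lines.
class LogSpout
  def initialize(lines)
    @lines = lines
  end

  def each_tuple(&block)
    @lines.each(&block)
  end
end

# A "bolt" consumes tuples and may emit new ones.
# This one extracts the last field of a log line (the HTTP status).
class StatusBolt
  def execute(line)
    line.split.last
  end
end

# A terminal bolt that keeps a running count per status code.
class CountBolt
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def execute(status)
    @counts[status] += 1
  end
end

# The "topology" wires the spout to the bolts in order.
def run_topology(spout, status_bolt, count_bolt)
  spout.each_tuple do |line|
    count_bolt.execute(status_bolt.execute(line))
  end
  count_bolt.counts
end

LOG_LINES = ["GET /index 200", "GET /missing 404", "POST /login 200"].freeze
RESULT = run_topology(LogSpout.new(LOG_LINES), StatusBolt.new, CountBolt.new)
puts RESULT.inspect
```

The RedStorm examples listed below show how the same kind of pipeline is declared with the actual DSL.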
Here are the recommended resources from my talk:
Getting started
- storm is the main project.
- RedStorm is the JRuby adapter for Storm.
- storm-starter is a collection of examples in Storm.
- redstorm examples is a similar collection in RedStorm.
Related software tools
- storm-contrib provides integrations with many third-party tools, such as queues, service buses, and databases.
- storm-deploy “makes it dead-simple to deploy Storm clusters on AWS.”
- storm-mesos provides integration with Apache Mesos for cluster resource management.
Documentation
- The Storm wiki has about 40,000 words of excellent documentation.
- The storm-user Google group will cover any questions that the docs don’t.
Talks
Two excellent talks by Storm author Nathan Marz:
- At Philadelphia Emerging Technologies for the Enterprise, “Storm: Distributed and fault-tolerant realtime computation” gives a more extended introduction to Storm. April 2012.
- At Strange Loop, “Runaway complexity in Big Data” discusses “Common sources of complexity in data systems and a design for a fundamentally better data system”. October 2012.
Book
- Big Data is an early-access book by Nathan Marz that covers “Principles and best practices of scalable realtime data systems”.
Other ESP/CEP resources
Storm lives in a space that’s often referred to as ESP (“Event Stream Processing”) or CEP (“Complex Event Processing”):