Stream processing has emerged in recent years as a fast-growing paradigm in data science infrastructure. Its rise can be attributed in part to factors external to system design, such as business demand for near-real-time data and the inability of hardware to manage ever-growing data sets. However, the paradigm also has many inherent strengths, and there is good reason to embrace it rather than merely tolerate it. In this talk I’ll discuss some high-level advantages of processing data in streams, such as fault tolerance, horizontal scalability, and composability. I’ll then introduce NSQ, Bitly’s open source queueing system, and discuss how it provides us with these advantages and how it approaches the tradeoffs inherent in designing distributed systems. I’ll also discuss some of the burdens that NSQ places on developers, such as the need for idempotent operations, and why they are necessary. Finally, I’ll discuss some new technologies that aim to abstract away the mechanism of communication between streaming programs, and talk about the powerful opportunities and risks that they offer.
presentation video