What Is Stream Processing and Why Is It Important to Your Business?

Reposted from Hazelcast. Stream processing is a term that is becoming increasingly relevant to the technical side of any company, but it is still relatively unknown on the business side. The term "stream" refers to data streams, that is, data that flows continuously into a business. Data has been entering computer systems for decades; the difference now is that the volume and speed are at levels unimaginable just a few years ago. Stream processing refers to onboarding, analyzing, and integrating that data in a way that provides insight to the users of the technology, preferably as close to real time as possible.

What is the business context for this? There are billions of mobile devices, potentially trillions of IoT devices, all of them streaming data into systems that are racing to keep up with volumes that are increasing exponentially. The companies that you engage with every day (in fact, many times per day) are all dealing with this, and you are part of that stream. Every time you upload to Facebook, every time you tweet, every time you complete an online transaction, there are a billion people just like you doing the same thing at the same time. We are all part of the streaming ecosystem, whether we realize it or not.

The true value add with streaming technologies is in being able to sort through the noise to find a useful signal; that is, how do you gain insight from such a massive and varied stream of data? This is where stream processing adds genuine value: being able to query a non-stop data stream and detect anomalies in the data quickly enough to do something about them is a huge competitive advantage. Example? Bioinformatics sensors that can report real-time changes in a patient's condition remotely, enabling medical personnel to respond quickly and accurately. Think about the amount of information flying around a big hospital at any given moment; streaming can pick out the one critical data point that makes a genuine difference to the patient.
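To make the idea concrete, here is a minimal sketch of anomaly detection over a continuous stream. It is a toy illustration, not any particular streaming product: the hypothetical input is a stream of sensor readings (say, heart rates), and a reading is flagged when it deviates sharply from a rolling average. The key streaming property is that only a small window is kept in memory, so the stream never needs to be stored in full.

```python
from collections import deque

def detect_anomalies(readings, window=5, threshold=3.0):
    """Flag readings that deviate sharply from a rolling average.

    `readings` could be an infinite iterator of sensor values;
    we keep only the last `window` normal values in memory.
    """
    recent = deque(maxlen=window)
    for value in readings:
        if len(recent) == window:
            mean = sum(recent) / window
            if abs(value - mean) > threshold:
                yield value   # report the outlier immediately
                continue      # keep it out of the baseline window
        recent.append(value)

# Hypothetical heart-rate stream with one sudden spike
stream = [72, 71, 73, 72, 74, 73, 110, 72, 71]
print(list(detect_anomalies(stream)))  # → [110]
```

A real deployment would run logic like this continuously inside a stream processing engine rather than over a Python list, but the shape of the problem, a bounded window of state applied to unbounded input, is the same.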

Data coming in at high speed has been part of the background noise for decades, and has been referenced using different descriptors such as real-time analytics, complex event processing, and so on. At the moment the term everyone seems to be pivoting around is stream processing, but if history is any indication, this will probably morph soon enough. The need to analyze and understand data, however, will always be there, and it’s safe to assume the volumes are going to keep heading up – in 10 years the descriptor “trillions” is going to seem quaint.

So how important is stream processing? The better question is how important and/or useful it is that you have immediate insight into how your business is operating. Look at real-time trading in commodities; a fraction-of-a-second advantage can literally translate into millions in profit or loss. Think of major consumer product companies doing global launches of products where millions of people log in at the same time to purchase (e.g., Black Friday, Cyber Monday, Singles Day) and expect an instant response. Not every transaction requires an immediate response, but many (e-commerce, financial, healthcare, security) do, and that is the bullseye for stream processing. The challenge is that companies need the ability to 1) recognize that something has happened (that is, pick out the relevant snowflake in a blizzard) and 2) act on it in a meaningful and immediate way. Immediacy matters because most data is highly perishable and its shelf life can be measured in microseconds.

The implication here is that streaming technology works best when dealing with (you guessed it) streams. Not all data comes in at high volume, and not all data requires immediate analysis. But for data that does fit that profile (e.g., fraud detection for credit card transaction approval), it is incredibly valuable. Stream processing lets you detect anomalies that would have been lost in the noise a few years back, and (even cooler) lets you analyze multiple streams at once. This framework is becoming particularly prevalent with the rise of IoT devices, which pretty much exist solely to generate big data on a continuous basis (e.g., utility telemetry systems, traffic sensors, ICU informatics).
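The fraud-detection example above can also be sketched in a few lines. This is a hypothetical "velocity check," one common pattern in fraud screening: flag any card that makes too many transactions within a short window. The event format and thresholds here are assumptions for illustration, not a real fraud system's rules.

```python
from collections import defaultdict, deque

def flag_fast_spenders(events, max_txns=3, window_secs=60):
    """Velocity check: flag a card with more than `max_txns`
    transactions inside `window_secs`.

    `events` is an iterable of (card_id, timestamp_secs) pairs,
    assumed roughly time-ordered, as they would arrive on a
    live transaction stream.
    """
    history = defaultdict(deque)  # card_id -> recent timestamps
    for card, ts in events:
        times = history[card]
        times.append(ts)
        # evict timestamps that have fallen out of the window
        while times and ts - times[0] > window_secs:
            times.popleft()
        if len(times) > max_txns:
            yield card, ts  # suspicious burst: act on it immediately

# Hypothetical stream: card "A" bursts, card "B" does not
events = [("A", 0), ("B", 5), ("A", 10), ("A", 20), ("A", 30)]
print(list(flag_fast_spenders(events)))  # → [('A', 30)]
```

The per-card state is tiny and self-pruning, which is what makes this kind of check feasible against millions of transactions per minute; production systems apply the same idea with distributed, fault-tolerant state.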

The companies that have figured this out and applied it to their data are the ones leading their industries. Going back to an earlier example: look at Facebook. Billions of people are telling vendors in that ecosystem literally everything about themselves. How, as a vendor, do you catch a prospect at the perfect moment, when only a slight push will move them in your direction? When major electronic device manufacturers release the next must-have shiny object, millions of people order at the same time, and everyone gets a confirmation within seconds, in which time the manufacturer has checked inventory, location, shipping alternatives, taxes, previous purchase history, service contracts, trade-in value, and more, all in literally the blink of an eye.

The bottom line is that companies that know how to leverage event streaming data make smarter and faster business decisions.