Big Data Concerns and Enablers

The term “Big Data” came into wide use around 2005, when O’Reilly Media is often credited with popularizing it to describe the (then) massive data sets that were impossible to manage with traditional business intelligence tools. Of course, Big Data in 2005 and Big Data in 2018 are not even close to the same thing. In 2005 Facebook was barely a year old (as was Web 2.0), and IoT was still lurking below the surface. Comparing Big Data then and now is like comparing a firehose to Niagara Falls.

While this sounds like marketing hyperbole (which it is), the change in Big Data over the past few years can be hard to grasp. Facebook currently averages 1.15 billion daily mobile visitors,[1] each of them uploading the minutiae of their lives non-stop. Healthcare informatics systems generate huge volumes of patient, insurance, and clinical trial data, credit card networks process thousands of transactions every second, and billions of RFID sensors and IoT devices stream information steadily. Those are just the devices “on the surface”; there is a whole other layer of machine-to-machine messaging systems that also contributes to the data deluge. These data streams are vast, deep, moving at incredible speed, and continuously changing.

The part that is both scary and interesting is that, in spite of the almost incomprehensible volumes of data out there, we have barely shifted out of first gear: the growth of streaming data is not linear but exponential, and we are nowhere near the potential limits of data generation. Forbes, for example, reports projections that the Big Data/Hadoop market will grow from $24B to almost $100B over the next four years,[2] and this number is probably conservative.

How to manage and make use of all this data is going to be one of the biggest challenges facing business for the foreseeable future. How are companies going to make the transition from raw data to useful information to actionable knowledge, particularly as volumes continue to explode and the underlying infrastructure is constantly changing? We need to be looking for solutions that are:

Capable: Does the enabling technology provide the right capability, now and in the future? Look for streaming platforms such as the open source Apache Kafka, or managed services such as Amazon Kinesis and Azure Event Hubs, which provide the scale and flexibility needed to handle high volumes of operational data (log files, transactions, web server data, sensor data, and so on) without being overwhelmed. The other big question is how all this information is going to be stored. Given the growth rates and variability of data, open source solutions like Hadoop combined with NoSQL databases are probably the best choice to deliver the flexibility needed to adapt and grow as data sources continue to evolve and scale.

Actionable: Being able to take meaningful action in real time, based on data arriving at high volume, is a huge operational advantage. In the early days, actionable information was buried in the data; technologies such as pattern recognition systems could surface meaning, but only well after the fact, which limited its usefulness. While data volumes are much higher now, the development of artificial intelligence and machine learning solutions has moved the analysis of data into real time (for example, real-time optimization of advertising and marketing execution), which has provided a critical and sustainable path forward.

Extensible: Given the increasingly distributed nature of operational resources such as IoT, more data processing needs to take place at the network’s edge, where the events occur (for example, remote sensors tracking pressure and flow changes on an oil pipeline). With the right infrastructure in place, Edge Streaming solutions, particularly when combined with high-volume open source tools like Kafka, can send cleaner data into an analytic system. That means more productive use of resources (both human and technology), lower processing requirements, and therefore faster time to value.
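To make the edge-filtering idea a little more concrete, here is a minimal Python sketch of one common technique, a deadband (report-by-exception) filter: a reading is forwarded only when it differs from the last forwarded value for the same sensor by at least a threshold. This is an illustration, not a description of any particular product. The Reading type, the deadband_filter function, the kilopascal units, and the 5 kPa threshold are all hypothetical choices; in a real deployment the forwarded readings would be handed to a transport such as Kafka, and the threshold would need tuning per sensor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    # Hypothetical record shape for a pipeline pressure sensor.
    sensor_id: str
    timestamp: float      # seconds since an arbitrary epoch
    pressure_kpa: float   # assumed unit: kilopascals


def deadband_filter(readings: Iterable[Reading], threshold_kpa: float) -> Iterator[Reading]:
    """Hypothetical edge filter: yield a reading only if it moved at least
    threshold_kpa away from the last value forwarded for that sensor."""
    last_sent: Dict[str, float] = {}  # per-sensor state, so sensors don't mask each other
    for r in readings:
        prev = last_sent.get(r.sensor_id)
        if prev is None or abs(r.pressure_kpa - prev) >= threshold_kpa:
            last_sent[r.sensor_id] = r.pressure_kpa
            yield r  # in practice: publish upstream (e.g. to a Kafka topic)


if __name__ == "__main__":
    stream = [
        Reading("pipe-1", 0.0, 500.0),
        Reading("pipe-1", 1.0, 500.2),
        Reading("pipe-1", 2.0, 499.9),
        Reading("pipe-1", 3.0, 507.5),
        Reading("pipe-2", 3.0, 480.0),
        Reading("pipe-1", 4.0, 507.6),
    ]
    forwarded = list(deadband_filter(stream, threshold_kpa=5.0))
    for r in forwarded:
        print(r.sensor_id, r.timestamp, r.pressure_kpa)
    print(f"forwarded {len(forwarded)} of {len(stream)} readings")
```

On the sample stream the filter forwards three of the six readings (the first value from each sensor plus the 7.5 kPa jump on pipe-1), which is the kind of reduction in upstream volume the paragraph above describes. Keeping state per sensor is a deliberate choice: a quiet sensor is never suppressed just because a different sensor recently reported.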

Adaptable: Data (like its sources) comes in a broad range of formats: structured (invoices), semi-structured (IoT data), or unstructured (customer complaints). Processing and analytic systems need a Discovery capability that can handle a broad range of variables from all types of data streams, across a variety of formats (text, numeric, video, objects, etc.), and draw relevant correlations to legacy data. They then need to parse and normalize the results so visualization systems can deliver real-time information that technical and business analysts can use to make optimal decisions. That data also needs to be where you need it (cloud, on-premises, or hybrid), not where it’s convenient for the vendor.

Integrated: This is unusually complex, dense technology, and it all needs to work together seamlessly. The highest probability of success will come from a fully integrated solution that covers the entirety of Big Data streaming, management, and analytics. A stand-alone solution may cover one requirement, but Big Data has a more comprehensive scope. Some companies, such as Oracle or SAP, offer Big Data solutions as part of a broader enterprise suite, but Big Data demands not only depth and breadth but also, most importantly, focus. Look for a company that specializes in fully integrated Big Data solutions that have been properly vetted by big, demanding customers.

The core challenge is how to stay ahead of all this. It’s not just the volume, speed, and variety of data; it’s the complexity of the enabling technologies. How are companies, and the people who work there, supposed to keep up with all these open source solutions, which are constantly being upgraded, while data continues to stream in and (oh yeah) they still have to do their regular jobs? This is where partnering with the right solutions vendor can make all the difference. Big Data offers companies an unprecedented ability to take their business to a whole new level, and choosing the right technology partner is the critical decision.

[1] https://zephoria.com/top-15-valuable-facebook-statistics/

[2] https://www.forbes.com/sites/louiscolumbus/2018/05/23/10-charts-that-will-change-your-perspective-of-big-datas-growth/#528062c42926