Disclaimer - The views, thoughts, and opinions expressed in the text belong solely to the author, and should not in any way be attributed to the author’s employer, or to the author as a representative, officer or employee of any organisation.
This article is an excerpt from my book “Practical Data Analysis: Using Open Source Tools & Techniques” (available on Amazon worldwide, iBook Store, and Barnes & Noble).
In today's highly competitive and heavily regulated business environment, no one can afford to be purely reactive in managing day-to-day business risks. Instead, the goal is to respond rapidly to risk events through ongoing, real-time monitoring, using advanced analytics to spot anomalies and potential threats before they cause serious harm to the business.
An organisation’s ability to measure, monitor and optimise business processes also has a direct impact on revenue and customer satisfaction. Claims processing in healthcare, trade confirmations and settlements in financial institutions, and new service activation in telecommunications companies are all examples of complex business processes. Being able to correlate data across multiple systems and geographies to gain a real-time, end-to-end view of an entire business process, and to pinpoint and resolve problems quickly, is therefore vital to remaining competitive and profitable as an enterprise.
But this is easier said than done. In most medium to large enterprises, business processes flow across multiple diverse IT systems and manual touch-points, leaving data scattered everywhere in large quantities - in client-facing systems, operations and finance applications, obscure legacy systems, in databases and network folders, on message queues, and in system logs. Data is also locked away in unstructured formats such as Microsoft Excel spreadsheets, e-mails, chat and voice messages. To make matters worse, more often than not the content, quality, structure, and definitions of the data also vary from one system to another. Without a single view of relevant data, and a consistent understanding of its meaning enterprise-wide, how do you build an effective continuous monitoring and real-time analytics capability?
A framework for continuous monitoring and real-time analytics
Creating a real-time data architecture and using it to run streaming data analytics applications is a complicated undertaking. For starters, such systems don't come in a box, and setting them up is a complex process that requires piecing together various data processing technologies and analytical tools to meet the particular needs of the end users. Compounding the problem is the availability of a whole range of proprietary and open source big data technologies, each vying for a piece of the real-time and predictive analytics market. Quite often, we also get caught up in the technical details of these competing offerings and the marketing buzzwords, and lose sight of the big picture - i.e. what are we actually trying to achieve?
With these challenges in mind, I have been on the lookout for quite some time for a well-supported, open source, and one-stop shop type solution that can meet the continuous monitoring and real-time analytics needs of both large and small enterprises. In my opinion, the ideal solution would be the one that meets the following requirements –
a) It must be able to automatically collect data in real-time, from any type of source and in any format, and transform the data (e.g. filter, enrich and standardise) on the fly.
b) It must provide a built-in mechanism for the storage and fast retrieval of large volumes of structured and unstructured data (i.e. a schema-less NoSQL-type database rather than a traditional relational database).
c) It must provide a mechanism to interrogate the data in real-time using built-in and configurable analytical models. We should also be able to interrogate the data using external tools, such as custom-built machine learning algorithms.
d) It must have configurable data visualisation and dashboard functionalities.
e) Finally, we should be able to put it all together with minimal development and maintenance costs.
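To make requirement (a) more concrete, here is a minimal Python sketch of on-the-fly transformation - parsing, filtering, enriching and standardising a raw log line into a clean JSON event. The log format, the field names and the 500 ms threshold are all invented for illustration; a real pipeline would delegate this work to a collector such as Logstash or Telegraf rather than hand-written code.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical raw log line (format invented for illustration).
RAW = "2018-08-02 10:15:30 WARN payments-svc latency_ms=950 user=alice"

LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<service>[\w-]+) "
    r"latency_ms=(?P<latency>\d+) user=(?P<user>\w+)"
)

def transform(line):
    """Filter, enrich, and standardise a raw log line into a JSON-ready event."""
    match = LOG_PATTERN.match(line)
    if not match:
        return None  # filter: drop lines we cannot parse
    event = match.groupdict()
    event["latency"] = int(event["latency"])
    # standardise: ISO 8601 timestamp in UTC
    ts = datetime.strptime(event.pop("ts"), "%Y-%m-%d %H:%M:%S")
    event["@timestamp"] = ts.replace(tzinfo=timezone.utc).isoformat()
    # enrich: flag slow requests for downstream alerting
    event["slow"] = event["latency"] > 500
    return event

print(json.dumps(transform(RAW), indent=2))
```

The same filter/enrich/standardise pattern applies whatever the source: unparseable records are dropped early, timestamps are normalised to a single standard, and derived fields are added before the event reaches storage.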
This might sound like a lot to ask. But the good news is that I did find two open source platforms (although certain advanced features require a commercial license) that pretty much meet these requirements - Elastic Stack (link) and TICK Stack (link).
Elastic Stack is well suited to collating and analysing both structured and unstructured data in real time, such as event logs (system logs, transaction records, trade bookings etc.), free-form text (e-mails, web pages, Twitter messages etc.) and documents (Word, PDF, Excel files etc.).
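As a rough intuition for how a platform like this makes free-form text searchable so quickly, here is a toy inverted index in Python. This is only a stripped-down sketch of the general idea - not how Elasticsearch is actually implemented - and the sample documents are invented.

```python
from collections import defaultdict

# Toy documents standing in for e-mails, web pages, tweets, etc.
DOCS = {
    1: "Trade settlement failed for account 42",
    2: "New service activation completed",
    3: "Settlement batch completed without errors",
}

def build_index(docs):
    """Map each lower-cased token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every query token (AND search)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

index = build_index(DOCS)
print(sorted(search(index, "settlement")))            # documents 1 and 3
print(sorted(search(index, "settlement completed")))  # document 3 only
```

Because the index maps terms to documents up front, each query is a cheap set intersection rather than a scan of every document - the essence of how full-text search stays fast at scale.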
TICK Stack, on the other hand, has been designed from the ground up to facilitate the collection, storage and analysis of time-series data (share prices, sensor readings etc.) and metrics (sales figures, financials etc.) at scale.
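To give a flavour of the kind of streaming analysis such a stack enables, here is a small pure-Python sketch that flags points deviating sharply from a rolling baseline - roughly the sort of threshold alert one might configure in the stack's alerting component. The data, the window size and the sigma threshold are all invented for illustration.

```python
from collections import deque
from statistics import mean, stdev

def anomalies(points, window=5, sigmas=2.0):
    """Flag (time, value) points deviating more than `sigmas` standard
    deviations from the rolling mean of the preceding `window` points."""
    history = deque(maxlen=window)
    flagged = []
    for t, value in points:
        if len(history) == window:
            mu, sd = mean(history), stdev(history)
            if sd > 0 and abs(value - mu) > sigmas * sd:
                flagged.append((t, value))
        history.append(value)
    return flagged

# Hypothetical share-price ticks: steady around 100-102, then a spike.
ticks = [(i, 100 + (i % 3)) for i in range(10)] + [(10, 140)]
print(anomalies(ticks))  # only the spike at t=10 is flagged
```

The rolling baseline is the key design choice: rather than a fixed threshold, each point is judged against the recent behaviour of the series, so the same rule adapts as the underlying level drifts.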
In chapter nine of my book (Practical Data Analysis), I have summarised some of the key features that make Elastic Stack and TICK Stack well suited to continuous monitoring and real-time analytics. Time permitting, I plan to write a few tutorials on how to use these tools in some real-world continuous monitoring use cases (keep an eye on my blog if you are interested). Nevertheless, the best way to learn what you can do with these two open source tools is to install them on your laptop or desktop (Windows, Linux or Mac OS), load some freely available sample datasets, play around with their interactive data analytics and visualisation capabilities, and plug in your own machine learning or statistical models. There is also a large collection of YouTube videos covering both platforms that you can learn from.
Dhiraj Bhuyan, 02 August 2018


