Data is the new oil, some say: a coveted resource that powers enterprise decision-making. In its raw form, though, data isn’t good for much. It needs to be extracted, refined, and processed, with its constituents funneled into various byproducts through pipelines that run from source to refinery to end consumer.
Every bottleneck in that system carries a dollar cost. Data that is improperly analyzed is essentially a waste product, and as datasets grow, extracting the most valuable information to funnel downstream has become a more burdensome task.
In recognition of this challenge, a handful of companies have sought to automate stopping points along the data pipeline, a process called Robotic Data Automation, or RDA.
Enterprise datasets aren’t just growing; in many cases they’re also becoming real-time. They come in a variety of formats and are spread across a company’s sprawling IT infrastructure, including on-premises servers, off-premises clouds, and the edge.
These datasets require collection, cleanup, validation, extraction, and metadata enrichment: an extensive series of steps just to get the data prepped for its intended use. Every step can be time-intensive, and failure at any step can result in invalid outputs.
RDA aims to automate many of these processes using low-code bots that perform simple, repetitive tasks, with linkages to more complex artificial intelligence (AI) tools, such as IBM Watson, OpenAI’s GPT-3, or hundreds of other bots, to execute natural-language processing (NLP) tasks when necessary.
Effectively, a simple machine is designed to cobble together disparate elements, calling on more sophisticated machines when they’re needed, in order to compile raw data into something usable. If executed correctly, automation can help enterprises realize the value of information far more quickly.
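The idea of chaining simple bots can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the bot names (`cleanup`, `validate`, `enrich`), the `host` and `source` fields, and the `run_pipeline` helper are all hypothetical.

```python
from typing import Callable, Iterable, List

Record = dict
Bot = Callable[[Record], Record]


def cleanup(record: Record) -> Record:
    """Strip stray whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def validate(record: Record) -> Record:
    """Reject records missing the (assumed) required 'host' field."""
    if not record.get("host"):
        raise ValueError(f"invalid record: {record!r}")
    return record


def enrich(record: Record) -> Record:
    """Attach metadata; the 'source' tag is a made-up example."""
    return {**record, "source": "monitoring"}


def run_pipeline(records: Iterable[Record], bots: List[Bot]) -> List[Record]:
    """Pass each record through every bot in order, dropping records that fail."""
    out = []
    for record in records:
        try:
            for bot in bots:
                record = bot(record)
        except ValueError:
            continue  # a failed step drops the record instead of emitting bad output
        out.append(record)
    return out


raw = [{"host": " node-1 ", "cpu": 41.0}, {"host": "", "cpu": 12.5}]
clean = run_pipeline(raw, [cleanup, validate, enrich])
print(clean)  # [{'host': 'node-1', 'cpu': 41.0, 'source': 'monitoring'}]
```

Each bot does one small job, and a more sophisticated step (an NLP service, for instance) could in principle be slotted into the same list as just another callable.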
RDA tools can also help break up the existing paradigm of data handling, whereby AIOps vendors offer limited, pre-defined sets of tools for customers to interact with their data. These tool sets have limited linkages with other tools, narrower scopes of use cases, and more restrictive data formatting outputs.
Companies like CloudFabrix, Snowflake, and Dremio claim their RDA tools liberate customers from these constraints and include other benefits, such as synthetic data generation; on-the-fly data integrity checks; native AI and machine learning (ML) bots; inline data mapping; and data masking, redaction, and encryption.
Other use cases for RDA tools include:
- Anomaly Detection: Pulling data from a monitoring tool, comparing historical CPU usage data for a node, then using regression to construct a model that can be sent as an attachment
- Ticket Clustering: Compiling tickets from a company’s ticket management software, clustering them together, and then pushing the output into a new dataset for visualization on a dashboard of choice
- Change Detection: Examining virtual machines (VMs) and making comparisons against current states to detect unplanned changes
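To make the anomaly detection case more concrete, here is a minimal sketch that fits a straight line to a node's historical CPU readings and flags points that sit far from it. The sample data, the `fit_line` and `find_anomalies` names, and the two-standard-deviation threshold are assumptions chosen for illustration; a real RDA bot would pull its readings from a monitoring tool and might use a different model.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_anomalies(samples, threshold=2.0):
    """Return indices whose residual exceeds `threshold` standard deviations."""
    xs = list(range(len(samples)))
    slope, intercept = fit_line(xs, samples)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, samples)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the fitted line
    return [i for i, r in zip(xs, residuals) if abs(r) > threshold * spread]


# Hypothetical CPU usage (%) for one node, with a spike at index 5.
cpu_history = [40, 41, 42, 43, 44, 95, 46, 47, 48, 49]
print(find_anomalies(cpu_history))  # [5]
```

A single large spike also pulls the fitted line toward itself, so a production version would likely want a more robust fit or a rolling window.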
RDA vs. RPA
Many will be familiar with robotic process automation, or RPA. The older concept carries similarities with RDA in that both aim to simplify common tasks through the use of low-code bots. Where they diverge is that RPA is intended for simplifying common user tasks and workflows, whereas RDA is aimed squarely at the data pipeline.
Put simply, both RDA and RPA use simple bots to save time on menial, time-consuming tasks, though in different contexts.
A common example of RPA is a bot empowered with ML capabilities for form completion. The bot monitors how a human repeatedly fills a form until the RPA is trained on the appropriate manner in which the form is to be completed. This type of machine learning is similar to how cellphones can generate predictive text suggestions based on their users’ conversational habits and vocabulary.
Once trained, the bot can take command of form completion, along with other aspects such as submitting the form to its expected targets. While this can expedite the process in the long run, RPA systems can take months to train before their advantages come to fruition.
RDA’s Long-Term Value
There’s always going to be value in automating time-intensive tasks and freeing up human labor for jobs that are more cognitively demanding. As one bottleneck is opened, another will come to take its place. However, the success of systems like RDA or RPA hinges on their implementation.
Naturally, the tools need to be designed properly to interact with their intended datasets, but enterprises also have a responsibility to properly integrate new tools with their existing data pipelines. AI-driven tools and automation software are still in their infancy, still finding new niches to serve, and still being refined in terms of how they deliver service. How RDA shakes up data pipelines is a story yet to be told.