How Robotic Data Automation Could Automate Data Pipelines
Global head of AI and strategic alliances at Weka.IO, driving AI strategy and business growth.
AI has become a hallmark of digital transformation strategy. According to IDC, global AI spending is forecast to reach $500 billion in 2024, with a CAGR of 17.5%. Likewise, Gartner predicts that low-code application platforms (LCAP), robotic process automation (RPA) and AI will fuel the growth of hyperautomation, and that the market will reach $596 billion in 2022, up nearly 24%.
Hyperautomation has become paramount, as businesses “will require more IT and business process automation as they are forced to accelerate digital transformation plans in a post-Covid-19, digital-first world,” according to Gartner VP Fabrizio Biscotti.
In spite of this growth, up to 73% of company data is unused for analytics and insights, according to Forrester. Businesses have also faced challenges because most predictive models only use historical data and not streaming (i.e., real-time) data.
Enterprises have struggled to collaborate well around their data, which inhibits their ability to adopt transformative applications like AI. A recent KPMG survey also reports, for example, that 78% of CEOs in the U.S. did not use data-driven insights because the insights were siloed and could not translate to the entire organization.
A 2019 Gartner survey found that the top four challenges companies face in adopting AI were security or privacy concerns (30%), complexity of AI integration with existing infrastructure (30%), data volume or complexity (22%) and potential risks or liabilities (22%). The survey also found that it could take eight months or longer to integrate an ML model into enterprise applications.
The ability to analyze data from disparate sources and make holistic decisions using BI, AI and cloud-native applications is paramount for the “augmented consumer,” a business user persona who is looking to be empowered to run on-demand, customized, conversational analytics dashboards across disparate business applications.
Robotic Data Automation
To solve these challenges, it is important to work with an ecosystem that can automate data integration and data preparation activities. Companies like Snowflake, CloudFabrix and Dremio have developed a new strategy, Robotic Data Automation (RDA), which is similar to what Gartner refers to as XOps. RDA automates data pipelines across disparate data sources in much the same way that RPA has transformed business processes.
RDA leverages both historical and real-time datasets using low-code and no-code data bots for on-the-fly data integration, data cleansing, data transformation and data contextualization. It complements ecosystems that use ETL and ELT — such as data warehouses, data lakehouses and data platforms — to allow ingesting, easy access and sharing across distributed data environments.
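To make the idea of chained data bots concrete, here is a minimal sketch in plain Python. It is not the API of any RDA product: the bot names (`cleanse`, `transform`, `contextualize`), the record fields and the `compose` helper are all hypothetical, chosen only to show how small, single-purpose steps can be composed over a mix of historical and streaming records.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Bot = Callable[[List[Record]], List[Record]]  # a "bot" is just a list-in, list-out step


def cleanse(records: List[Record]) -> List[Record]:
    """Drop records with no host and normalise host names."""
    return [
        {**r, "host": str(r["host"]).strip().lower()}
        for r in records
        if r.get("host")
    ]


def transform(records: List[Record]) -> List[Record]:
    """Add latency in seconds alongside the original milliseconds."""
    return [{**r, "latency_s": r["latency_ms"] / 1000} for r in records]


def contextualize(inventory: Dict[str, str]) -> Bot:
    """Build a bot that tags each record with its owning team (hypothetical lookup)."""
    def bot(records: List[Record]) -> List[Record]:
        return [{**r, "owner": inventory.get(r["host"], "unknown")} for r in records]
    return bot


def compose(*bots: Bot) -> Bot:
    """Chain bots left to right into a single pipeline."""
    def pipeline(records: List[Record]) -> List[Record]:
        return reduce(lambda data, bot: bot(data), bots, records)
    return pipeline


# Illustrative data only: one historical record, two "streaming" records.
historical = [{"host": " DB-01 ", "latency_ms": 120}]
streaming = [{"host": "web-02", "latency_ms": 45}, {"host": "", "latency_ms": 10}]
inventory = {"db-01": "data-team"}

pipeline = compose(cleanse, transform, contextualize(inventory))
result = pipeline(historical + streaming)
```

Because each bot has the same shape, steps can be reordered, swapped or reused across pipelines, which is the property the "composable" label refers to.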
To orchestrate the entire composable data-centric AI pipeline, RDA syncs integrations to data sources and visual dashboards by using libraries of pre-built data bots or leveraging external models — such as IBM Watson or OpenAI — for natural-language processing (NLP)/natural-language understanding (NLU) purposes or conversational queries. RDA solutions typically provide an Interactive Development Environment (IDE) that uses natural language texts, such as configurational semantics, in order to empower the “augmented consumer.”
Operationalizing AI/ML Data Pipelines With RDA
RDA could be the missing link for implementing composable data and analytics by integrating DataOps with the ModelOps, MLOps and PlatformOps frameworks. It could enable composable data pipelines to derive holistic and complete situational awareness for a 360-degree view to support better decision-making for the augmented consumer.
Some of the use cases for RDA are situations where composable data pipelines and DataOps are essential for holistic decision-making. Examples include data platforms where ITOM, ITSM or ITIM users want to leverage AIOps tools, as well as ELT-based enterprise data warehouses.
To get started with RDA, organizations need to identify their business goals, their KPIs and how data should be used to reach those goals. Some goals to consider are: improving productivity by shortening time to market and time to insights, reducing risk, improving the security of SLAs or deepening customer insights.
For a data platform to work with RDA, in particular, it must be able to cater to disparate data sources from BI, AI and cloud-native applications. Platforms should make data access transparent, whether the data is coming from edge, core or multi-cloud. Finally, data platforms should also facilitate deriving actionable insights, using multi-protocol access methods with performance and capacity tiers for big, small and wide data, as small and wide data enables businesses to make holistic decisions across the organization.
That said, in order to effectively use RDA, ensure data platforms comply with the security aspects mandated by GDPR and CCPA regulations and that they protect and mask personally identifiable information (PII). Likewise, data platforms should be able to support “Explainable AI” to build trust as well as reproduce, reuse and retrain composable data pipelines.
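As a rough illustration of what masking PII inside a pipeline step can look like, the sketch below pseudonymizes named fields with a salted hash and redacts email addresses found in free text. Everything here is an assumption made for illustration: the field names, the salt handling and the email pattern are hypothetical, and a snippet like this does not by itself make a platform GDPR or CCPA compliant.

```python
import hashlib
import re
from typing import Dict

# Hypothetical choices: which fields count as PII, and a simple email pattern.
PII_FIELDS = {"email", "phone"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a short salted SHA-256 token (stable for joins)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record: Dict[str, object], salt: str = "demo-salt") -> Dict[str, object]:
    """Return a copy of the record with PII fields tokenized and emails redacted."""
    masked: Dict[str, object] = {}
    for key, value in record.items():
        if key in PII_FIELDS and isinstance(value, str):
            masked[key] = pseudonymize(value, salt)
        elif isinstance(value, str):
            masked[key] = EMAIL_RE.sub("[redacted-email]", value)
        else:
            masked[key] = value
    return masked


# Illustrative record only.
raw = {
    "email": "jane@example.com",
    "phone": "555-0100",
    "note": "Customer wrote from jane@example.com about latency",
    "latency_ms": 45,
}
masked = mask_record(raw)
```

The same input always yields the same token, so masked records can still be joined across datasets. In practice the salt would be managed as a secret rather than hard-coded as it is in this sketch.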