“Real business is done on paper, OK? Write that down.” Michael Scott, the hapless regional manager in the American version of the TV series, The Office, offered this bit of wisdom to a lecture hall full of business students. The punch line: they were all taking notes on laptops. The humorous contrast highlights a key challenge companies face today. Despite operating in an increasingly digital world, many businesses still use paper in their processes and need to extract information and insights from these documents. Companies that find a better way to do so can improve performance and capture value over both the near and long term.

Today, advances in AI—especially machine learning (ML) applications such as optical-character recognition (OCR)—enable more efficient data processing, including faster and more accurate information retrieval from paper documents. The combined application of classification, extraction, and other algorithms—known collectively as “intelligent document processing” (IDP)—is erasing the boundary between the analog and digital worlds. The important distinction IDP offers is that the technology not only digitizes analog documents (by scanning them into digital format) but also allows computers to understand the data in documents.

However, like most automation solutions, IDP technology cannot yet handle these tasks by itself. Companies deploying IDP will still need a significant human workforce to configure the platform, train the algorithms, monitor outputs and performance to ensure accuracy, and handle exceptions. Thus, even as the technology becomes more accurate over the long term, humans must remain in the loop to ensure that scaled-up deployments continue to perform well (Exhibit 1).

Exhibit 1

Analog data aren’t going away anytime soon

The industrial revolution in services encompasses irreversible shifts, accelerated by the COVID-19 pandemic, that create imperatives for organizations to reconfigure operations (Exhibit 2). Rapid and extreme digitization is one of these changes: organizations are moving from technology enablement of legacy operations to full digitization. The results of a 2020 global survey of executives highlight the pandemic’s transformative impact on operations. Among survey respondents, 85 percent said their businesses have somewhat or greatly accelerated the implementation of technologies that digitally enable employee interaction and collaboration, such as videoconferencing and file sharing. Roughly half of those surveyed reported increasing digitization of customer channels, for example, via ecommerce, mobile apps, or chatbots. Some 35 percent have further digitized their supply chains, for example, by connecting their suppliers with digital platforms in supply chain management.

Exhibit 2

Despite the overwhelming progress toward digitization, paper and electronic documents remain relevant and valuable. Even in 2020, the year of COVID-19’s peak and the most recent year for which data are available, an estimated 2.8 trillion pages were printed. According to 2021 data from the US Bureau of Labor Statistics, US companies collectively spend $5.3 billion annually on wages for data-entry keyers. That figure excludes many other expenses, such as those for systems, facilities, and offshore operations, as well as the wages of people for whom data entry is a large part (but not all) of their job, such as accounts-payable clerks.

Analog data will inevitably continue to be produced and will likely remain a core part of the operating model of virtually all organizations. The call-center industry offers an analogy. Even as companies have accelerated their adoption of self-service chatbots and other natural-language processing tools, our research indicates that in most cases, calls requiring a human touch still flow into centers, including those that are too complex or emotionally intense for bots to handle well. The persistence of analog is not simply resistance to change; it stems from several factors:

  • Experience. For some people, the benefits of the analog experience, such as ease of use, are more important than the efficiency of digital.
  • Environment. In some circumstances, such as industrial settings or face-to-face meetings, jotting down information on paper is preferable to using digital interfaces.
  • Enterprise architecture. Organizations have reached different stages of maturity for their system and data architectures. In some cases, the current architecture limits a company’s short-term ability to digitize processes. The more mature digital adopters may find it easier to implement new solutions that shorten cycle times, whereas others may need a multimillion-dollar revamp.

Analog and digital worlds are hard to integrate

The persistence of analog means that nearly every organization faces the common challenge of accessing data trapped in analog and unstructured formats. To extract the data, many organizations process documents and images manually, or separate data entry and processing from core operations. Think of a typical auto insurer. If one team extracts data from a customer’s supporting documents while a separate team processes the claim, the company loses the opportunity to validate inputs against known values. Sorting through documents looking for precious data prolongs cycle times and increases processing costs—an unsustainable waste of resources in today’s business environment.

Too often, to extract a relevant data sample manually, an analyst must open millions of documents spanning many years, find the relevant data fields, and enter the figures into a structured format. Because the required tasks are repetitive and monotonous and add little value, manual-processing roles experience low job satisfaction and high turnover. Manual processing is also highly error prone, creating further operational inefficiencies and confusion among teams that tend to be siloed from one another.

Organizations that have been recording data in analog formats for decades also accumulate pockets of “trapped” data that are difficult to find but essential for models built on historical comparisons. A mortgage underwriting model, for example, cannot rely only on mortgages originated during the previous two years; market cyclicality makes it essential to incorporate data from documents going back decades.

Recognizing that there is simply too much data for humans to manually collect and digitize, organizations have turned to automation for help. According to an upcoming McKinsey Global Executives Survey, 70 percent of respondents say their organizations are at least piloting the automation of business processes in one or more business units or functions. Based on our research, intelligent document management and processing tools (including OCR) are the most frequently deployed automation solutions beyond the pilot stage.

Moreover, managing the technical aspects of deploying automation on top of current IT infrastructure is cited as the most significant automation-related challenge organizations are confronting. One underlying reason is that organizations have struggled to seamlessly integrate the digital and analog worlds. In many cases, current extraction tools are cumbersome to set up and do not offer configuration and capability-building support. Because these tools often fail to yield sufficient accuracy and automation levels over time, companies must make large investments in exception handling. Keeping humans in the loop appears to be critical for success. The 2021 global survey found that executives from companies that have seen success with automation are twice as likely as others to report using human-in-the-loop designs.

Shortcomings in quality and completeness mean that companies’ digital and advanced analytics cannot take full advantage of all the available analog, unstructured data. Ultimately, poor and/or underutilized data sources and a backlog of analog documents create a bottleneck in data supply, limiting companies’ use of digitization levers to make smart decisions and capture value.

The case for IDP

IDP offers a solution to the challenges of accessing analog data to create digital, machine-readable data for downstream consumption. More than simply data entry, it includes classification, extraction, and validation.

  • Classify. The IDP system automatically classifies the submitted documents using machine learning.
  • Extract. The system uses computer vision to extract relevant fields, whether the images are handwritten, machine printed, or poorly scanned.
  • Validate. The system flags output for human review whenever its confidence in a classification or an extracted value is low.
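The classify, extract, and validate steps can be sketched as a small pipeline. This is a minimal illustration, not a production IDP system: the keyword classifier and the `key: value` parser below are hypothetical stand-ins for the machine-learning and computer-vision models described above, and the 0.90 review threshold and all confidence scores are assumed values chosen for the example. What it shows is the routing logic: anything scored below the threshold is flagged for a human.

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.90  # assumed cutoff; below this, a human reviews the output


@dataclass
class Prediction:
    value: str
    confidence: float  # model-reported score between 0 and 1


@dataclass
class ProcessedDocument:
    doc_id: str
    doc_type: Prediction
    fields: dict[str, Prediction]
    needs_review: list[str] = field(default_factory=list)


def classify(text: str) -> Prediction:
    # Hypothetical stand-in for an ML classifier: a keyword rule with fixed confidences.
    if "invoice" in text.lower():
        return Prediction("invoice", 0.97)
    return Prediction("unknown", 0.40)


def extract(text: str) -> dict[str, Prediction]:
    # Hypothetical stand-in for OCR/computer-vision extraction: parse "key: value" lines.
    extracted = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Assumed heuristic: an empty value is treated as a low-confidence read.
            extracted[key.strip().lower()] = Prediction(value, 0.95 if value else 0.30)
    return extracted


def process_document(doc_id: str, text: str) -> ProcessedDocument:
    doc = ProcessedDocument(doc_id, classify(text), extract(text))
    # Validate: flag the document type and any field scored below the threshold.
    if doc.doc_type.confidence < REVIEW_THRESHOLD:
        doc.needs_review.append("doc_type")
    for name, pred in doc.fields.items():
        if pred.confidence < REVIEW_THRESHOLD:
            doc.needs_review.append(name)
    return doc


doc = process_document("doc-001", "Invoice\nVendor: Acme\nTotal: ")
print(doc.doc_type.value, doc.needs_review)  # invoice ['total']
```

Here the document type and vendor clear the threshold, while the blank total is routed to a reviewer.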

To support and enhance IDP, humans stay in the loop. Employees oversee IDP performance, review and process flagged exceptions, and help the system to improve over time (Exhibit 3).
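The human side of the loop can be sketched the same way. This minimal example assumes a simple first-in, first-out review queue; the `ReviewQueue` class, its methods, and the 0.90 cutoff are hypothetical. Low-confidence fields wait for a reviewer, every reviewer decision is kept as a labeled example that could later be used to retrain the models, and the share of fields accepted without human touch (the straight-through rate) is tracked as one possible measure of IDP performance.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    threshold: float = 0.90  # assumed confidence cutoff for auto-acceptance
    pending: deque = field(default_factory=deque)  # (doc_id, field_name, predicted value)
    training_examples: list = field(default_factory=list)  # (doc_id, field_name, approved value)
    auto_accepted: int = 0
    reviewed: int = 0

    def submit(self, doc_id: str, field_name: str, value: str, confidence: float) -> None:
        if confidence >= self.threshold:
            self.auto_accepted += 1
        else:
            self.pending.append((doc_id, field_name, value))

    def resolve_next(self, approved_value: str) -> bool:
        """Record the reviewer's answer for the oldest item; return True if it corrected the model."""
        doc_id, field_name, predicted = self.pending.popleft()
        self.reviewed += 1
        # Confirmations and corrections alike become labeled examples for retraining.
        self.training_examples.append((doc_id, field_name, approved_value))
        return predicted != approved_value

    def straight_through_rate(self) -> float:
        total = self.auto_accepted + self.reviewed + len(self.pending)
        return self.auto_accepted / total if total else 0.0


queue = ReviewQueue()
queue.submit("doc-001", "total", "120.00", confidence=0.97)
queue.submit("doc-002", "total", "12O.00", confidence=0.55)  # the digit 0 misread as the letter O
corrected = queue.resolve_next("120.00")
print(corrected, queue.straight_through_rate())  # True 0.5
```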

Exhibit 3

Significant short- and long-term productivity gains are possible

In the short term, companies can reduce the number of manual, repetitive tasks required to read documents and thereby substantially reduce human errors. Companies can also standardize outputs and reduce the time spent training people to handle manual tasks. The cost improvements translate into higher margins. How quickly margins improve depends on how many process journeys receive analog data and how many channels are used.

Over the long term, the accuracy of IDP outputs and reduction in manual intervention can be significantly enhanced through intelligent self-learning. Gaining new insights from previously unused data enables faster decision making and promotes competitive advantage. IDP is also highly scalable, allowing companies to extend its application into multiple areas of the organization or expand its use as the business grows. As capability builds, scaling across the organization will become easier, while centralization and creating shared utilities for the enterprise can bring further opportunities. The organization can build a strong strategy encouraging human and machine collaboration for optimized productivity.

A leading North American financial institution that had been struggling to automate its operations provides a useful example: its initial deployments built technical capabilities but failed to deliver measurable bottom-line impact. A major breakthrough came with the implementation of IDP to create digital structured data for downstream processing (such as review of securities trades). To maximize the automation potential, the company deployed IDP capabilities alongside other process and technology levers, such as lean and robotic process automation (RPA). The automation program saved more than 20,000 employee hours in one year.

A global investment management firm had only one year to comply with know-your-customer regulations that required it to extract and reconcile co-applicant account information across more than one million pages. The documents were a mix of handwritten text, faxes, and low-resolution images that legacy data extraction technology (such as OCR) was unable to read and process. By implementing IDP, the firm was able to double its processing speed with half as much effort for a fourfold increase in document throughput within four weeks.
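One way to read the firm’s figures, assuming “throughput” means documents processed per unit of staff effort, is that the two improvements multiply:

```latex
\[
  \text{throughput gain}
    = \frac{\text{speed multiple}}{\text{effort multiple}}
    = \frac{2}{1/2}
    = 4
\]
```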

Beyond productivity gains, companies can greatly enhance the effectiveness of operations and decision making. IDP is the top of the funnel for taking advantage of emerging automation technologies that rely on digitized, structured data to function—such as RPA. When enterprise data is ingested through an IDP solution, it becomes useful for many applications—machine learning modules, predictive analytics and data clustering, and AI-enabled cognitive agents—that can yield substantial competitive advantages in an organization’s tech-enabled operating model.

To capture these benefits, companies need to fully integrate IDP into each stage of their journeys and processes. For example:

  • Beginning. An insurance carrier used IDP at the beginning of the endorsement process to improve its intent identification engine, which had classified endorsement requests into more than 200 potential types.
  • Middle. A bank applied IDP in the middle of the mortgage origination process to extract information from supporting documents, such as tax returns and brokerage statements. The additional information allowed the bank to automatically validate customer responses in the mortgage application.
  • End. A healthcare company applied IDP at the end of a talent acquisition and onboarding process to extract data that an algorithm would use to identify drivers of retention—a critical outcome in a field with high turnover rates.
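The bank example rests on a simple check: compare what the customer stated in the application with what IDP extracted from the supporting documents. The sketch below assumes hypothetical field names and a 2 percent relative tolerance; a real lender would set its own fields, tolerances, and escalation rules.

```python
import math


def validate_application(stated: dict[str, float],
                         extracted: dict[str, float],
                         rel_tolerance: float = 0.02) -> dict[str, str]:
    """Compare applicant-stated values with values extracted from supporting documents."""
    results = {}
    for field_name, value in stated.items():
        if field_name not in extracted:
            results[field_name] = "no supporting document"
        elif math.isclose(value, extracted[field_name], rel_tol=rel_tolerance):
            results[field_name] = "validated"
        else:
            results[field_name] = "mismatch: route to underwriter"
    return results


# Hypothetical values: what the customer typed vs. what IDP read from tax returns and statements.
application = {"annual_income": 98_000, "brokerage_balance": 50_000, "rental_income": 12_000}
from_documents = {"annual_income": 97_500, "brokerage_balance": 41_000}
print(validate_application(application, from_documents))
```

In this made-up case, income within 2 percent is validated automatically, the brokerage balance is routed to an underwriter, and rental income with no supporting document is flagged.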

Meeting the challenges of implementation

Achieving step-change improvements and sustained impact from IDP is not easy, however, as deploying technology is only one step in the process and will not by itself drive value and savings.

To begin, it is critical to identify which of the many processes needing digitization of analog data are the right ones for conducting initial pilots. Quick wins with the highest automation potential help to free up capital and encourage faster scaling across the organization. Leading organizations apply a top-down opportunity estimate and bottom-up identification of processes to determine where technology can have the greatest impact. Processes for intaking structured documents, such as tax forms or personal IDs, are often the best places to start.
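A bottom-up sizing of candidate pilots can start from simple arithmetic: annual document volume times today’s manual keying time gives the hours at stake, and an assumed automation potential scales that down to the hours IDP might address. The potentials below (higher for structured documents such as tax forms and IDs), the `Candidate` record, and the candidate processes are illustrative assumptions, not benchmarks.

```python
from dataclasses import dataclass

# Assumed share of keying effort IDP could remove, by document structure (illustrative only).
AUTOMATION_POTENTIAL = {"structured": 0.8, "semistructured": 0.6, "unstructured": 0.3}


@dataclass
class Candidate:
    name: str
    docs_per_year: int
    minutes_per_doc: float  # manual keying time today
    structure: str          # key into AUTOMATION_POTENTIAL

    def addressable_hours(self) -> float:
        manual_hours = self.docs_per_year * self.minutes_per_doc / 60
        return manual_hours * AUTOMATION_POTENTIAL[self.structure]


def rank_pilots(candidates: list[Candidate]) -> list[tuple[str, int]]:
    # Highest addressable hours first: a rough proxy for "quick wins."
    ranked = sorted(candidates, key=lambda c: c.addressable_hours(), reverse=True)
    return [(c.name, round(c.addressable_hours())) for c in ranked]


print(rank_pilots([
    Candidate("tax-form intake", 120_000, 4, "structured"),
    Candidate("claims correspondence", 80_000, 9, "unstructured"),
    Candidate("ID verification", 60_000, 3, "structured"),
]))
# [('tax-form intake', 6400), ('claims correspondence', 3600), ('ID verification', 2400)]
```

Volume matters as well as structure: in this made-up data, a large unstructured stream outranks a smaller structured one, which is one reason to pair such arithmetic with a top-down estimate.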

In some cases, a process may have multiple entry points (Exhibit 4). For example, purchase orders (POs) may enter a process through different channels (such as phone, email body or attachment, fax, or digital submission) and in different formats (paper forms, PDFs, letters, digital portal entry, electronic data interchange, or API). A company can address each of these channels and formats individually through differentiated technology enablement and communications. Creating a phased digitization approach that addresses each entry point by volume (either in total number of POs or total PO amount) is often a good starting point. The implementation can then be spread out over time and mapped to specific success criteria.
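The phased, volume-based approach can be sketched as follows. The hypothetical `plan_phases` function ranks entry channels by either PO count or total PO amount and assigns them to phases until cumulative-share cutoffs of 50, 80, and 100 percent are reached; the record layout and the cutoffs are assumptions for illustration.

```python
from collections import defaultdict


def plan_phases(purchase_orders: list[dict], by: str = "count",
                cutoffs: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[tuple[int, str, float]]:
    """Assign entry channels to rollout phases, largest volume first."""
    totals: dict[str, float] = defaultdict(float)
    for po in purchase_orders:
        # Volume is either the number of POs or their total amount.
        totals[po["channel"]] += 1.0 if by == "count" else po["amount"]
    grand_total = sum(totals.values())

    plan, cumulative, phase = [], 0.0, 0
    for channel in sorted(totals, key=totals.get, reverse=True):
        share = totals[channel] / grand_total
        plan.append((phase + 1, channel, round(share, 3)))
        cumulative += share
        # Advance once this phase's cumulative cutoff is met; one dominant
        # channel can close several cutoffs at once and leave a phase empty.
        while phase < len(cutoffs) - 1 and cumulative >= cutoffs[phase]:
            phase += 1
    return plan


# Hypothetical PO records: many small emailed orders, a few large faxed ones.
orders = ([{"channel": "email", "amount": 100}] * 6
          + [{"channel": "fax", "amount": 1_000}] * 3
          + [{"channel": "phone", "amount": 50}])
print(plan_phases(orders, by="count"))   # email first by number of POs
print(plan_phases(orders, by="amount"))  # fax first by PO value
```

Ranking by amount rather than count can reorder the rollout: in this made-up data, fax carries few POs but most of the value.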

Exhibit 4

Questions to assess the opportunity

Before designing an IDP implementation, companies should consider a set of questions to understand their baseline and potential opportunities:

  • Where do we have large volumes of analog data that are not being fully leveraged by our digital operations?
  • How much time are people spending today keying data from documents?
  • Where are we struggling with adoption of digital channels and/or encountering resistance to giving up paper alternatives?
  • Which document-heavy journeys are we redesigning, and have we fully incorporated the latest IDP capabilities in those teams?
  • How well do our current IDP solutions work, and have we fully explored the latest capabilities in the marketplace?

Whether structured, semistructured, or unstructured, the analog information trapped in manual documents is too useful to go to waste. When implemented thoughtfully, IDP can solve many of the challenges of digitizing data and capturing its value. Companies can use their digitized data to increase the throughput and accuracy of their processes in the near term, while developing insights and unlocking capacity to promote enterprise efficiency and effectiveness over the long term.