Artificial Intelligence Myth Vs Reality: Where Do Healthcare Experts Think We Stand?
The “AI in healthcare: myth versus reality” discussion has been happening for well over a decade. From AI bias and data quality issues to considerable market failures (e.g., the notorious missteps and downfall of IBM’s Watson Health unit), the progress and efficacy of AI in healthcare continue to face extreme scrutiny.
As President of the Mayo Clinic Platform, John Halamka, M.D., M.S., is “not disappointed in the least” about AI’s progress in healthcare. “I think of it as a maturation process,” he said. “You’re asking why your three-year-old isn’t doing calculus. But can your three-year-old add a column of numbers? That’s actually not so bad.”
In an industry as complicated and high-stakes as healthcare, the implementation of artificial intelligence and machine learning comes with challenges that have created a credibility gap. The many challenges that Halamka and others acknowledge and are working to address include:
- Data quality, availability, labeling and transparency issues
- Insufficient AI model and algorithm training and AI bias
- The lack of standards, certification processes and general oversight
It’s not all gloom and doom, though, especially when it comes to AI and machine learning for healthcare administration and process efficiency. For example, hospitals and health systems have successfully employed AI to improve physician workflows, optimize revenue cycle and supply chain management strategies, and improve the patient experience.
Iodine Software is one such company that’s making an impact in hospital billing and administration through its AI engine, which is designed to help large health systems capture more mid-cycle revenue through clinical documentation improvement (CDI). The company’s co-founder and CEO, William Chan, agrees that perceived shortcomings of AI are an overgeneralization.
“The impression that AI hasn’t yet been successful is an assumption when you look primarily at the big headline applications of AI over the past 10 years. Big tech has, in many cases, thrown big money at broad and highly publicized efforts, many of which have never met their proclaimed and anticipated results,” said Chan. “There are multiple examples of AI in healthcare that can be deemed successful. However, the definition of success is important, and each use case and AI application will have a different definition of success based on the problem that the ‘AI’ is trying to solve.”
And when it comes to solving problems in clinical care delivery, AI-driven clinical decision support (CDS) solutions are another animal altogether. But for those deep in the field, who have been studying, testing and developing AI and machine learning solutions in healthcare for decades, the increase in real-world evidence (RWE) and heightened focus on responsible AI development are reason enough to be hopeful about its future.
Real World Evidence (RWE) and Clinical Effectiveness: An “Exciting Time” for Healthcare AI
“Personally I think it’s a very exciting time for AI in healthcare,” said Suchi Saria, Ph.D., CEO and CSO at Bayesian Health, an AI-based clinical decision support platform for health systems using electronic health record (EHR) systems. “For those of us in the field, we’ve been seeing steady progress,” including peer-reviewed studies showing the efficacy of ideas in practice.
This spring, Bayesian Health published findings from a large, five-site study that analyzed the impact of its AI platform’s sepsis model. The two-year study showed that Bayesian’s sepsis module sped up antibiotic treatment by nearly two hours. Of note, while most CDS tools historically have adoption rates in the low teens, this study, over a wide base of physicians (2,000+), showed sustained adoption at 89%. A separate single-site study found a 14% reduction in ICU admissions and a 12% reduction in ICU length of stay, which translated to a $2.5M annualized benefit for the 250-bed study site hospital.
A 2020 study from scientists at UCSF Radiology and Biomedical Imaging also showed AI’s promise in improving care for those with glioblastoma, the most common and most difficult-to-treat form of brain cancer. Using an AI-driven “virtual biopsy” approach that is beyond the scope of human abilities, UCSF is able to predict the presence of specific genetic alterations in individual patients’ tumors using only an MRI. UCSF found that it was also able to accurately identify several clinically relevant genetic alterations, including potential treatment targets.
Most recently, Johns Hopkins Kimmel Cancer Center researchers found that a novel AI blood testing technology they developed could detect lung cancer in patients. Using the DELFI approach (DNA evaluation of fragments for early interception) on 796 blood samples, researchers found that, when combined with clinical risk factor analysis, a protein biomarker, and computed tomography imaging, the technology accurately detected 94% of patients with cancer across different stages and subtypes.
Abroad, AI is bringing precision care to cardiology with impressive results through HeartFlow’s AI-enabled software platform, a non-invasive option to assist with the diagnosis, management and treatment of patients with heart disease. HeartFlow’s technology has been shown to limit redundant non-invasive diagnostic testing, reduce patient time in hospital and face-to-face clinical contact, and streamline hospital visits. It has also demonstrated higher diagnostic accuracy than other non-invasive tests, with an 83% reduction in unnecessary invasive angiograms and a significant reduction in the total cost of care.
Data Quality, Availability, Labeling, and Transparency Challenges
In her dual role as director of machine learning and professor of engineering and public health at Johns Hopkins University, Saria lives and breathes AI research, analysis and development. She also deeply understands the benefits, challenges and possibilities of the marriage between AI and real-world datasets, including those in EHRs. Bayesian “makes the EHR proactive, dynamic and predictive,” said Saria, by bringing together data from diverse sources, including the EHR, to provide a clinical decision support platform that catches life-threatening disease complications early. Its sepsis module and results are just one example of a clinical impact area.
However, as anyone working with EHR data can attest, EHR data quality and usability remain a problem. As Saria notes, “In order to draw safe, reliable inferences, you’re going to need high-quality approaches that correct for the messiness that exists in the data.”
“AI is only as good as the curated training set that is used to develop it,” said Halamka, noting that EHR data is, by its very nature, incomplete and highly unfit for purpose. “EHR data repositories may only have a small subset of data, for example, or limited API functionality,” and thus might not have the richness to develop a comprehensive algorithm.
At Mayo, there is an AI model for breast cancer prediction that has 84 input variables; the EHR data is only a small portion of that. Additionally, in order to account for social determinants of health (SDoH) — which drive 80% of an individual’s health status — and other information that’s material to the model, Halamka noted that “you’re going to have to go beyond traditional EHR data extraction.”
EHR vendors’ AI adoption tactics, and their results, have also been scrutinized. Algorithms from industry EHR giant Epic were found to be delivering inaccurate or irrelevant information to hospitals about the care of seriously ill patients, a STAT News investigation found. Additionally, STAT found that Epic financially incentivizes hospitals and health systems to use its AI algorithms for sepsis. This is concerning for many reasons, chief among them the false predictions and other concerns voiced by health system leaders who have used the algorithm. It also adds to AI’s longstanding credibility problem and makes clear the industry’s need for broader AI standards and oversight.
Fixing AI’s Credibility Problem: Responsible AI Development
To develop a responsible AI model — and help to fix AI’s credibility problem — Halamka notes that there are a number of data “must-haves”: a longitudinal data record, including structured and unstructured data, telemetry and images, omics, and even digital pathology. Importantly, AI developers also need to continually evaluate the purpose of the data over the course of its lifetime in order to account for and correct dataset shifts.
Left unchecked, a dataset shift can severely impact AI model development. Dataset shifts occur when the data used to train machine learning models differs from the data the model encounters when it is used to provide diagnostic, prognostic, or treatment advice. Because data and populations can and will shift, AI developers need to continually monitor, detect, and correct for these shifts, which means continuous evaluation “not just of performance and models, but of use,” said Saria, adding that overreliance can lead to overtreatment.
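To make the monitoring idea concrete, here is a minimal sketch of one common way to flag a shift in a single numeric input: the population stability index (PSI), which compares how training values and live values fall across the same bins. This is an illustration only, not the method used by Saria, Bayesian Health or anyone else quoted here. The function name, the bin count and the 0.25 alert level (a widely used rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    train: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("training sample has no spread to bin")
    width = (hi - lo) / bins  # equal-width bins over the training range

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the first or last bin.
            counts[max(0, min(bins - 1, idx))] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_shares(train)
    actual = bin_shares(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical data: a feature whose live values drift upward after training.
training_values = [i / 100 for i in range(1000)]
live_values = [v + 3 for v in training_values]

ALERT_LEVEL = 0.25  # assumed rule-of-thumb threshold, not a clinical standard
if population_stability_index(training_values, live_values) > ALERT_LEVEL:
    print("Possible dataset shift: review the model before relying on it")
```

A check like this covers only one input at a time and says nothing about whether predictions are still accurate, so in practice it would sit alongside the broader evaluation of performance and use that Saria describes.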
On top of dataset quality and shifts, there are also financial obstacles to getting usable data. “While one of the most exciting domains for AI is in medicine and healthcare, labeled data is an incredibly scarce resource. And it’s incredibly expensive to get it labeled,” said Nishith (Nish) Khandwala, founder of BunkerHill, a startup and consortium connecting health systems to facilitate multi-institutional training, validation and deployment of experimental AI algorithms for medical imaging.
Born out of Stanford University’s Artificial Intelligence in Medicine and Imaging (AIMI) Center, BunkerHill does not develop AI algorithms itself, but instead is building a platform and network of health systems to allow them to test algorithms against different data sets. This kind of validation and health-system partnership is aimed at addressing the legal and the technical roadblocks to collaboration across different health systems, which BunkerHill partner UKHC calls key to successful AI development and application in radiology.
Taking a step back, there are a number of other questions and problems that AI developers must consider when initially creating an algorithm, explained Khandwala. “What does it even mean to make an algorithm for healthcare? What problem or subset of a problem do you start with?” Another challenge is bringing AI to market, which is a moving, or even non-existent, target at the moment.
“For medical devices and novel drug development, there is a clear, established regulatory process: there are documented procedures and institutions to guide the way. That does not exist with AI,” said Khandwala.
And this continues to be an issue for AI development: While there is an established methodical, research-first mindset and regulatory process when it comes to drug discovery, research, development and clinical validation — as you’d expect to see in any other scenario of invention for therapeutic benefit — this is not the case when it comes to AI, where the healthcare industry is still learning how to evaluate these types of solutions.
Standards, Reimbursement and Regulatory Oversight
The industry is also still evaluating how to pay for AI solutions. “Figuring out how a new delivery tool actually gets traction as a commercial product can be very difficult because the healthcare payment system and all the ways we regulate is a fairly unusual marketplace,” said Dale Van Demark, Health Industry Advisory Practice partner at McDermott Will & Emery.
Healthcare also operates under a highly complex and regulated set of payment systems (federal, quasi-federal, private and employer plans), with myriad experiments underway in new care models aimed at better-quality care, said Van Demark. “And within all of that, you have lots of regulatory and program integrity concerns, especially in Medicare, for example.”
And anything having to do with the delivery of care to an individual is ultimately where you get the most regulation. “That’s where the rubber meets the road,” Van Demark says, though he doesn’t see today’s FDA regulatory process as particularly challenging when it comes to getting an AI product to market. “The challenge is in figuring out the business of that technology in the market,” and having a deep understanding of how that market works in the regulatory environment.
Another challenging component? Getting real-world evidence. “For AI to be paid for, you need data that shows your product is making a difference,” says Jiayan Chen, also a partner in the Health Industry Advisory Practice Group of McDermott Will & Emery. “To do that, you need massive quantities of data to develop the tool or algorithm, but you also have to show that it works in a real-world setting.”
Chen also sees issues stemming from the blurred lines between the frequently changing roles of an AI developer. “At what point are you engaging in product development and research, or acting as a service provider? The answer to that will determine the path forward from a regulatory standpoint.”
So what should an AI development process look like, and who should be involved? As for an AI certification process, Halamka notes that, much as in the early days of Meaningful Use with its EHR software certifications and implementation guides, there will eventually be certifying entities for AI to ensure an algorithm is doing what it’s supposed to do.
AI oversight should not be limited to government bodies. Starting this year, Halamka predicts healthcare will see new public-private collaborations develop to tackle concerns about AI bias, equity and fairness, and wants to see more oversight and higher standards in terms of published studies. “Medical journals shouldn’t publish the results of an algorithm model unless it has a label that says it’s been peer-reviewed and clinically validated.”
At the moment, there’s no governing body explaining the right way to do predictive tool evaluations. But the idea is to ultimately give the FDA better tools for avoiding common pitfalls when evaluating AI and predictive solutions, says Saria; for example, only considering workflow implications instead of looking deeper at the models themselves, or incorrectly measuring impact on health outcomes.
This is also what she is focused on in her role at Bayesian Health: evaluating the underlying technology, making it easy to use and actionable in nature, monitoring and adjusting models in real time, and making sure everything is studied and clinically validated.
“It’s not rocket science; we’re doing things that everyone should be doing.”