Google’s AI Health Screening Tool Claimed 90 Percent Accuracy, but Failed to Deliver in Real-World Tests
A team of Google researchers is going back to the drawing board after an artificial intelligence-based health care project fell short of expectations.
In an academic paper published this week, the team detailed how a deep learning tool that showed great promise under lab conditions had at times sparked frustration and unnecessary delays when rolled out into real-world clinical conditions.
The project took place between November 2018 and August 2019, with fieldwork being conducted at 11 clinics in the provinces of Pathum Thani and Chiang Mai, Thailand. Its aim was to use the technology to detect diabetic retinopathy (DR), a condition that can lead to vision distortion or loss, while helping the workflow of nursing staff.
Google said its AI has “specialist-level accuracy” of more than 90 percent in detecting referable cases of DR. But once deployed in the clinics, the tool quickly faced a series of unforeseen challenges.
Researchers found the tool needed high-quality images to work, which staff could not always provide. Eye-screening processes varied significantly between the clinics, and not all locations had reliable, high-speed internet connections. In some cases, the system appeared to slow down processes that were already lagging.
The Google researchers wrote in the final paper: “We discovered several factors that influenced model use and performance. Poor lighting conditions had always been a factor for nurses taking photos, but only through using the deep learning system did it present a real problem, leading to ungradable images and user frustration.
“Despite being designed to reduce the time needed for patients to receive care, the deployment of the system occasionally caused unnecessary delays for patients.”
The analysis added: “Finally, concerns for potential patient hardship (time, cost, and travel) as a result of on-the-spot referral recommendations from the system, led some nurses to discourage patient participation in the prospective study altogether.”
The study was still worthwhile, the team said. It was the first to analyze how nurses can use AI to screen patients for DR, and the findings will be used to improve the system in the future, Google suggested in a blog post.
Without the AI, nurses take a photo of a patient’s retina and send the image to an ophthalmologist for review, a process that can take up to 10 weeks. Google set out to test whether the algorithm could speed things up by providing near-instantaneous results.
But that did not prove to be easy, the researchers said.
Researchers soon learned some nurses were dissuading patients from participating in the prospective study over fears it would cause them “unnecessary hardship” as they would potentially have to travel to another hospital should they be referred.
“Through observation and interviews, we found a tension between the ability to know the results immediately and risk the need to travel, versus receiving a delayed referral notification and risk not receiving prompt treatment,” the paper said.
It added: “Patients had to consider their means and desire to be potentially referred to a far-away hospital. Nurses had to consider their willingness to follow the study protocol, their trust in the deep learning system’s results, and whether or not they felt the system’s referral recommendations would unnecessarily burden the patient.”
On top of that, researchers soon realized the deep learning system was not designed to work with low-quality, dark, or blurry images. That restriction was intended to reduce the chance the tool would make an incorrect assessment, but it caused issues in practice, Google said.
“Out of 1838 images that were put through the system in the first six months of usage, 393 (21%) didn’t meet the system’s high standards for grading,” the team said.
The paper added: “The system’s high standards for image quality is at odds with the consistency and quality of images that the nurses were routinely capturing under the constraints of the clinic, and this mismatch caused frustration and added work.”
Nursing staff voiced similar complaints. One staff member, noting the problems caused by slow internet speeds, told the team: “Patients like the instant results but the internet is slow and patients complain. They’ve been waiting here since 6 a.m. and for the first two hours we could only screen 10 patients.”
Google said its work is not done. It has started to hold design workshops with nurses, potential camera operators and retinal specialists at future deployment sites.
“It’s important to study and incorporate real-life evaluations in the clinic, and engage meaningfully with clinicians and patients, before the technology is widely deployed,” researcher Emma Beede wrote in a company blog post this week.
She continued: “That’s how we can best inform improvements to the technology, and how it is integrated into care, to meet the needs of clinicians and patients.”