A Google-developed AI that was capable of identifying cases of diabetic retinopathy (DR) with 90 percent accuracy in the testing laboratory has turned out to be much less useful in clinics and hospitals.

In laboratory settings, the AI designed by Google Health performed at the level of a medical specialist, but in testing at 11 clinics in Thailand between November 2018 and August 2019, it was substantially less effective.

The main challenge for researchers was the quality of the images being fed to Google’s AI: 21 percent of the 1,838 photographs taken of patients were graded as too low in quality to be processed, because of inadequate lighting or inconsistent photography by local clinic workers.

Another challenge was slow internet speeds, which made uploading and processing images time-consuming.

One clinic worker estimated they could screen only around 10 patients in a two-hour window, according to a report in Newsweek.

‘Poor lighting conditions had always been a factor for nurses taking photos, but only through using the deep learning system did it present a real problem, leading to ungradable images and user frustration,’ the team wrote in a summary of their testing.

‘Despite being designed to reduce the time needed for patients to receive care, the deployment of the system occasionally caused unnecessary delays for patients.’ 

In addition to the technical challenges, the researchers discovered a number of cultural hurdles, including the fact that patients who opted into the study and were potentially diagnosed would have to travel to faraway specialists, at significant cost to their families.

Because of the unreliability of the tech and the potential burden of traveling for follow-up care, the team found that many nurses had begun recommending that patients not use the AI method at all.

‘Patients had to consider their means and desire to be potentially referred to a far-away hospital,’ the team wrote.

‘Nurses had to consider their willingness to follow the study protocol, their trust in the deep learning system’s results, and whether or not they felt the system’s referral recommendations would unnecessarily burden the patient.’ 

Diabetic retinopathy is a condition marked by swelling or leaks in blood vessels in the retina, typically caused by prolonged high blood sugar in people with diabetes.

The conventional method of testing for DR involves a nurse taking a photograph of the retina and sending it to a specialist for further analysis, which can take as long as 10 weeks in Thailand.

Google hoped to shorten the window between a person becoming symptomatic and receiving a formal diagnosis so they could begin receiving treatment sooner.

Despite the seemingly discouraging findings, Google Health considers the test a success. 

‘These studies were successful in their intended purpose: to uncover the factors that can affect AI performance in real world environments and learn how people benefit from the tech, and refine the tech accordingly,’ Google Health researcher Emma Beede told Newsweek.

‘A failure would have been to fully deploy technology without studying how people would actually use and become affected by it.’

‘A properly conducted study is designed to reveal impacts, both positive and negative, if we hadn’t observed challenges, that would be the failure.’