Three years ago, artificial intelligence pioneer Geoffrey Hinton said, “We should stop training radiologists now. It’s just completely obvious that within five years, deep learning is going to do better than radiologists.”

Today, hundreds of startup companies around the world are trying to apply deep learning to radiology. Yet the number of radiologists who have been replaced by AI is approximately zero. (In fact, there is a worldwide shortage of them.)

At least in the short term, that number is likely to remain unchanged. Radiology has proven harder to automate than Hinton, and many others, imagined. The same is true of medicine more broadly: there are many proofs of concept, such as automated diagnosis of pneumonia from chest X-rays, but surprisingly few cases in which deep learning (the machine learning technique that currently dominates AI) has delivered the transformations and improvements so often promised.

Why not?

To begin with, the laboratory evidence for the effectiveness of deep learning is not as sound as it might seem. Positive results, in which AI systems outperform their human counterparts, tend to attract considerable media attention; negative results, in which the machines fall short, are rarely reported in academic journals and receive even less coverage in the press.