AI Makes Strangely Accurate Predictions from Blurry Medical Scans, Alarming Researchers

New research has found that artificial intelligence (AI) analyzing medical scans can identify patients' race with a surprising degree of accuracy, something human experts cannot do. As the Food and Drug Administration (FDA) approves more algorithms for medical use, researchers are concerned that AI may perpetuate racial bias.

They are particularly concerned that they have not been able to work out exactly how the machine-learning models detect race, even from heavily corrupted and low-resolution images. In a study published on the pre-print service arXiv, an international team of physicians investigated how deep learning models can identify race from medical images. Using private and public chest scans and self-reported data on race and ethnicity, they first evaluated how accurate the algorithms were before investigating the mechanism.

The team wrote in their study, “We hypothesized that if the model was able to identify the patient’s race, it would suggest that the models learned to recognize racial information even though they were not directly trained for that task.” They found, as previous studies had, that machine-learning algorithms were able to predict with high accuracy whether patients were Black, white or Asian. The team then tested several possible ways the algorithms could be picking up this information.

Possibilities included the AI inferring race from regional differences in the markers on scans (say a hospital that sees many white patients labels its X-rays in a particular style, the AI might be able to guess from demographics), or from differences in the resolution at which the scans were taken (for example, deprived areas may not have equipment that is as good).

These factors were controlled for by heavily pixelating, cropping, and blurring the images. The AI could still predict race and ethnicity when people could not. Even when the resolution of the scan was reduced to 4 x 4 pixels, the predictions were still better than random chance, and by the time the resolution was increased to 160 x 160 pixels, the accuracy was more than 95 percent.
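
To give a sense of how little a 4 x 4 image retains, here is a minimal sketch of one way to reduce resolution: averaging equal blocks of pixels. This is an illustration only. The article does not say how the researchers resized their scans, so the block-averaging method, the `downsample` function and the synthetic 320 x 320 `scan` are all assumptions made for this example.

```python
def downsample(image, size):
    """Shrink a square grayscale image (list of rows) to size x size
    by averaging equal blocks of pixels. Hypothetical helper, not from the study."""
    side = len(image)
    if any(len(row) != side for row in image) or side % size != 0:
        raise ValueError("expects a square image whose side is a multiple of size")
    block = side // size  # each output pixel summarizes a block x block patch
    result = []
    for r in range(size):
        row = []
        for c in range(size):
            total = sum(
                image[r * block + i][c * block + j]
                for i in range(block)
                for j in range(block)
            )
            row.append(total / (block * block))
        result.append(row)
    return result


# Synthetic stand-in for a scan; no real patient data is used here.
scan = [[(x * y) % 256 for x in range(320)] for y in range(320)]

tiny = downsample(scan, 4)     # 16 values in total
small = downsample(scan, 160)  # 25,600 values
```

At 4 x 4, each output value in this sketch stands in for an 80 x 80 patch of the original, which is why a person looking at such an image sees only a handful of gray squares.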

“Models trained on high-pass filtered images maintained performance well beyond the point that the degraded images contained no recognizable structures,” they wrote. “To the human co-authors and radiologists it was not even clear that the image was an X-ray at all.” Other variables were tested, and the results came back the same.