The Risks of Using OpenAI's Whisper in Medical Settings

In 2022, OpenAI introduced Whisper, a transcription tool that promised "human level robustness" in converting audio to text. Despite that claim, an Associated Press investigation published last Saturday revealed significant problems with the tool, including occurrences of "confabulation" or "hallucination," in which the AI invents text that was never actually spoken. These fabrications have surfaced in both medical and business transcription settings.

Hospitals' Controversial Use of Whisper

Despite warnings from OpenAI against deploying Whisper in "high-risk domains" like healthcare, more than 30,000 medical professionals currently use software based on the tool to transcribe patient visits. Some healthcare systems, including Minnesota's Mankato Clinic and Children’s Hospital Los Angeles, have integrated a Whisper-powered AI service developed by Nabla. Although Nabla's tool is tuned for medical terminology, Whisper's underlying tendency to fabricate text poses substantial risks.

Research by the University of Michigan found fabricated text in 80 percent of the public meeting transcripts reviewed. One developer reported that nearly every one of his 26,000 transcriptions contained invented content. The stakes are especially high in medicine, where accuracy is paramount, and for deaf patients who rely on transcripts and have no way to catch errors against the original audio.

Nabla acknowledges Whisper's issues, yet the company also reportedly deletes the original audio recordings, citing data safety. This compounds the problem: without the source audio, there is no way to verify the AI-generated transcript against what was actually said.

Broader Implications Beyond Healthcare

The problem of AI hallucination isn't confined to healthcare. Researchers from Cornell University and the University of Virginia found that Whisper hallucinated in about 1 percent of the audio samples they analyzed. Some of the invented passages included violent content and racial commentary that the researchers classified as "explicit harm," including fabricated false narratives.

In one documented instance, Whisper added racial descriptors to speakers whose race was never mentioned; in another, it turned innocuous dialogue into a violent scene. OpenAI has responded by saying it is committed to reducing these fabrications through ongoing research and model updates, and that it incorporates external feedback into that work.

Understanding AI Hallucination Phenomena

The root cause of Whisper's inaccuracies lies in how the model generates output. Whisper is built on a Transformer deep learning architecture and, much like text models such as ChatGPT, works by predicting the most statistically likely continuation of a sequence. In Whisper's case, the input is tokenized audio data and the output is a predicted sequence of text tokens. Because the model always produces the most plausible continuation rather than checking it against what was actually said, confident-sounding output is no guarantee of accuracy, and that gap is what makes hallucination possible.
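To make the point concrete, here is a minimal sketch using the open-source openai-whisper Python package (not Nabla's medical product); the audio file name is hypothetical. The key observation is that the output is simply the decoder's most likely token sequence, and the per-segment statistics it exposes measure the model's own confidence, not fidelity to the audio.

import whisper  # pip install openai-whisper

# Load a small pretrained checkpoint and transcribe a (hypothetical) recording.
model = whisper.load_model("base")
result = model.transcribe("patient_visit.wav")

print(result["text"])  # the full predicted transcript

# Segment-level decoding statistics exist, but a fluent hallucination can still
# score well here: avg_logprob reflects how confident the decoder was in its own
# prediction, not whether the words were actually spoken.
for seg in result["segments"]:
    print(f'{seg["start"]:6.1f}s  '
          f'avg_logprob={seg["avg_logprob"]:.2f}  '
          f'no_speech_prob={seg["no_speech_prob"]:.2f}  '
          f'{seg["text"]}')

In other words, nothing in the transcription output flags invented text, which is why downstream users in high-stakes settings cannot rely on the model to police itself.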

For further information, you can read more on Wired.
