Can AI Detect Laryngeal Cancer Just by Hearing Your Voice?

In August 2025, researchers at Oregon Health & Science University reported that AI could pick up signs of laryngeal cancer in voice recordings, something that even a decade ago would have sounded like science fiction. Their model could tell healthy voices from the voices of people with vocal fold lesions, including early laryngeal cancer. No scope. No biopsy. Just sound.

Laryngeal cancer kills around 100,000 people a year worldwide, and prognosis depends heavily on how early it is found. Five-year survival jumps from roughly 35% in advanced disease to about 78% when caught at an early stage. But the standard path to a definitive diagnosis still runs through an uncomfortable nasal endoscopy, often followed by a biopsy under anesthesia.

This article walks through how the AI actually works, what it can and cannot do today, and why this is part of a broader shift the field is starting to call “voice as a biomarker.” The clinical takeaway, framed from an ENT perspective, follows at the end.

How a Tumor Changes the Sound of Your Voice

Voice is mechanical. Two small bands of tissue called vocal folds sit inside the larynx, and when you speak they vibrate roughly 100 to 250 times per second. Air from the lungs sets them in motion, and the resulting pressure waves become sound. Anything that disrupts that vibration—a polyp, a nodule, scarring, paralysis, or a tumor—shows up in the acoustic signal.

Three measurements matter most:

Jitter captures tiny pitch instability from cycle to cycle. A healthy voice has very low jitter; a damaged one wobbles.
Shimmer captures the same kind of instability, but in loudness rather than pitch.
Harmonic-to-noise ratio (HNR) measures how much of the sound is clean harmonic tone versus random noise—essentially, how “rough” or “breathy” the voice sounds.
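
To make the three measurements concrete, here is a minimal Python sketch. It assumes the common "local" definitions of jitter and shimmer (mean cycle-to-cycle change divided by the mean value) and a simplified autocorrelation estimate of HNR. The function names, the synthetic signals, and the 8 kHz sampling rate are illustrative assumptions, not the method of any study discussed in this article, and clinical software handles pitch tracking and windowing far more carefully.

```python
import math
import random


def cycle_variation(values):
    """Mean absolute change between consecutive cycles, relative to the mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_local(periods_ms):
    """Cycle-to-cycle instability in period length, i.e. in pitch."""
    return cycle_variation(periods_ms)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle instability in peak amplitude, i.e. in loudness."""
    return cycle_variation(peak_amplitudes)


def hnr_db(samples, period_samples):
    """Simplified HNR estimate in dB.

    r is the normalized autocorrelation at a lag of one pitch period;
    r / (1 - r) approximates harmonic energy over noise energy.
    """
    n = len(samples) - period_samples
    head = samples[:n]
    tail = samples[period_samples:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    r = min(max(num / den, 1e-6), 1 - 1e-6)  # clamp so the logarithm stays finite
    return 10 * math.log10(r / (1 - r))


# Hypothetical cycle lengths in milliseconds (about 125 Hz).
steady_periods = [8.0, 8.0, 8.0, 8.0, 8.0]
rough_periods = [8.0, 8.4, 7.6, 8.4, 7.6]

# Synthetic "voices": a 100 Hz tone at an assumed 8 kHz sampling rate,
# with and without added random noise.
rng = random.Random(0)
period = 80  # samples per vibration cycle
clean = [math.sin(2 * math.pi * i / period) for i in range(1600)]
noisy = [s + rng.gauss(0.0, 0.5) for s in clean]

print("jitter:", jitter_local(steady_periods), jitter_local(rough_periods))
print("HNR (dB):", hnr_db(clean, period), hnr_db(noisy, period))
```

The steady cycle lengths give a jitter of zero and the wobbling ones do not, while the noisy tone scores a much lower HNR than the clean one. That is the same contrast described above, reduced to toy data.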

Untrained ears miss most of these changes until they become severe. Software does not. A trained ENT ear can pick up the breathy, rough quality of a glottic lesion within seconds of a “say /a/” exam, but quantifying it consistently across thousands of patients is exactly the kind of pattern recognition machine learning handles well.

Figure: Waveform comparison of a healthy voice with a high harmonic-to-noise ratio and a voice with a vocal fold lesion showing low HNR.

What the 2025 OHSU Study Actually Did

The study, led by Phillip Jenkins and colleagues at Oregon Health & Science University and Portland State University, was published in Frontiers in Digital Health on August 12, 2025. It used the Bridge2AI-Voice dataset, part of the United States National Institutes of Health’s Bridge to Artificial Intelligence consortium—a project built specifically to develop voice as a clinical biomarker.

The team analyzed 12,523 voice recordings from 306 participants across North America. Some had been diagnosed with laryngeal cancer, others with benign vocal fold lesions like polyps or nodules, and others with non-lesion voice disorders such as spasmodic dysphonia or unilateral vocal fold paralysis. The AI compared multiple acoustic features and found that variation in the harmonic-to-noise ratio was the most useful single signal for separating cancer voices from benign-lesion voices and from healthy controls.
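
What "variation in the harmonic-to-noise ratio" might look like in code can be sketched as follows. This article does not describe the study's actual feature pipeline, so everything here is an assumption made for illustration: the recording is cut into fixed-length frames, HNR is estimated per frame with a simplified autocorrelation formula, and "variation" is taken to mean the standard deviation across frames. The signals are synthetic, not patient data.

```python
import math
import random
import statistics

PERIOD = 80   # samples per cycle: 100 Hz tone at an assumed 8 kHz sampling rate
FRAME = 800   # 0.1 s frames, chosen to hold a whole number of cycles


def hnr_db(samples, period_samples):
    """Simplified HNR in dB from the autocorrelation at one pitch period."""
    n = len(samples) - period_samples
    head = samples[:n]
    tail = samples[period_samples:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    r = min(max(num / den, 1e-6), 1 - 1e-6)  # clamp so the logarithm stays finite
    return 10 * math.log10(r / (1 - r))


def hnr_variability(samples, frame_len, period_samples):
    """Standard deviation of frame-wise HNR across non-overlapping frames."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    values = [hnr_db(samples[s:s + frame_len], period_samples) for s in starts]
    return statistics.pstdev(values)


def synth_voice(noise_levels, seed):
    """Synthetic tone with a separate noise level for each frame (toy data)."""
    rng = random.Random(seed)
    out = []
    for level in noise_levels:
        for i in range(FRAME):
            out.append(math.sin(2 * math.pi * i / PERIOD) + rng.gauss(0.0, level))
    return out


# Same tone, but one recording keeps a constant noise level while the other
# alternates between cleaner and much noisier frames.
stable = synth_voice([0.2] * 10, seed=1)
unstable = synth_voice([0.1, 0.6] * 5, seed=1)

print("stable voice HNR spread (dB):", hnr_variability(stable, FRAME, PERIOD))
print("unstable voice HNR spread (dB):", hnr_variability(unstable, FRAME, PERIOD))
```

The second recording shows a far larger spread in HNR from frame to frame. A feature of this kind captures how inconsistently a voice holds its tone, rather than how noisy it is on average.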

Two important caveats are worth stating clearly. First, the work is proof-of-principle, not a deployed clinical tool. Second, the model performed less reliably for women than for men—a known limitation the authors attribute to dataset size and acoustic differences between male and female phonation. Jenkins estimated that, with larger datasets and clinical validation, a usable triage tool might enter pilot testing “in the next couple of years.”

This Isn’t Actually the First Time

The 2025 headlines were largely framed as a breakthrough, but the underlying idea has a longer history—including a Korean one that English-language coverage missed.

In 2020, a team from Bucheon St. Mary’s Hospital (The Catholic University of Korea) and POSTECH published, in the Journal of Clinical Medicine, one of the earliest papers to apply a convolutional neural network specifically to distinguishing laryngeal cancer voices from healthy controls. Using only a sustained /a:/ vowel, their 1D-CNN reached about 85% accuracy, outperforming human raters, including two trained laryngologists, on the same task.

The same group continued the work, and in 2024 published a follow-up in Scientific Reports extending the classifier to multiple laryngeal diseases including benign mucosal disease and vocal cord paralysis—essentially asking whether AI can distinguish cancer from the much larger pool of non-cancer voice problems, which is the harder and more clinically useful question.

What is genuinely new in 2025 is not the idea. It is the scale of curated multi-institutional data and a more honest reckoning with the demographic gaps in earlier work.

Will AI Replace the Scope?

The short answer is no, and replacement is probably not the right goal in the first place. However promising voice-based detection becomes, someone still has to look at the larynx.

Direct visualization of the larynx, whether by flexible endoscopy in clinic or rigid laryngoscopy under anesthesia, gives an ENT specialist information that voice cannot. It identifies the exact location, the surface appearance, the vascular pattern, and the relationship of a lesion to surrounding structures. A biopsy then tells you what the lesion is. Voice acoustics, however clever the analysis, cannot do any of those things.

What voice AI can plausibly do is reshape the upstream funnel. Most patients referred urgently for head and neck cancer evaluation turn out not to have cancer, which means a great deal of specialist time, equipment, and patient anxiety is spent ruling out disease that was never there. A voice-based triage layer—either at the primary-care level or through a phone-based screening tool—could push higher-risk voices to the front of the queue and lower-risk ones toward watchful waiting.
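
As a rough sketch of what such a triage layer would do with its output, the Python below sorts referrals by a risk score. The `Referral` class, the score range, and the 0.5 cut-off are all hypothetical placeholders chosen for illustration. No validated score or threshold exists yet, and none is implied here.

```python
from dataclasses import dataclass


@dataclass
class Referral:
    patient_id: str
    risk_score: float  # hypothetical model output between 0 and 1


def triage(referrals, urgent_threshold=0.5):
    """Split referrals into an urgent queue and a watchful-waiting list.

    The threshold is an illustrative assumption, not a clinical value.
    The urgent queue is ordered with the highest risk score first.
    """
    urgent = sorted(
        (r for r in referrals if r.risk_score >= urgent_threshold),
        key=lambda r: r.risk_score,
        reverse=True,
    )
    waiting = [r for r in referrals if r.risk_score < urgent_threshold]
    return urgent, waiting


referrals = [Referral("A", 0.20), Referral("B", 0.90), Referral("C", 0.60)]
urgent, waiting = triage(referrals)
print("urgent laryngoscopy, in order:", [r.patient_id for r in urgent])
print("watchful waiting:", [r.patient_id for r in waiting])
```

Nothing in this sketch diagnoses anyone. It only reorders who is seen first, which is the whole point of a triage layer.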

That is a smaller claim than “AI catches cancer.” It is also a more honest one.

Figure: Proposed voice-based triage workflow for laryngeal cancer detection, from smartphone voice recording to AI risk screening, ENT laryngoscopy, and biopsy.

Clinical Perspective

For any ENT specialist, voice is the first piece of clinical information. A patient says hello, and within a few words there is already a working hypothesis—reflux irritation, a benign nodule from vocal abuse, a paralysis, or something more concerning. What no clinic ear can do is scale that intuition to the millions of people in rural regions, in low-resource settings, or in busy primary-care queues who never reach a specialist in time.

That is the gap voice AI might actually fill. Not by replacing the physical examination, but by widening the net of people who get one.

The other point worth being clear about: voice changes are common and most are not cancer. Reflux, allergy, recent illness, vocal overuse, neurologic disease, and even dehydration all affect acoustic measures. A future AI tool that flags “something off” still requires a clinician to interpret it. The risk of an overly enthusiastic consumer app is a flood of worried patients with normal exams, not missed cancers.
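
The arithmetic behind that worry is worth seeing once. The sketch below uses purely hypothetical inputs: a screening app with 85% sensitivity and 85% specificity (figures picked to loosely echo the accuracy reported in research, which is not the same measure) used by 100,000 people, of whom 1 in 1,000 actually has laryngeal cancer. The prevalence in particular is an assumption for illustration, not an epidemiological estimate.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Return (true positives, false positives, positive predictive value)."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_pos = with_cancer * sensitivity
    false_pos = without_cancer * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


# All four inputs are hypothetical, chosen only to show the base-rate effect.
tp, fp, ppv = screening_outcomes(
    population=100_000,
    prevalence=0.001,
    sensitivity=0.85,
    specificity=0.85,
)
print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"chance a flagged user has cancer: {ppv:.2%}")
```

Under these assumptions the app correctly flags about 85 people with cancer and raises roughly 15,000 false alarms, so well under 1% of flagged users would actually have the disease. That is why a positive flag can only ever be a reason to examine someone, never a diagnosis.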

What This Means for You Today

Voice-based AI screening is not yet available as a consumer product. The honest answer is that the field is probably a few years out from anything reliable enough to rest a clinical decision on.

The rule that has held for decades has not changed. Hoarseness that lasts more than two to three weeks deserves an ENT evaluation, particularly in current or former smokers, people who drink heavily, and anyone over the age of 40. If your voice is your livelihood—singer, teacher, broadcaster, lawyer, religious leader—the threshold should be lower still. Do not wait for an app.

Key Takeaways

A 2025 Oregon Health & Science University study showed AI can distinguish vocal fold lesions, including early laryngeal cancer, from voice recordings alone—at least in men.
Variation in the harmonic-to-noise ratio was the most useful acoustic feature in the study.
The idea is not new: Korean research from 2020 had already applied convolutional neural networks to laryngeal cancer voice classification.
AI voice analysis is most realistically positioned as a triage tool, not a replacement for laryngoscopy or biopsy.
Persistent hoarseness lasting more than two to three weeks still warrants an in-person ENT evaluation regardless of any AI screening.

FAQ

Can AI detect laryngeal cancer from your voice?

In research settings, yes. The 2025 OHSU study used machine learning on more than 12,000 voice recordings to identify vocal fold lesions, including early laryngeal cancer, with promising accuracy in men. The 2020 Korean study at Catholic University of Korea and POSTECH reported approximately 85% accuracy using a 1D convolutional neural network on sustained vowel recordings.

How accurate is AI voice cancer detection?

Reported research-stage accuracy ranges from roughly 80% to 85% on test datasets, depending on the model and the task. Real-world clinical accuracy across diverse populations and recording conditions has not yet been established.

What is a vocal biomarker?

A vocal biomarker is a measurable acoustic feature in speech—such as pitch stability, amplitude variation, or harmonic-to-noise ratio—that correlates with a biological condition. Voice biomarkers are being studied for laryngeal cancer, Parkinson’s disease, depression, and several other conditions.

Will AI replace laryngoscopy for cancer diagnosis?

No. Voice analysis cannot localize a lesion, assess its surface or vascularity, or provide tissue for diagnosis. AI is most useful as a triage layer that decides who needs urgent laryngoscopy, not as a replacement for it.

When should I see an ENT for hoarseness?

If hoarseness lasts more than two to three weeks—especially in a smoker, heavy drinker, or someone over 40—schedule an ENT evaluation. Earlier evaluation is appropriate for anyone whose voice is essential to their work or whose hoarseness is accompanied by ear pain, swallowing difficulty, neck swelling, or unintentional weight loss.

References

Jenkins P, Harrison R, Bedrick S, Karstens L, Bensoussan Y, Hersh W. Voice as a biomarker: exploratory analysis for benign and malignant vocal fold lesions. Front Digit Health. 2025;7:1609811.
Kim H, Jeon J, Han YJ, Joo Y, Lee J, Lee S, Im S. Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy. J Clin Med. 2020;9(11):3415.
Kim HB, Song J, Park S, Lee YO. Classification of laryngeal diseases including laryngeal cancer, benign mucosal disease, and vocal cord paralysis by artificial intelligence using voice analysis. Sci Rep. 2024;14(1):9263.

Joonpyo Hong, MD, is a board-certified otolaryngologist practicing in Korea. This article reflects his clinical interpretation of published research and does not constitute individual medical advice.

Related articles:
https://curiousmd.com/neuralink-voice-speech-first/
https://curiousmd.com/cnn-laryngeal-cancer-diagnosis/
https://curiousmd.com/how-emotional-tts-works-ent-perspective/

External links:

OHSU 2025 study:
https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2025.1609811/full

NIH Bridge2AI program:
https://commonfund.nih.gov/bridge2ai