Neuralink VOICE: Why Speech, and Not Sight, Came First

Speech is not a sense that was restored. It is a motor act that was decoded.

In late March 2026, Neuralink released a video. Kenneth Shock, a man with amyotrophic lateral sclerosis (ALS) who had not spoken in four years, said a sentence. He did not move his lips. He did not exhale. He did not produce sound. The sound came from a computer, in a voice reconstructed from recordings of him made in 2020, before the disease silenced him. Shock is the second participant in Neuralink’s VOICE trial, registered on ClinicalTrials.gov as NCT07224256, and the most publicly visible patient in a study that, at the time of writing, has enrolled two.

The headlines used a familiar phrase: mind reading. The phrase is wrong. Nothing about the brain’s broader thought stream was read. What was decoded was the motor cortex’s outbound instruction to muscles that no longer obeyed it — the same kind of signal that, in a healthy person, becomes a moving tongue and a vibrating larynx.

That category — motor output, not sensory experience — is the single most important fact about why this trial exists, and why a comparable trial for restored sight does not. The reason Neuralink’s first restorative milestone is voice rather than vision is hidden in the asymmetry between reading from the brain and writing to it.

What follows is a clinical reading of what Neuralink VOICE actually does, what it shares with prior speech BCI work, and what its first claims do not yet prove.


The VOICE trial, in plain numbers

VOICE is registered as “An Early Feasibility Study of a Precise Robotically Implanted Brain-Computer Interface for Communication Restoration.” The full title matters: “early feasibility” is a regulatory category, not a marketing one. It means the device’s safety and preliminary effect are being explored in a small number of patients to inform later, larger trials. The estimated enrollment is six. The site is The University of Texas Southwestern Medical Center. The primary completion date is October 2028.

The eligible population is narrowly defined: adults aged 22 to 75 with severe and irreversible speech impairment plus impaired upper-limb function, secondary to one of a defined list of central nervous system disorders (ALS, primary lateral sclerosis, stroke, or cervical spinal cord injury). Life expectancy must be at least twelve months. The intervention is the N1 Implant, placed by the R1 Robot.

The Breakthrough Device Designation that the U.S. Food and Drug Administration (FDA) granted Neuralink in May 2025 for the speech indication is frequently misread. Breakthrough Designation accelerates review timelines. It is not a marketing authorization. As of mid-2026, no Neuralink product is approved by the FDA for clinical sale; everything that exists does so under an Investigational Device Exemption.

The number that matters most is also the simplest: two. As of this writing, two participants have received an N1 Implant specifically under the VOICE protocol. One of them is Kenneth Shock. Public information about the other is limited.

Clinical Perspective. Two enrolled patients is not a treatment. It is a hypothesis under test. The clinician’s task when reading coverage of VOICE is to translate every “Neuralink can now” into “Neuralink has shown, in a patient, that it can sometimes.” Until larger trials replicate, the device’s claims are case reports with engineering ambition.


Speech is not a sense. It is a motor act.

The crux: speech as we perceive it is auditory, but speech as we produce it is motor.

Hearing a word is sensory: sound waves enter the cochlea, the auditory nerve carries them, the auditory cortex interprets them. Saying a word is motor: the brain coordinates the diaphragm, larynx, tongue, and lips in a sequence of muscle contractions whose acoustic byproduct is speech. The motor cortex does not store sounds. It stores instructions to move the apparatus that makes sound.

The distinction matters because brain-computer interfaces (BCIs) come in two fundamentally different flavors: those that record from the nervous system and those that stimulate it. A speech BCI records from the motor cortex. A vision BCI must stimulate the visual cortex. These are not engineering variants. They are different problems.

Recording is, by a wide margin, the easier problem. The brain produces its own signals; the device only needs to listen, amplify, and decode. The clinical risks are concentrated in the surgery and the long-term tissue response, not in the operation of the implant itself. Stimulation, by contrast, requires the device to inject patterned electrical current into living cortex with enough resolution to evoke a coherent percept. The risk profile expands: seizure threshold, charge density limits, current shunting, habituation, drift in stimulation parameters over months.

This asymmetry explains the entire trajectory of clinical neuroprosthetics. The cochlear implant, the first clinically successful neuroprosthesis (1957), bypassed the cortex entirely and stimulated the auditory nerve, where the encoding problem is comparatively simple. The first speech BCI trials to reach journals such as the New England Journal of Medicine (NEJM) began with participants left anarthric by brainstem stroke or ALS [Moses, Neuroprosthesis for Decoding Speech, 2021] and continued through intracortical work that pushed decoding to 62 words per minute [Willett, A High-Performance Speech Neuroprosthesis, 2023] and conversation-rate use over months [Card, An Accurate and Rapidly Calibrating Speech Neuroprosthesis, 2024]. None of those papers required stimulation of cortex. All of them recorded from the motor system.

Neuralink’s own Blindsight program, aimed at restoring vision, is years behind the speech work on the same hardware platform. It has to do what speech restoration does not: write to the brain. The same N1 chip that records speech intent today cannot, by itself, paint usable visual experience tomorrow.

Clinical Perspective. When a patient asks whether the technology that “gave a man back his voice” could also restore lost vision, the honest answer is: not yet, and not soon, because they are different problems wearing the same brand name. Restoration of input is much harder than interception of output. Conflating the two — as much public coverage does — sets expectations the field cannot meet.


Where the threads actually go

The N1 Implant is a small, skull-mounted, wireless, rechargeable device connected to fine polyimide threads carrying electrodes. The publicly described specification of the current platform is 1,024 electrodes distributed across 64 threads, with stated next-generation densities reaching 3,072 electrodes across 96 threads. Each thread is thinner than a human hair and must be placed by a purpose-built robot, the R1.

The surgical sequence is familiar to any neurosurgeon. A preoperative functional magnetic resonance imaging (fMRI) study localizes the cortical territory of interest. The scalp is incised. A small craniectomy is made. The dura mater is opened — though Neuralink has indicated that future versions of the procedure will insert threads through intact dura. The N1 device sits in the skull defect, electrode threads trailing into cortex. R1 inserts each thread several millimeters deep, navigating the cortical vasculature visible on the surface. Closure follows.

The target territory for speech is the ventral precentral gyrus, the part of the motor cortex that controls the muscles of articulation. The independent academic record on this is unambiguous. The 2024 NEJM trial of an intracortical speech neuroprosthesis placed four microelectrode arrays into the left ventral precentral gyrus of a participant with ALS, recording from 256 electrodes [Card, 2024]. The Stanford trial decoded attempted speech from microelectrode arrays in the same general region, with the strongest decoding signal localized to area 6v [Willett, 2023]. UCSF’s high-density surface recordings sit on the speech cortex as well [Metzger, A High-Performance Neuroprosthesis for Speech Decoding and Avatar Control, 2023]. Neuralink’s placement is consistent with this lineage.

[Figure: Diagram of the lateral surface of the human cerebrum highlighting the ventral precentral gyrus as the cortical territory targeted by speech brain-computer interfaces, with an inset showing the motor homunculus region devoted to face, lips, tongue, and larynx.]

The depth matters. Surface recordings — electrocorticography (ECoG) — detect aggregate field potentials from large neural populations. Intracortical penetrating electrodes — the Neuralink approach — record the spiking activity of small groups of individual neurons. The latter offers higher information density per electrode at the cost of greater invasiveness and greater long-term variability.

One specific point clinicians should hold onto: even in a patient who has not spoken intelligibly for years, the articulatory representation of phonemes in the motor cortex appears to persist. The neural map of how to say a word is still there, even when the muscles no longer do what the map asks of them [Willett, 2023]. This is the central piece of biology that makes the entire enterprise possible.

Clinical Perspective. The most informative phrase in the speech BCI literature is “attempted speech.” The patient still tries. The motor cortex still produces instructions. The disease has cut the wire downstream; the BCI taps in upstream. From this angle, the device is closer to a peripheral nerve bypass than to a thought reader.


From spoken to mouthed to imagined

The training protocol Neuralink has publicly described follows a three-stage progression for a patient like Kenneth Shock, whose ALS has progressively eliminated voluntary speech production but not the cortical intent to speak.

In stage one, the patient speaks aloud, or attempts to, while the system records the corresponding cortical activity. The acoustic output, where it exists, serves as a ground truth for what the patient was trying to say. The machine learning model learns the mapping between neural firing patterns and the words being produced.

In stage two, the patient mouths the words silently, without phonation. The motor cortex still produces the same fundamental commands. The decoder, now trained on overt speech, must hold its accuracy when the only available signal is neural.

In stage three, the patient does not move at all. They imagine speaking. The decoder must produce text — and, downstream, synthesized voice — from neural activity alone.

[Figure: Schematic of the three-stage training protocol used in speech brain-computer interfaces, progressing from spoken speech to silently mouthed speech to imagined speech, with arrows showing the decoder's transition from acoustic ground truth to purely neural input.]

The output side of the pipeline maps neural patterns onto phonemes, the smallest discrete units of speech sound. English is commonly modeled with 39 of them, though the count varies with the transcription scheme. Phonemes are assembled into words by a language model, words into sentences by a second model, and sentences finally rendered as audio by text-to-speech software. In Kenneth Shock’s case the synthesized voice was reconstructed from recordings of him made in 2020, using artificial intelligence (AI) voice-cloning. His wife refers to it as “Original Ken.”

This is where speech BCI becomes a tractable engineering problem. Phonemes give the system a discrete, finite, well-defined output vocabulary. Vision does not enjoy this property; nor does olfaction. The decoder’s job is classification at scale, not synthesis from scratch. Bolted onto modern language models, it benefits from priors the field could not have used a decade ago: a sentence whose neural decoding is uncertain can be resolved by the language model’s expectation of what plausibly comes next.
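How a language-model prior can settle an uncertain decoding is easiest to see with a toy example. The sketch below is illustrative only: the candidate words, the decoder likelihoods, and the prior probabilities are all made-up numbers, and the scoring rule (log-likelihood plus a weighted log-prior) is a generic rescoring scheme, not a description of Neuralink's decoder or of any published system.

```python
import math

# Hypothetical decoder output: how well the neural evidence fits each
# candidate word. Illustrative numbers, not taken from any trial.
decoder_likelihood = {"water": 0.40, "waiter": 0.45, "later": 0.15}

# Hypothetical language-model prior: P(word | previous word = "drink").
lm_prior = {"water": 0.60, "waiter": 0.01, "later": 0.05}


def rescore(likelihood, prior, lm_weight=1.0):
    """Combine decoder evidence with a language-model prior.

    Scores are log-likelihood + lm_weight * log-prior, then normalized
    into a probability distribution over the candidate words.
    """
    scores = {
        word: math.log(likelihood[word]) + lm_weight * math.log(prior[word])
        for word in likelihood
    }
    top = max(scores.values())  # subtract the max for numerical stability
    unnormalized = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(unnormalized.values())
    return {word: value / total for word, value in unnormalized.items()}


posterior = rescore(decoder_likelihood, lm_prior)
best_without_lm = max(decoder_likelihood, key=decoder_likelihood.get)
best_with_lm = max(posterior, key=posterior.get)

print("neural evidence alone:", best_without_lm)
print("with language-model prior:", best_with_lm)
```

With these invented numbers, the neural evidence alone narrowly favors “waiter,” while the prior for what follows “drink” tips the combined score toward “water.” The `lm_weight` parameter is a hypothetical knob for how much the prior is trusted relative to the neural signal.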

Published performance benchmarks from non-Neuralink intracortical speech BCIs anchor the realistic ceiling. The Stanford participant achieved 62 words per minute with a 9.1% word error rate over a 50-word vocabulary and 23.8% over 125,000 words [Willett, 2023]. The 2024 NEJM trial sustained 97.5% accuracy in conversational use over more than eight months, communicating at roughly 32 words per minute across 248 cumulative hours [Card, 2024]. UCSF’s surface-recording system reached a median 78 words per minute with 25% word error [Metzger, 2023].
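The word error rates quoted above are conventionally computed as the word-level edit distance between the decoded sentence and the intended sentence, divided by the number of words in the intended sentence. A minimal sketch of that standard calculation follows; the function name and the example sentences are illustrative, and the trials' own scoring scripts may differ in details such as text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed as the Levenshtein edit distance over whitespace-separated words.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


intended = "i am talking to you with my mind"
decoded = "i am walking to you with my mind"
print(f"WER = {word_error_rate(intended, decoded):.3f}")
```

One wrong word in an eight-word sentence gives a word error rate of 0.125. If accuracy is read as one minus word error rate, a figure such as 97.5% corresponds to roughly one error in every forty words.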

Eye-tracking — the current best non-BCI alternative for patients who retain ocular control — operates at roughly 10 to 20 words per minute. Natural conversational speech runs around 150 to 160. The gap from eye-tracking to BCI is meaningful. The gap from BCI to conversational speech is still real.

| Modality | Throughput (words/min) |
|---|---|
| Eye-tracking augmentative communication | 10–20 |
| Intracortical speech BCI [Card, 2024] | ~32 (conversation) |
| Intracortical speech BCI [Willett, 2023] | 62 (50-word vocab) |
| ECoG speech BCI [Metzger, 2023] | 78 (median, large vocab) |
| Natural conversational speech | ~150–160 |
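The throughput figures become more tangible when converted into the time needed to deliver a message of fixed length. The sketch below does that arithmetic for a hypothetical 100-word message, using the rates quoted in this section; the message length and the assumption of a perfectly steady rate are simplifications chosen for illustration.

```python
# Throughput figures quoted in this section, in words per minute.
# Ranges are represented by an endpoint; approximate values are taken at face value.
THROUGHPUT_WPM = {
    "eye-tracking (slow end)": 10,
    "eye-tracking (fast end)": 20,
    "intracortical BCI [Card, 2024]": 32,
    "intracortical BCI [Willett, 2023]": 62,
    "ECoG BCI [Metzger, 2023]": 78,
    "natural speech (slow end)": 150,
}


def seconds_to_say(word_count: int, words_per_minute: float) -> float:
    """Time in seconds to deliver word_count words at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return 60.0 * word_count / words_per_minute


MESSAGE_WORDS = 100  # hypothetical message length, chosen for illustration

for label, wpm in THROUGHPUT_WPM.items():
    print(f"{label:36s} {seconds_to_say(MESSAGE_WORDS, wpm):6.1f} s")
```

At 10 words per minute the message takes ten minutes; at 32 it takes a little over three; natural speech delivers it in about 40 seconds.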

Clinical Perspective. Patients facing progressive loss of speech rarely receive precise numbers about what awaits them. The figure that matters in counseling is not the headline words-per-minute peak but the floor: how slow communication becomes when eye control finally fails. For some ALS patients in late disease, that floor is silence. A speech BCI sets a different floor.


What this trial does not yet prove

The most important sentence in this entire piece may be this one: VOICE has implanted two patients. Two.

Early feasibility studies are designed to make safety problems visible at small numbers. Neuralink’s larger PRIME program has already produced one such problem. The first PRIME participant, Noland Arbaugh, experienced retraction of the great majority of his electrode threads from cortex in the weeks after surgery; an estimated 85% became non-functional, leaving roughly 9 or 10 of an original 64 in place. The team adapted around the loss with software, but the underlying biological problem — a fine foreign body lodged in soft tissue that pulsates, swells, and reorganizes — remains.
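The retraction figures can be checked with simple arithmetic. The sketch below assumes that the 1,024-electrode, 64-thread specification described earlier applies to that implant, that electrodes are spread evenly across threads, and that a retracted thread contributes no usable electrodes. All three are simplifying assumptions made here, not statements from Neuralink.

```python
TOTAL_THREADS = 64             # threads on the current platform, per this article
TOTAL_ELECTRODES = 1024        # electrodes on the current platform, per this article
FRACTION_NONFUNCTIONAL = 0.85  # estimated share of threads lost to retraction


def surviving(total: int, fraction_lost: float) -> float:
    """Expected count remaining after a given fraction is lost."""
    return total * (1.0 - fraction_lost)


threads_left = surviving(TOTAL_THREADS, FRACTION_NONFUNCTIONAL)

# Assumption: electrodes are distributed evenly across threads.
electrodes_per_thread = TOTAL_ELECTRODES // TOTAL_THREADS

# Assumption: every electrode on a retracted thread is lost.
electrodes_left = threads_left * electrodes_per_thread

print(f"threads remaining:     {threads_left:.1f} of {TOTAL_THREADS}")
print(f"electrodes per thread: {electrodes_per_thread}")
print(f"electrodes remaining:  {electrodes_left:.0f} of {TOTAL_ELECTRODES}")
```

An 85% loss leaves about 9.6 threads, which matches the “roughly 9 or 10” quoted above, and under these assumptions on the order of 150 usable electrodes out of 1,024.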

What this means clinically is that the VOICE trial’s most important endpoints are not the demonstration videos. They are the durability of the recorded signal at three, six, and twelve months, the rate of revision surgery, the cumulative tissue response, and the management of any device-related infection. The publicly visible “I am talking to you with my mind” sentence is real. So are the latencies and accuracy ceilings the engineers are working against.

The patient population is also selected in ways that matter. VOICE excludes patients with prior intracortical electrodes, certain skull anatomies, uncontrolled seizures, and a list of other features, criteria designed to maximize the chance the device performs as intended. The eventual population of patients who might benefit (late-stage ALS, brainstem stroke survivors, certain spinal cord injuries) is broader and less forgiving than the trial’s entry criteria.

There is also the question of whether the imagined-speech stage will hold up over time. Inner speech and attempted speech are not perfectly equivalent neural events. Recent work has begun to dissect that difference: inner speech is represented in the motor cortex with patterns strongly correlated with, but distinguishable from, attempted speech [Kunz, Inner Speech in Motor Cortex, 2025]. A system that decodes attempted speech robustly may decode purely internal speech less robustly, particularly as the disease progresses and voluntary motor effort fades.

Clinical Perspective. A first-in-human device is, by definition, an artifact whose long-term performance is unknown. The honest framing for ALS patients and their families considering BCI is that the technology has moved from impossible to early. Early is not the same as available, and available will not be the same as durable. Watch the durability data — that is where this story will be made or unmade.

Treated as a small step on a long ladder, VOICE is significant. Treated as the arrival of restored speech, it overshoots what two patients can carry.


Key Takeaways

  • VOICE (NCT07224256) is an early feasibility study with an estimated enrollment of six, currently at two implanted patients, and a primary completion date in 2028.
  • The reason speech, not vision, came first is structural: recording from the motor cortex is a fundamentally easier engineering problem than stimulating the sensory cortex.
  • The ventral precentral gyrus, not an abstract “thought center,” is the cortical territory speech BCIs read; the device intercepts motor commands the disease can no longer deliver.
  • The standard training protocol moves a patient from spoken speech to silently mouthed speech to imagined speech, with phoneme classification serving as the decoder’s discrete output.
  • Published benchmarks from related intracortical and ECoG trials place 32–78 words per minute as a realistic working range, far above eye-tracking’s 10–20 and well below conversational speech.
  • The durability of the recorded neural signal, not the demonstration sentence, is the endpoint that determines whether a demonstration becomes a treatment.

FAQ

Does the Neuralink VOICE implant read thoughts? No. It decodes neural activity in the motor cortex that would otherwise drive the muscles of speech. The implant detects the brain’s instructions to move the tongue, lips, and larynx, not abstract thought. Even the “imagined speech” stage draws on speech-related motor cortex activity that closely resembles attempted speech, not on an unrestricted inner monologue [Kunz, 2025].

Is Neuralink VOICE approved by the FDA? No. The N1 Implant for speech restoration has received FDA Breakthrough Device Designation, which accelerates regulatory review. It is not the same as marketing authorization. As of mid-2026, the device remains investigational under an Investigational Device Exemption, and the VOICE study is an early feasibility trial.

Why was sight not restored first? Because restoring sight requires stimulating cortex, not recording from it, and stimulation is a much harder problem. Speech BCIs intercept motor signals the brain already produces. Vision BCIs would have to inject patterned electrical input into the visual cortex at a resolution and stability the field has not yet demonstrated. Same hardware platform; different problem class.

How fast can a patient communicate using a speech BCI? Roughly 30 to 80 words per minute in the best published trials, depending on the system and vocabulary [Willett, 2023; Card, 2024; Metzger, 2023]. That is meaningfully faster than eye-tracking (10–20 words per minute) and slower than natural conversation (around 150 to 160). Even the fastest published systems top out well below the speed of unimpaired speech.


References

  1. Moses DA, Metzger SL, Liu JR, et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N Engl J Med. 2021;385(3):217-227.
  2. Willett FR, Kunz EM, Fan C, et al. A high-performance speech neuroprosthesis. Nature. 2023;620(7976):1031-1036.
  3. Metzger SL, Littlejohn KT, Silva AB, et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature. 2023;620(7976):1037-1046.
  4. Card NS, Wairagkar M, Iacobacci C, et al. An accurate and rapidly calibrating speech neuroprosthesis. N Engl J Med. 2024;391(7):609-618.
  5. Kunz EM, Abramovich Krasa B, Kamdar F, et al. Inner speech in motor cortex and implications for speech neuroprostheses. Cell. 2025;188(17):4658-4673.e17.
  6. ClinicalTrials.gov. NCT07224256: VOICE — An Early Feasibility Study of a Precise Robotically Implanted Brain-Computer Interface for Communication Restoration. Sponsored by Neuralink Corp. Available at: https://clinicaltrials.gov/study/NCT07224256
  7. Neuralink. Speech Restoration. Available at: https://neuralink.com/trials/speech-restoration/

Joonpyo Hong, MD, is a board-certified otolaryngologist practicing in Korea. This article reflects his clinical interpretation of published research and does not constitute individual medical advice.

This article is not intended to advertise or promote any specific company or product.
