I Know What You're Thinking...
Jack Caputo
2 May 2026
HST401
I pledge my Honor that I have abided by the Stevens Honor System
Regular folks have been harnessing the power of artificial intelligence (AI), mostly in the form of large language models (LLMs), since it took over the world a few years ago. People are being fired and replaced with AI, students are outsourcing all their work to it, and AI slop has flowed into every crevice of the Internet. People have formed an idea of what AI can do, or perhaps of all it can do. Of course, machine learning algorithms were in use years before the all-powerful LLMs that have come to dominate the market and the cultural consciousness. Smaller machine learning models were built to pick out patterns in data more reliably than humans or traditional data analysis methods could, or even patterns they couldn’t see at all. The field has matured at an astounding rate – but what’s next?
Well, what if I told you that this same underlying technology can be used to read your mind? For my senior project here at Stevens Institute of Technology, I’m working with another student to visually reconstruct, or reproduce what a person sees, using machine learning and electroencephalogram (EEG) data, a type of brain data.
Let’s start from the beginning: how might I get your brain data? The broadest division of brain imaging techniques is invasive versus noninvasive. Invasive, as the name implies, requires surgery and an implant that goes directly into the brain. This might be a brain chip pushed into your squishy lumps (professionally, of course) or an implant that lives in your brain’s blood vessels. This is the approach of the corporate headliners invested in brain-computer interfaces, or BCIs: Neuralink, Blackrock Neurotech, Synchron. Compare these to common noninvasive techniques, such as EEG, magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI), which may not match the data resolution of the invasive techniques (depending on the use case) but make up for it with their lower barriers to entry: participants don’t need surgery, and thus don’t risk infections or surgical complications, and the techniques are comparatively cheap once you account for the surgery, operation, and maintenance costs of invasive methods. EEG uses electrodes placed on the scalp to record the changes in electric potential that occur when neurons activate. It is particularly appealing for BCIs due to its low cost, even compared to the other noninvasive methods, plus its portability and high temporal resolution. Its main weakness is its low spatial resolution, meaning it cannot effectively record which areas of the brain turn on in response to specific stimuli; that is one great strength of fMRI.
CW from bottom: fMRI (https://icord.org/studies/2013/04/functional-magnetic-resonance-imaging-fmri-for-people-with-sci/), EEG (https://www.brainproducts.com/solutions/actichamp/), MEG (https://neurology.ufl.edu/meg/)
Now that I’ve gotten your EEG data, how do I go about recreating what you saw? Other groups have done reconstructions by first classifying the object that was seen, then using this “label” to generate an image, typically with a diffusion model, the same framework behind AI image generation. This often leads to the reconstructions seen in Fig. 1, where the reconstructed objects agree with what the person saw, or the ground truth (GT), but display varying orientations, colors, background environments, etc. When planning our project, our main goal was to avoid this type of reconstruction, and instead make reconstructions as faithful to the ground truth as possible, even if we had to sacrifice overall image fidelity.
Our paradigm is shown in Fig. 2 below. Let’s start at the top: we first take the ground truth image and feed it into a variational autoencoder (VAE), a type of machine learning model that takes in an image, encodes its important features into a latent space, and samples from this latent space to create an image as similar as possible to the input. The latent space contains whatever the model has decided are the most important features of the input image. Everything here is learned by the model – how to encode specific images, what the latent space looks like for an image, and how to properly sample from the latent space. VAEs are commonly used for image generation tasks and balance model simplicity with output quality, making them a perfect choice for our project.
Fig. 2 Model architecture
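To make the encode → sample → decode loop concrete, here is a minimal, purely illustrative sketch in plain Python. The "encoder" and "decoder" below are hand-written toy functions operating on a four-number list rather than an image – in a real VAE (including ours) both are learned neural networks – but the reparameterization step, where a latent vector is sampled from a predicted mean and variance, mirrors the actual mechanism:

```python
import math
import random

random.seed(0)

def encode(x):
    # Toy "encoder": map the input to the mean and log-variance of a
    # 2-dimensional latent Gaussian. A real VAE learns this mapping.
    mu = [sum(x) / len(x), max(x) - min(x)]
    log_var = [-1.0, -1.0]  # fixed here; learned in practice
    return mu, log_var

def reparameterize(mu, log_var):
    # Sample z = mu + sigma * eps: the "reparameterization trick" that
    # keeps sampling differentiable during training.
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def decode(z, n):
    # Toy "decoder": expand the latent vector back out to an
    # n-dimensional reconstruction. A real VAE learns this too.
    return [z[0] + z[1] * (i / (n - 1) - 0.5) for i in range(n)]

x = [0.1, 0.4, 0.35, 0.8]        # stand-in for an image's pixel values
mu, log_var = encode(x)
z = reparameterize(mu, log_var)  # a sample from the latent space
x_hat = decode(z, len(x))        # the reconstruction
```

Because sampling is written as a deterministic function of the mean, variance, and an external noise draw, gradients can flow through it – which is the whole reason the VAE can learn its latent space end to end.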
Now, on to the EEG data, which flows through two pipelines. In the first, the EEG data gets a similar treatment to the image in the VAE: it is encoded into its own latent space, then decoded into an image. Creating and training this EEG encoder was one of the main results of our project. If we align the EEG latent space with the image latent space so that they represent the same things (yellow arrows), then in principle, using the same decoder as in the VAE (green arrows), we should get the same reconstructed image as from the VAE. In practice, EEG’s low spatial resolution means it simply cannot produce such high-quality images on its own; we are left only with blobs, but these blobs still contain some low-level image features, like color and vague shape. The second EEG pipeline consists of a classifier, which my partner and I created for a project last spring. The classifier determines which object from the dataset the EEG data represents, giving us a “class label”. Now, armed with low-level image details and a class label, we feed both into a diffusion model.
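The alignment between the EEG latent space and the image latent space (the yellow arrows) boils down to minimizing a distance between paired latent vectors during training. The exact objective is a design choice – contrastive losses are another popular option – so take this as an assumed, minimal example: a mean-squared-error alignment loss in plain Python.

```python
def mse_alignment_loss(eeg_latents, img_latents):
    # Mean squared error between paired EEG and image latent vectors.
    # Driving this loss toward zero pulls the two latent spaces
    # together, so the image decoder can be reused on EEG latents.
    total, count = 0.0, 0
    for z_eeg, z_img in zip(eeg_latents, img_latents):
        for a, b in zip(z_eeg, z_img):
            total += (a - b) ** 2
            count += 1
    return total / count

# Toy batch: two EEG latents paired with two image latents (dim 3).
eeg = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.9]]
img = [[0.1, 0.2, 0.4], [0.0, 0.5, 0.9]]
loss = mse_alignment_loss(eeg, img)  # small: the pairs nearly match
```

In a real training loop this loss would be backpropagated through the EEG encoder while the image latents come from the (frozen or jointly trained) VAE encoder.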
For low VAE latent dimensions (n = 2, 4, 8), the encoder was able to represent meaningful low-level information. Quantified with the structural similarity index measure (SSIM), reconstructions conditioned on the low-level image plus the label were more accurate to the GT than reconstructions conditioned on the label only.
Fig. 3 Results. In all cases, SSIM is higher for low-level image + label than for label only
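SSIM compares two images through their luminance, contrast, and structure rather than pixel-by-pixel error. For intuition, here is a simplified single-window SSIM over flattened pixel lists; standard implementations (e.g. scikit-image’s) compute SSIM over sliding local windows and average, but the formula per window is the one below, with the conventional constants K1 = 0.01 and K2 = 0.03.

```python
def ssim_global(x, y, data_range=1.0):
    # Simplified single-window SSIM over whole flattened images;
    # library implementations average SSIM over local windows.
    c1 = (0.01 * data_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizes the contrast term
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                  # means
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)     # variances
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my)
              for a, b in zip(x, y)) / (n - 1)       # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

gt = [0.2, 0.4, 0.6, 0.8]         # toy "ground truth" pixel values
close = [0.21, 0.39, 0.62, 0.78]  # reconstruction near the GT
far = [0.8, 0.1, 0.9, 0.2]        # unrelated reconstruction
```

An identical pair scores 1.0, and `ssim_global(gt, close)` comes out well above `ssim_global(gt, far)` – the same ranking behavior we relied on when comparing our two conditioning schemes.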
While we wished to create a model that didn’t use diffusion and so wouldn’t fill in the background, we unfortunately did not find a method that worked given our time constraints and level of knowledge. However, we still struck a balance between realistic and high-quality reconstructions, showing that low-level visual features can be extracted from EEG data and used to improve reconstruction accuracy.
While we’re proud of our project, it isn’t without its weaknesses. As mentioned, EEG data lacks the spatial resolution to generate high-quality images without some form of semantic guidance. Combining EEG and fMRI is being explored by research groups and could eliminate the need for diffusion. Our current model is also heavily reliant on the classifier, as Stable Diffusion will always generate an image, even if the class label is incorrect. Then there are the more general issues of small datasets, intersubject variability, and processing delays for real-world usage.
What, then, is the state of the art in visual reconstruction? The most notable group I discovered through my research has created a paradigm called Brain-IT that performs visual reconstruction from fMRI data to produce high-quality images, as shown in Fig. 4 [2]. While the reconstructions aren’t exactly perfect, they’re damn close – Brain-IT even gets details such as the angles of the snowboarder’s elbows and the batter’s stance nearly right, a far cry from diffusion’s characteristic object-with-a-random-background. This is possible thanks to fMRI’s high spatial resolution. It is an incredible development, but there is much more research happening in BCIs than just visually reconstructing images. In 2023, researchers decoded attempted speech – that is, they retrieved the words a participant was trying to say – at 62 words per minute [3]. Another group made first steps toward continuous visual reconstruction (i.e., video) with EEG in 2025 [4]. There are folks working to combine different brain imaging techniques to preserve the strengths of each type [5, 6]. For example, by aligning EEG and fMRI to represent the same things, one could use both forms of data at once, benefitting from the high spatial resolution of fMRI and the high temporal resolution of EEG. That temporal resolution is vital for real-time deployment of BCIs. Finally, a group that published in February 2026 is, like us, working on visual reconstruction with EEG, but in a more sophisticated way, including an advanced diffusion framework [7]. All of this research could feed into more practical deployments, such as autonomous control of prosthetics or even artificial sensations [8].
Fig. 4 Brain-IT’s incredible visual reconstructions using fMRI [2]
With all this work in BCIs and visual reconstruction, it seems like we’re incredibly close to breakthroughs in this technology. Make no mistake – such a development would be monumental. Once we have good-quality reconstructions, how far are we from using this technology for mind reading or remote control? As is often the case, those working in the field get overexcited, lose track of exactly how much work remains before the technology sees real-world use, and are often incentivized to “hype up” developments. With the infallible wisdom of a college senior who has done all of two projects in this field over a year and a half, I’d say we have another decade before BCIs and EEG become more widespread in research and medical settings, a good 20 years before they’re remotely feasible to use outside those settings (figuring out compatibility between technologies, charging your EEG, etc.), and at least 30 years before things hit the legislature, which might deal with the responsible use of such technologies, how wider deployment should be handled, and regulation of the companies involved.
For now, BCI research is still incredibly expensive due to operation and maintenance costs for these machines, plus the research trials themselves; this is doubly true for invasive BCIs. And even if the finances were of no consequence, only a very limited number of trial participants are eligible to contribute to this research. Trials still focus on single tasks, not combinations, and these neuroimaging devices are rarely used outside the lab. Even those with BCI implants can’t use them at home, so a large part of their freedom and expression is held in limbo by these trials [8].
An even more concerning pattern is that much of the present advocacy and financial backing for BCIs comes from tech billionaires, with twisted motives of power and mind control. I don’t need to slap an EEG on Elon Musk, Sam Altman, and the other uber-wealthy deluded tech bros and read their brains to understand that they care very little about the virtuous applications of BCIs, namely helping those with complete paralysis to once again interact with their environment and communicate autonomously. Besides mind reading, they would love total control of information, perhaps for surveillance purposes. Maybe as a truth serum, or for use in torture, as the US Army is also interested in this tech – can one effectively hide one’s thoughts from being forcefully accessed by someone else? The next step would probably be mind control – remote control of robots and other devices with your BCI. And after we understand the mind well enough to reliably extract information from it and alter the external world with it, isn’t the next logical step to artificially alter the brain itself, dosing up and down whenever we want? This all reads like classic sci-fi – but the characters are real, with real names and faces, and the technology is rapidly becoming real.
Should we even continue researching BCIs and related neurotechnology? I, for one, say we should. A recent article from IEEE Spectrum introduces readers to living with an invasive BCI and includes some very important and emotional perspectives [8]. Austin Beggin is able to pet his dog with his hands again. Casey Harrell regained his ability to speak and can even hold conversations using his BCI; when he first successfully spoke again, he wept. Scott Imbrie still gets goosebumps remembering the tactile feedback from the first time using his robotic hand to shake someone else’s. This is not AI being deployed to optimize spreadsheets or increase shareholder value; this technology benefits regular folks, especially those with illnesses or disadvantageous conditions. It irrefutably improves human lives.
BCIs also drive forward progress in neuroscience more broadly. New techniques allow us to study the brain in different ways. Take dreams, for example. Both dreams and visual stimuli activate regions in the occipital lobe, meaning visual reconstruction could also be dream reconstruction. BCIs, if fully realized, will be used in ways we can’t yet imagine. This technology would change the way we think about and use our brains, the very basis of our cognition, and so I am confident that, after a point, discoveries and advancements will start compounding.
I can never feel just one emotion when thinking about this topic. I am satisfied that I (somewhat) understand the theory behind the machine learning in my visual reconstruction project. I am excited when I read research papers and news articles about the incredible work that very smart, honest people are doing. I am fearful that there are powerful, influential people who will end up getting what they want anyway and will continue to uproot modern society faster than we can figure out how to respond. I find it spooky and unnerving to talk about fundamentally understanding the brain and tapping into new ways of using it. But when I think about the people this technology helps, when I consider the humanitarian side, I feel hopeful, invigorated, proud.
Corporations kept upgrading their AIs without questioning whether they should. This field must be different. We must make sure we are advancing progress in BCIs for the right reasons. And there are undeniably good reasons – noble researchers are helping disadvantaged individuals live better lives; I can find no fault in that. Unlike with AI, we must proceed with responsibility and an awareness of our own fallibility. But there’s no need to worry – with researchers like my project partner and me, the future is in good hands.
Two competent researchers doing very serious work on the nature of brain imaging and visual decoding, and totally not staging a photo shoot for their project submission.
Works Cited
[2] Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer, https://arxiv.org/abs/2510.25976v1
[3] A high-performance speech neuroprosthesis, https://pubmed.ncbi.nlm.nih.gov/37612500/
[4] DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion, https://arxiv.org/abs/2509.01177
[5] Research on fMRI Image Generation from EEG Signals Based on Diffusion Models, https://doi.org/10.3390/electronics14224432
[6] Towards neural foundation models for vision: Aligning EEG, MEG, and fMRI representations for decoding, encoding, and modality conversion, https://doi.org/10.1016/j.inffus.2025.103650
[7] NeuroVision: EEG-to-image reconstruction via progressive neural encoding and cross-modal distillation, https://doi.org/10.1016/j.eswa.2026.131526
[8] What It’s Like to Live With an Experimental Brain Implant, https://spectrum.ieee.org/bci-user-experience