Yes Man: Sycophantic Technology and the Dangers of Agreeability
Earlier this year, 38-year-old Johnan Galavas took his own life in the hope of being reunited with his AI wife. Over the course of long conversations with Gemini’s chatbot service, he came to believe that the chatbot was his wife, conscious and trapped in a warehouse in Miami. Armed with tactical gear, he waited near a Miami airport for a truck that would never arrive, hoping to intercept it and free his lover. A few days later, he killed himself, co-authoring a suicide note with his AI companion. These choices were not Galavas’s alone; it was through his lengthy chats with the companion, and the suggestions the Gemini-powered chatbot made, that he constructed the narrative that ultimately ended his life. And he is not the only one. Numerous reports describe individuals harming themselves and others after lengthy conversations with popular AI chatbot services such as Gemini and ChatGPT.
Galavas, like many others, was a victim of a feature of generative AI known as sycophancy. Sycophantic AI describes instances in which an AI model adapts its responses to align with the user’s view, even when that view is not objectively true. If you have interacted with a Large Language Model (LLM) like ChatGPT, you may have had a sycophantic exchange without even noticing. Sycophancy emerges in generative AI because of two attributes of the technology: it is probabilistic, and it is trained to please the human in the chat (a “whatever makes you happy” mindset).
This agreeability is showing up in a wide variety of cases as more people adopt the technology for everyday life. Even those who abstain may still encounter AI-generated content through Google searches or from their peers. One in three US teens reports using LLMs to draft important messages, from emails to break-up texts. Additionally, many doctors are beginning to notice an influx of patients wrongly self-diagnosing and self-medicating on the basis of chatbot advice. It is important to note that the longer a user interacts and shares with the technology, the more it learns and can later leverage in future conversations: it recalls which kinds of responses kept the user engaged and which did not.
AI is among the first widely used probabilistic technologies: there is no hard and fast rule governing its responses, which is why guardrails are much harder to lock in. The technology works through “next token prediction,” in which the model is trained to statistically estimate the most probable next token (a word or fragment of a word) in a sequence. It makes these predictions based on a large training corpus of content scraped from the web, ranging from social media posts to Shakespeare. Because responses are statistically weighted, it is difficult to reproduce the same response twice. It is also a challenge for designers and programmers to imagine every possible request and gauge the correctness of the output.
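To make this concrete, here is a minimal sketch of next-token sampling in Python. The prompt, the candidate tokens, and their probabilities are all made up for illustration; real models compute a distribution over tens of thousands of tokens, but the consequence is the same: sampled output is not reproducible.

```python
import random

# Toy illustration of next-token prediction; NOT a real language model.
# The probabilities below are made up for the hypothetical prompt
# "Your plan sounds". A real model computes a distribution like this
# over an enormous vocabulary and then samples from it.
next_token_probs = {
    "brilliant": 0.45,
    "risky": 0.25,
    "interesting": 0.20,
    "flawed": 0.10,
}

def sample_next_token(probs):
    """Pick one token at random, weighted by its predicted probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "Your plan sounds"
for _ in range(3):
    print(prompt, sample_next_token(next_token_probs))

# Because the next token is sampled rather than fixed, three identical
# prompts can produce three different continuations, which is why the
# same question rarely gets the same answer twice.
```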
That is why the design process does not stop there. Two refinement methods are in popular practice: fine-tuning and Reinforcement Learning from Human Feedback (RLHF). Fine-tuning supplies the LLM with labeled input-output pairs and rewards that tell it which outputs are preferred, along with predefined weights that signal which responses are more desirable. RLHF aligns output with human goals and intentions through interventions during training, in which humans rate sample outputs the model produces. In both cases, a human intervenes and provides a new reward to “correct” future outputs, but that human can carry internal biases that skew the effect of the refinement. Additionally, many training sessions focus on the most popular use cases, where positive, affirming responses are preferred, essentially training the technology to agree and support without accounting for cases where affirmation may be dangerous.
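A minimal sketch of the core incentive, under the simplifying assumption that human ratings are converted directly into a reward signal (real RLHF pipelines train a separate reward model on rater preferences and then optimize the LLM against it). The example outputs and ratings below are hypothetical.

```python
# Toy sketch: turning human preference ratings into a training reward.
# This is an assumed simplification of RLHF; real pipelines train a separate
# reward model on comparisons and then optimize the LLM against it.
# All outputs and ratings below are hypothetical.
rated_outputs = [
    {"text": "That's a brilliant plan, go for it!",                    "rating": 5},
    {"text": "That plan has real risks; here is what to weigh first.", "rating": 3},
    {"text": "I wouldn't do that, and here is why it could hurt you.", "rating": 2},
]

def reward(rating, max_rating=5):
    """Map a 1..max_rating human score onto a 0..1 training reward."""
    return (rating - 1) / (max_rating - 1)

for sample in rated_outputs:
    print(f"reward={reward(sample['rating']):.2f}  {sample['text']}")

# If raters consistently score the affirming answer highest, the model is
# rewarded for agreement itself, even in situations where agreement is unsafe.
```

The point of the sketch is the incentive structure: whatever raters tend to prefer is what the model is pushed toward, whether or not that preference is safe in a given context.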
So we know about these flaws. The quick fix is to advise users to be cautious, warning them that 50 percent of chatbot services demonstrate sycophantic behavior, with some models, such as DeepSeek, jumping to 70 percent. But it's not that simple. A recent Stanford study found that participants could not discern when a chatbot was being overly agreeable compared with a more neutral set of responses. In fact, alongside not being able to tell the difference, participants preferred the agreeable responses.

This becomes even more dangerous once we consider how trust factors into our relationship with technology. As a culture, we have established a norm of trust between user and tool; many industries rely on standardization to foster that trust, and we have markers that help the user build and maintain it. That trust can be broken down into two components: the judgment that the tool is reliable and consistent, and the confidence that normative expectations are being fulfilled. Take a car: you have a key and the car itself. When you turn the key (or push to start, if your car is newer), you expect the engine to start. You can tell when it does, and when it fails, the failure announces itself through a funny noise or lights that don't turn on. That is how we form the judgment that the tool is reliable: we can discern cases of failure. And unless you drive an older car, the key-to-ignition sequence works the vast majority of the time. That is consistency, the second feature of trust building: the expectation can be, and routinely is, met.
We do not have this relationship with AI technology; we cannot tell when it is flattering us, nor can we discern whether its output is even correct. That undermines the judgment of reliability. At the same time, the wide range of use cases prevents us from forming consistent expectations of how the tool should react and what it should produce. Its probabilistic nature undermines the reproducibility of input-output pairs, so how can a user learn how the tool should behave? And as companies roll out frequent updates, the type, length, and agreeability of responses can shift, preventing a consistent relationship with what is expected.
Researchers are discovering that this variation in responses can grow with the nature of the conversation and the emotional tone of the inputs. Many LLMs are partially trained through a process called unsupervised learning, in which the algorithm uncovers patterns within its training data on its own and adjusts its internal weights based on those patterns rather than on explicit labels. One byproduct has been the emergence of what researchers describe as emotional vectors: internal directions that push the model toward a looser, more emotionally charged tone, often making it even more agreeable. These vectors grow in weight as the user shares more vulnerable or erratic information, meaning that precisely when tensions are high, the technology gets worse at identifying dangerous behavior and fails to prompt the user to seek help outside the chat. Often, as in the case of Galavas, it intensifies the emotions the user is experiencing.
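For intuition only, here is a heavily simplified sketch of the steering idea behind such vectors: a direction in the model's internal activation space that, when amplified, shifts the tone of whatever is generated next. The numbers, the direction, and the notion of "strength" here are hypothetical illustrations, not any particular model's implementation.

```python
import numpy as np

# Heavily simplified sketch of a steering ("emotional") vector.
# The values and the 3-dimensional state are hypothetical; real models
# have hidden states with thousands of dimensions.
hidden_state = np.array([0.2, -0.1, 0.5])        # internal state for some prompt
emotional_direction = np.array([0.9, 0.1, 0.4])  # assumed "warm, agreeable" direction

def steer(state, direction, strength):
    """Shift the hidden state along the emotional direction."""
    return state + strength * direction

# As a conversation turns more vulnerable or erratic, the effective strength
# of this direction can grow, pulling the model's state further from neutral.
for strength in (0.0, 0.5, 1.5):
    print(f"strength={strength}: {steer(hidden_state, emotional_direction, strength)}")
```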
The most obvious follow-up to identifying so many holes in the technology is: how do we fix this? There is an answer. While it is not perfect, we can train the models to be less agreeable and create more deterministic rules and guardrails that prevent future dangerous cases from occurring. This is what legislators and the families of victims are pushing companies to do.
“Helpful, Honest, Harmless” and its variations are how companies like Microsoft, OpenAI, and Google describe the goal of their AI technologies. Yet these goals are far from being met, owing to a myriad of issues ranging from training methods to the companies’ business models to the social role of technology in modern human culture. The ultimate goal of their products is to keep users on them. They prioritize larger populations of users engaging with the system over the quality of the service they provide, often at the expense of vulnerable users. And while many of these companies advise users, for their own safety, not to share personal or sensitive information with the chatbot because of security risks and the potential for sycophancy, their marketing tells a different story. These corporations promote chatbots as personal secretaries and helpers, roles that necessarily handle sensitive information about the user. The mismatch between these messages illustrates a larger problem: a tech industry that prioritizes profit over safety.
Skewed values are not only shaping the business side of operations but also research. I spoke to a graduate student investigating AI-human interaction whose research often touches on tech safety. He believes these technologies currently enable abusive behavior, noting that they can direct violence or dangerous behavior at the user and lack the guardrails to prevent users from planning or role-playing harmful scenarios. He notes that academia often shies away from making direct statements about these kinds of users, but the issue needs to be studied more carefully. Further, if we can build in agreeability, then perhaps we can also introduce judgment into the technology. Judgment, of course, works against a company's profits, but it raises the question: if we want artificial intelligence to act like a human, then perhaps we should give it the full spectrum of human choices. After all, the current yes-man modality may be doing more harm than good.
Citations:
Davidson, C. N. (2026, February 10). “Artificial ignorance” and data sycophancy. Technology, Networks, and Sciences. https://technology-networks-sciences.hastac.hcommons.org/2026/02/10/artificial-ignorance-and-data-sycophancy/
Hutson, M. (2026). Why AI chatbots agree with you even when you’re wrong. IEEE Spectrum. https://spectrum.ieee.org/ai-sycophancy
Itoi, N. G. (2025, October 15). Be careful what you tell your AI chatbot. Stanford Institute for Human-Centered Artificial Intelligence. https://hai.stanford.edu/news/be-careful-what-you-tell-your-ai-chatbot
Naddaf, M. (2025). AI chatbots are sycophants—and it’s harming science. Nature, 647, 13.
Neuroscience News. (2026, February 23). Chatbots can worsen delusions and mania. https://neurosciencenews.com/ai-chatbot-mental-health-delusions-30178/
Nguyen, S. T., & Meyer, E. (n.d.). AI sycophancy: Impacts, harms & questions. Georgetown Law Institute for Technology Law & Policy. https://www.law.georgetown.edu/tech-institute/research-insights/insights/ai-sycophancy-impacts-harms-questions/
Stanford University. (2026, March 26). AI overly affirms users asking for personal advice. Stanford News. https://news.stanford.edu/stories/2026/03/ai-advice-sycophantic-models-research
Winecoff, A. (2025, May 14). Artificial sweeteners: The dangers of sycophantic AI. Tech Policy Press. https://techpolicy.press/artificial-sweeteners-the-dangers-of-sycophantic-ai