Psychedelics vs antidepressants: what do the data actually say?
A full analysis of the Williams, Barnett & Szigeti meta-analysis (JAMA Psychiatry, 2026)
This meta-analysis has been tearing through the internet like a mountain gale. Headlines screaming, people sharing, comment sections multiplying. But when I look at these discussions, what I see are mostly reactions to the abstract. “Psychedelics no better than antidepressants!” Shock, disbelief, triumph, or outrage. The holy war continues. Except the abstract is… an abstract. The real story is inside, in the methodology, in the discussion, in what the authors didn’t write explicitly. The guts are fatty. And fat is supposedly what carries the flavor, so this is going to be a filling meal! :) Apologies to my vegan friends for the comparison, but it kind of wrote itself ;-)
So let’s dig into this dish.
Remember? It started with headlines like “Psychedelics more effective than antidepressants.” “Psilocybin changes lives after a single dose.” “A revolution in psychiatry.” People shared without a second thought. Who didn’t hit share and think finally? I’ll admit, I was more than once intoxicated with excitement myself.
To be fair, those headlines didn’t come from nowhere. Early studies with psilocybin, ayahuasca, LSD looked really good. Large effects, rapid improvement, patients describing experiences that changed the way they understood themselves. For someone who works with people suffering from depression, that’s a powerful promise. For the patients themselves, a potential breakthrough. The possibility that a single deep psychedelic experience, embedded in a therapeutic relationship, could do something that months of pharmacotherapy cannot, is intellectually and clinically captivating beyond measure. Right? Hard not to get excited.
Except excitement is not data. It’s not evidence.
Last week, JAMA Psychiatry published the meta-analysis by Williams, Barnett, and Szigeti (2026). A meta-analysis is a type of study that doesn’t collect its own data but takes results from many previous studies and analyzes them together. It’s a statistical tool that lets you see a pattern where a single study shows only a fragment. That’s why meta-analyses sit high in the hierarchy of scientific evidence. That’s why the media and influencers pounce on them, and that’s why it’s worth reading them carefully. The strength of a meta-analysis always depends on the quality of the studies that went into it. Garbage in, elegantly packaged garbage out ;-) So how did it go this time?
This time, a paper came out that I’ll keep coming back to. I read it probably three times and couldn’t quite settle my thoughts about it. Every time I found some peace, I lost it again somewhere between admiration for the statistical elegance and unease over the limitations the authors treated a bit like a stepchild. I think I can safely say this analysis is a masterpiece. But like every masterpiece… it provokes questions it doesn’t answer itself.
Fair warning up front: this is not easy reading. Lots of statistics, lots of methodology, lots of math. Bayesian modeling, ROPE, scale conversions, sensitivity analyses. You have to chew through it. But once you do, you see that this is beautiful science. Yes, science is beautiful partly thanks to statistics. Maybe especially thanks to statistics.
End of introduction. Let’s get to it.
Where the “5-point advantage” came from
To understand why this meta-analysis matters, you first need to understand what it’s trying to explain. And to understand that, you need to step back and talk about how we even measure whether an antidepressant works.
In clinical trials, the standard instrument is the Hamilton Depression Rating Scale (HAM-D), a 17-item questionnaire completed by a clinician. The patient answers questions about sleep, appetite, mood, anxiety, guilt, psychomotor retardation. Each item is worth a few points, up to a total of 52. Higher score means more severe depression.
And here a concept appears that will keep coming back: the Minimal Clinically Important Difference, or MCID. This is the threshold below which a change in scores exists in the data but the patient doesn’t feel it in their life. A panel of NICE experts (Kendrick & Pilling, 2012) proposed that for the HAM-D this threshold is 3 points. Three points. To make it clear what this means in practice: imagine two people with depression. One wakes up at three in the morning and can’t fall back asleep. The other sleeps poorly but makes it through the night. On the HAM-D, that difference might be 2 points on a single item. A 3-point difference on the entire scale is roughly the boundary where a patient starts saying “I feel like something has changed.” Below that boundary, the change is statistical, not human.
And here it’s worth pausing at something that’s rarely said out loud. We’ve learned to talk about depression through the lens of scales. So many points on the HAM-D, so many on the BDI, so many on the PHQ-9. It’s useful because it gives us a common language and comparability between studies. But depression is not a score on a scale. Depression is waking up with the feeling that the day has no meaning before it’s even started. It’s the inability to enjoy the presence of your own child. It’s suffering that no questionnaire captures in full. Maybe someday we’ll arrive at a methodology that can capture it better. For now, we have what we have, and we need to know how to read it.
Back to the story. Historical data from depression trials looked like this: classical antidepressants (SSRIs, SNRIs) compared with placebo produced an improvement of about 2.4 HAM-D points. Two point four. With an MCID threshold of 3 points. This is the standard on which decades of depression pharmacotherapy have been built. Not much, but it’s what we’ve got.
Then psychedelics appeared. Studies with psilocybin, ayahuasca, LSD showed an improvement of about 7.3 HAM-D points compared to placebo. Three times as much as the 2.4 for antidepressants. On paper, it looked like a qualitative leap. Headlines wrote themselves, even before ChatGPT was born.
The difference between those two numbers, between 2.4 and 7.3, is about 5 points. Zachary Williams (a researcher at Vanderbilt University focused on metascience and clinical trial methodology), Holly Barnett, and Balazs Szigeti (a neuroscientist at Imperial College London, known for his work on the placebo effect and functional unblinding in psychedelic research) set out to explain where that 5-point difference comes from. And they found an answer that is elegant but uncomfortable.
The blinding problem, or why comparison is hard
The authors analyzed 24 studies. 8 with psychedelic-assisted therapy (249 patients total) and 16 with classical antidepressants administered open-label, without blinding (7,921 patients). And it’s worth pausing at those eight studies. Because the label “psychedelic-assisted therapy” covers very different things. Psilocybin, ayahuasca, LSD, 5-MeO-DMT. Each of these substances has a different pharmacological profile, different duration of action, different character of experience. On top of that, the therapeutic protocols differ. Different numbers of preparatory sessions, different doses, different presence and role of the therapist, different integration methods after the session. Lumping all of this together and comparing it with antidepressants is a simplification you need to keep in mind when reading the results. A meta-analysis sees the pattern but blurs the differences.
The authors’ question was elegantly simple. What happens if we equalize blinding conditions, or rather, unblinding conditions? What if in both groups the patient knows what they’re getting?
Because that’s the heart of the problem. In a standard clinical trial, neither the patient nor the researcher knows who got the drug and who got placebo. At least, it’s much, much harder to “figure it out.” Escitalopram might wreck your stomach, but the world won’t suddenly become more colorful. That inability to hold a conviction about which substance you received is “blinding.” Thanks to it, we can separate the effect of the substance from the effect of expectations, from the mere hope that “this will help me.” In psychedelic trials, this mechanism doesn’t work. After psilocybin, you often don’t need to guess which group you’re in. The experience is so distinctive that 90 to 95 percent of participants correctly identify their group. For comparison, in classical antidepressant trials that number is about 60 percent. That said, let’s note there are individuals who don’t experience the psychedelic effect even at full dose in trials, and others who experience those effects in the 1 mg group. Blinding in clinical trials is not a technical detail. It’s a question about how much of what we see in the results is an effect of pharmacology, and how much is the effect of being convinced you received something groundbreaking. This unblinding can be minimized, and there are better and worse trials in this regard, but that’s a topic for another post.
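To give a sense of how functional unblinding is quantified in practice: at some point in the trial you ask participants which arm they think they’re in, and check whether their guesses beat chance. Below is a minimal sketch with invented counts, chosen only to roughly match the percentages above, not data from any specific trial:

```python
from scipy.stats import binomtest

# Invented end-of-trial guesses, roughly matching the proportions quoted above
# (illustration only, not data from any actual trial)
trials = {
    "psychedelic trial":    {"correct": 46, "n": 50},  # ~92% guessed their arm correctly
    "antidepressant trial": {"correct": 30, "n": 50},  # ~60% guessed correctly
}

for name, d in trials.items():
    # Under intact blinding (and 50/50 allocation), guesses should hover around 50%
    result = binomtest(d["correct"], d["n"], p=0.5, alternative="greater")
    print(f"{name}: {d['correct'] / d['n']:.0%} correct, p = {result.pvalue:.4f} vs. chance")
```

With a 50/50 allocation, intact blinding leaves guesses at coin-flip level; 90-plus percent correct guesses is the statistical signature of a trial in which patients effectively know what they received.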
The meta-analysis authors approached this cleverly. Instead of comparing psychedelics to placebo (where blinding is hard), they compared them with antidepressants administered openly, without blinding. “They leveled the playing field.” They compared studies where in both groups the patient knew what they were taking.
The result? The difference on the 17-item Hamilton Depression Rating Scale (HAM-D) between psychedelic-assisted therapy and open-label antidepressant therapy: 0.3 points. On a 52-point scale. With p = 0.73. Statistically and clinically: nothing.
And in my opinion, good. Let the data cool down the enthusiasm. Not because psychedelics don’t work. They do, and this analysis, among other things, clearly demonstrates that. The above result is evidence that they work no worse than classical antidepressants. But enthusiasm without backing in data leads to harm. It leads to patients who abandon treatment that works in favor of treatment that promises more than it can currently prove.
Mathematical deconstruction: where do those 5 points come from?
But this paper also has its dark side, the side the headlines don’t mention. And it only hit me on the third reading!
The authors decompose this 5-point advantage into two components. And they do it in a way that is simultaneously fascinating and depressing.
Before I describe these components, an important caveat: it’s not as if the authors took a calculator and “proved” their thesis with a single equation. This is an interpretation based on comparing effect sizes from different analyses within their statistical model. The math checks out, but like any interpretation, it rests on model assumptions. The strength of this argument is that the numbers fit together too well for it to be coincidence. But it’s an argument for the coherence of the explanation, not proof in the mathematical sense.
Component 1: the expectancy (blinding) effect. When a patient knows they’re taking an active drug (open-label), improvement is about 1.29 HAM-D points greater than when they don’t know (blinded). This is the pure power of awareness: “I’m taking a drug that’s supposed to help me.” This effect applies to both groups, psychedelics and antidepressants. In psychedelic studies, however, it’s always present (because the patient always knows what they got), while in blinded antidepressant studies it’s partly hidden.
Component 2: the nocebo effect, or “know-cebo.” And here’s where it gets most interesting. The term “know-cebo” itself is a neologism coined by Szigeti, emphasizing that this isn’t the classical nocebo effect (worsening due to negative expectations toward the substance) but something specific: worsening resulting from the knowledge that you didn’t receive the active substance.
Placebo groups in psychedelic trials score about 4.0 HAM-D points worse than placebo groups in antidepressant trials. Not because placebo works better in antidepressant trials. Because placebo in psychedelic trials actively harms.
Imagine this. You’ve been suffering from severe depression for a long time. You sign up for a trial. For weeks you prepare yourself for an experience that’s supposed to change your life. You come to a room with dimmed lights, you’re given a pill, you put on an eye mask, they play you music. And after an hour you realize nothing is happening. You didn’t get psilocybin. You got nicotinamide.¹ The disappointment you feel in that moment is not neutral. It deepens the depression.
The sum of these two effects: 1.29 + 4.0 = 5.29 HAM-D points. Which almost perfectly matches the observed 4.9-5.0 point difference between psychedelics and antidepressants in traditional comparisons. And that’s exactly the strength of the argument I mentioned above. The fact that two independently estimated effects add up to a number that matches the observed difference almost to the decimal is not a random coincidence. It’s a strong indication that the deconstruction accurately identifies the sources of that advantage. Not proof in the logical sense, but something hard to ignore.
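For readers who like the bookkeeping spelled out, here is the same addition as a toy snippet (point estimates only; the paper of course reports these with uncertainty intervals, which this sketch ignores):

```python
# Point estimates from the meta-analysis, in HAM-D points
expectancy_effect = 1.29  # extra improvement when patients know they received an active drug
knowcebo_effect = 4.0     # extra worsening in the placebo arms of psychedelic trials

reconstructed_gap = expectancy_effect + knowcebo_effect  # 5.29
observed_gap = 7.3 - 2.4                                 # ~4.9, the historical advantage

print(f"Reconstructed advantage: {reconstructed_gap:.2f} HAM-D points")
print(f"Observed advantage:      {observed_gap:.2f} HAM-D points")
```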
In other words: over 55% of the reported advantage of psychedelics over antidepressants doesn’t come from psychedelics working better. It comes from the placebo group in psychedelic trials doing dramatically worse.
It’s not psychedelics winning. It’s placebo losing. And that’s what the authors believe, writing explicitly that the observed superiority of psychedelics in blinded trials is largely an artifact of functional unblinding.
¹ Nicotinamide (nicotinic acid amide, a form of vitamin B3) is used as an active placebo in psychedelic trials because it produces noticeable somatic effects (e.g., skin flushing, feeling of warmth), which is meant to make it harder for the patient to determine definitively whether they received the active substance. In practice, this works poorly because the psychedelic experience is qualitatively incomparable to a hot flush.
Bayesian analysis: how certain can we be?
The authors didn’t stop at classical statistics. And to understand why that matters, it’s worth pausing to consider how these two approaches actually differ.
Classical statistics (so-called frequentist) answers the question: “If there truly were no difference between psychedelics and antidepressants, how likely would it be to get the results we got?” That’s the p-value. In this case p = 0.73, meaning such results would be very likely even if there were no difference. In other words: no basis for rejecting the hypothesis that the treatments work the same.
But Bayesian statistics asks a different question, a more intuitive one: “Given the data we collected, what is the probability that psychedelics are actually better?” This is a subtle but fundamental distinction. Frequentist talks about data given an assumption. Bayesian talks about the assumption given the data we have. And the latter is closer to what we actually want to know.
The authors defined the Minimal Clinically Important Difference (MCID) at 3 HAM-D points. To feel this: 3 points on the HAM-D is roughly the difference between someone who wakes up in the morning with a sense of emptiness but functions, and someone who can’t get out of bed. Or between someone who eats out of obligation and someone who starts tasting their food again. It’s not a big change on paper, but the patient feels it.
The result of the Bayesian analysis: the probability that psychedelic-assisted therapy surpasses classical antidepressants by at least those 3 HAM-D points is 0.2%. To put it simply: if we placed 500 bets that psychedelics are clinically superior, we’d win roughly one.
On the other hand: 99.1% of the posterior distribution (what the Bayesian analysis “sees” as possible values of the real difference) falls within the so-called ROPE (Region of Practical Equivalence), the ±3 HAM-D point zone where we consider two interventions practically equivalent. ROPE is like a safety belt: if nearly the entire distribution fits within this belt, it means the difference, if it exists at all, is so small the patient wouldn’t feel it.
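To make those two numbers concrete, here is a minimal sketch of the calculation, assuming (purely for illustration) a normal posterior for the psychedelic-minus-antidepressant difference centered on the observed 0.3 HAM-D points with a standard deviation of 1 point. These are my placeholder values, not the authors’ actual model, so the sketch lands in the same ballpark as 0.2% and 99.1% rather than reproducing them exactly:

```python
from scipy.stats import norm

# Illustrative posterior for the difference (psychedelics minus antidepressants), in HAM-D points.
# The mean of 0.3 is the observed difference; the SD of 1.0 is a placeholder, not taken from the paper.
posterior = norm(loc=0.3, scale=1.0)

MCID = 3.0  # minimal clinically important difference on the HAM-D

# Probability that psychedelics beat antidepressants by at least the MCID
p_superior = posterior.sf(MCID)

# Probability mass inside the ROPE, the +/- 3 point zone of practical equivalence
p_rope = posterior.cdf(MCID) - posterior.cdf(-MCID)

print(f"P(difference >= {MCID} HAM-D points): {p_superior:.2%}")
print(f"P(difference inside the ROPE):        {p_rope:.1%}")
```

The point is the shape of the question: not “is p below 0.05,” but “how much of the posterior sits above the clinically meaningful threshold, and how much sits inside the equivalence zone.”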
This isn’t “we didn’t find a difference.” This is active confirmation that there is no difference. Celebrate or cry? I think both, depending on what we expected.
The time problem: 3.4 vs 8.1 weeks
And here we arrive at what I consider the most serious limitation of this meta-analysis.
But first, an explanation: in clinical trials, the “endpoint” is the moment at which researchers measure the treatment effect. It’s a predetermined time point when we ask: is the patient doing better? The primary endpoint is the most important one, the one the main conclusion of the study rests on.
The average time to primary endpoint assessment in psychedelic trials was 3.4 weeks. In antidepressant trials: 8.1 weeks. This is not a minor technical difference. It’s a fundamental asymmetry in what we’re actually measuring. It’s comparing two frames from completely different moments of the film.
SSRIs need 4-6 weeks to reach full pharmacological effect. The measurement point at 8 weeks catches them at peak efficacy. Psychedelics are measured at 3.4 weeks, often still in the phase where the effect of the psychedelic experience is only beginning to integrate.
Comparing these two endpoints is like comparing a sprinter with a marathon runner at the one-kilometer mark. The sprinter looks great, but we don’t yet know who’ll reach the finish line.
The authors mention this discrepancy in the limitations section. They acknowledge that this difference may affect the results. But they don’t model it statistically. They don’t include time to endpoint as a variable in their model. They leave it as a caveat, not as an element of the analysis.
And this matters, because methods that allow modeling of change trajectories (rather than just static measurement points from different time points) do exist. So-called IPD meta-analyses (Individual Participant Data meta-analyses) allow modeling of change curves at the individual participant level, rather than aggregated study results. They require access to raw data from each study, which is logistically difficult but methodologically possible. We don’t know whether the change trajectories in patients treated with psychedelics and antidepressants converge on the same destination, or whether we’re simply looking at different stretches of the road. This is the direction the field should move in.
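Short of full IPD, one can at least probe the issue at the aggregate level with a meta-regression that includes time to endpoint as a moderator. Here is a minimal sketch of that idea with invented study-level numbers and a simple inverse-variance-weighted (fixed-effect) model; it is not the authors’ analysis, and a serious version would use a random-effects model that accounts for between-study heterogeneity:

```python
import numpy as np
import statsmodels.api as sm

# Invented study-level data: effect size (HAM-D improvement), its standard error,
# weeks to the primary endpoint, and intervention type (1 = psychedelic, 0 = antidepressant)
effect = np.array([6.5, 7.0, 8.1, 2.1, 2.6, 2.3, 3.0])
se     = np.array([1.8, 2.0, 2.2, 0.4, 0.5, 0.4, 0.6])
weeks  = np.array([3.0, 3.5, 4.0, 8.0, 8.0, 8.0, 9.0])
is_psy = np.array([1, 1, 1, 0, 0, 0, 0])

# Moderators: intervention type and time to endpoint
X = sm.add_constant(np.column_stack([is_psy, weeks]))

# Fixed-effect meta-regression: weight each study by the inverse of its variance
fit = sm.WLS(effect, X, weights=1.0 / se**2).fit()
print(fit.params)  # [intercept, psychedelic effect adjusted for timing, effect per extra week]
```

Even in this toy example the two moderators are nearly collinear, because the psychedelic trials are exactly the short ones. That is the real-world problem in miniature: an aggregate regression struggles to separate intervention from timing, which is why individual-level trajectories would give the cleaner answer.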
The EPISODE trial: what does it say about mechanism?
The Williams, Barnett, and Szigeti meta-analysis tells us that psychedelics don’t work better than antidepressants when you level the playing field. But it doesn’t tell us what in psychedelics actually works. And that’s precisely why it’s worth looking at another study that appeared in the same issue of JAMA Psychiatry.
The EPISODE trial (Mertens et al., 2026) is a randomized clinical trial: 144 people with treatment-resistant depression, psilocybin 25 mg versus active placebo (nicotinamide 100 mg), triple-blind, two centers in Germany. Patients could receive two doses six weeks apart.
The main result? Psilocybin did not reach statistical significance on the primary endpoint, which was treatment response. In plainer terms: the proportion of patients whose depression symptoms decreased by at least 50% did not differ significantly between groups.
But there’s something else in this study, something I consider potentially more important than the main result. Patients who reported so-called emotional breakthroughs, states of deepened emotional openness, insight, a sense of inner turning point, had better treatment response. And these emotional breakthroughs were reported more frequently in the group receiving higher doses of psilocybin.
This stops me in my tracks. If the antidepressant effect correlates not so much with the dose of the substance as with the quality of the emotional experience, then we’re starting to talk about something entirely different from “a drug for depression.” We’re starting to talk about a complex intervention.
What is a complex intervention? Is it the substance, the therapy, the setting, the hope?
Psychedelic-assisted therapy is not a pill. It’s a substance plus weeks of therapeutic preparation, plus the presence of a therapist (often two) for 6-8 hours of the experience, plus integration sessions afterward. It is, as Muthukumaraswamy and colleagues (2025) argue, a complex intervention in which substance, psychotherapy, setting, and patient expectations are intertwined.
The randomized clinical trial paradigm was designed to test the isolated effect of a substance. You separate the drug from the context, give some people the drug and others placebo, and see what the drug alone does. In psychedelic-assisted therapy, this separation is structurally impossible. Not because the researchers are incompetent. Because the very object of study is inseparable.
And here I came across something worth mentioning, because few people know about it. There’s a tool called PRECIS-2 (Pragmatic-Explanatory Continuum Indicator Summary) that measures the degree to which a given clinical trial is “laboratory-like” (explanatory, aimed at explaining mechanism under controlled conditions) versus “clinical” (pragmatic, aimed at testing how treatment works in the real world). The PRECIS-2 score for psychedelic trials is around 15-16, placing them close to the explanatory pole, far from pragmatic. We’re studying psychedelics under conditions (dimmed room, music, two therapists, hours-long session) that have little to do with how treatment would look in a real-world psychiatric office. This is a curiosity, but also a problem. Perhaps PRECIS-2 deserves its own post?
What the experts say
James Rucker, a psychiatrist and senior lecturer at the Institute of Psychiatry, Psychology & Neuroscience at King’s College London, who heads the Psychoactive Trials Group and is one of the most experienced clinical psychedelic researchers in Europe, commenting on both articles for the Science Media Centre, said something worth remembering. That the results confirm a real antidepressant effect of psychedelics, but suggest that a significant part of this effect is mediated by positive expectations. And he added a clinical perspective I consider key: in practice, drugs are not given blindly, the patient always knows what they’re taking. So the open-label comparison is closer to what happens in the consulting room.
David Owens, emeritus professor of clinical psychiatry at the University of Edinburgh, emphasized that both articles touch on the central problem in evaluating psychedelics as therapeutic tools: the blinding problem. That until we solve this problem, we’ll keep going in circles with the interpretation of results.
And in the context of the EPISODE trial, other commentators noted that psilocybin, despite not reaching significance on the primary endpoint, showed an antidepressant effect on secondary measures, and that the second dose given after 6 weeks appeared to strengthen this effect. Whether this is a pharmacological effect or an effect of a second intensive therapeutic experience remains an open question.
Alternative explanations: what this meta-analysis doesn’t settle
Szigeti’s meta-analysis is elegant, but not omnipotent. Below are my questions, the ones that won’t leave me alone, and that this paper doesn’t answer.
Does equivalence on the HAM-D mean clinical equivalence? The Hamilton Scale measures symptoms: insomnia, appetite loss, psychomotor retardation. It doesn’t measure meaning, relationships, ability to work, quality of life. The escitalopram versus psilocybin study (Carhart-Harris et al.) showed that with identical HAM-D scores, the psilocybin group had better functional outcomes at 6 months. Equality on the HAM-D is not the same as equality in the consulting room.
Is episodic administration clinically better than chronic? If a single psychedelic experience produces the same HAM-D drop as months of daily SSRI pills, but without the daily side effects (emotional blunting, weight gain, sexual dysfunction), then that equivalence looks entirely different from the patient’s perspective. Statistical equivalence is not experiential equivalence.
What about the trajectory over time? The 3.4 vs 8.1 week problem is not just a technical matter. It’s a question about whether psychedelics achieve their full effect faster and whether that effect is maintained long-term. This meta-analysis doesn’t address that.
What about treatment-resistant depression? The authors ran a sensitivity analysis excluding TRD studies and the result didn’t change. But that doesn’t answer the question of whether, for a patient who doesn’t respond to their third antidepressant, psychedelics might be a valuable alternative even with identical population-level efficacy. For that patient, “not worse” might be “the only thing left.”
Where this leaves us
Before anyone says “so psychedelics don’t work,” let’s stop. Because this meta-analysis doesn’t say they don’t work. It says they don’t work better than classical antidepressants when you equalize one key factor: the patient’s knowledge of what they’re taking.
This meta-analysis is important. It’s needed. It does exactly what it should: it cools a narrative that got ahead of the data. But it doesn’t close the discussion. It opens it.
And here the more interesting conversation begins. About what actually heals… the molecule, the emotional experience, the therapeutic relationship, hope, or all of these together in proportions we don’t yet understand.
Psychedelics probably work. The question is what exactly in them works.
I’ll be talking about this at the Nauka Psychodeliczna 2026 conference (June 13-14, University of Warsaw) in my talk “Invisible Variables: What We Still Don’t Know About Psychedelic Therapy.” The emphasis will be on the clinical application perspective: what these data mean for the person sitting across from me in my office, not just for a statistical model.
If this topic interests you, sign up: https://naukapsychodeliczna.org/etn/konferencja-nauka-psychodeliczna-2026/
Curious, Dear Readers? :))
References
Kendrick, T., & Pilling, S. (2012). Common mental health disorders: Identification and pathways to care: NICE clinical guideline. British Journal of General Practice, 62(594), 47-49.
Mertens, L. J., Hinze, T., Grunder, G., Brocker, A., Jungaberle, H., Bohringer, A., Kambeitz, J., Jessen, F., & Schmidt, S. (2026). Efficacy and safety of psilocybin in treatment-resistant major depression: The EPISODE randomized clinical trial. JAMA Psychiatry. https://doi.org/10.1001/jamapsychiatry.2025.4810
Muthukumaraswamy, S. D., Baggott, M. J., Schenberg, E. E., Decker, R., & Reckweg, J. T. (2025). Psychedelic-assisted therapy as a complex intervention: Implications for clinical trial design. Therapeutic Advances in Psychopharmacology, 15. https://doi.org/10.1177/20451253251381074
Williams, Z. J., Barnett, H., & Szigeti, B. (2026). Psychedelic therapy vs antidepressants for the treatment of depression under equal unblinding conditions: A systematic review and meta-analysis. JAMA Psychiatry. https://doi.org/10.1001/jamapsychiatry.2025.4809