Testing psychedelics: What’s the control group?

Here’s a joint post between Jess Hartnett (author of Psychological Statistics for Everyone and the Not Awful and Boring blog) and me!

Once used only for recreational or spiritual reasons, psychedelics such as ayahuasca and psilocybin are emerging as a treatment for people coping with prolonged grief, terminal illness, treatment-resistant depression, substance use disorder, PTSD, eating disorders, and anxiety. Although these drugs are illegal in the United States, psychiatric researchers are taking their potential seriously and are conducting clinical trials to test their efficacy.

In this post, Jess and I provide questions that let research methods and statistics students practice what they are learning, using this research question as the context.

Researchers are increasingly studying whether psilocybin mushrooms like these are an effective treatment for depression, PTSD, and other mental illnesses. It’s difficult to find an appropriate placebo control group for such studies. Credit: YARphotographer/Shutterstock

However, there is a big problem with studying psychedelics using typical medical research procedures. The gold standard for medical research is the double-blind, randomized, controlled clinical trial (RCT). In an RCT, one group of participants receives the active treatment and another group receives a sugar pill (placebo). In a double-blind RCT, neither group knows which condition it is in, and often the researchers analyzing the data do not know either. Even with double blinding, research ethics require that all participants be told they might receive a hallucinogen during the study.

The problem is, double-blind research is nearly impossible with psychedelics. Why? The psychedelic treatment experience is unusually obvious: participants in the experimental group know which condition they are in as soon as the hallucinogen kicks in. Early psychedelic research often used niacin, a B vitamin, as the placebo, because it can make the skin tingle and feel flushed. However, this doesn’t seem to fool the control group, whose members knowingly signed up for a study on psychedelics. They soon figure out that they are in the control condition.

This problem was summarized by Kristen French in Nautilus:

[Between] 90 and 95 percent of people can guess whether they’ve gotten a psychedelic in a blinded trial.
In contrast, in comparable studies of antidepressants, only about 60% of people correctly guess their condition.

In other words, despite the importance of double blinding, it’s virtually impossible to keep people blind to their condition in a test of psychedelics. In practice, psychedelic studies are effectively “open label” (a term for a study in which patients know which drug they are receiving).

For Research Methods Students

Solution 1: Dosing

Researchers have come up with two solutions to this problem. One is that, rather than simply comparing “psychedelic” to “placebo” conditions, they manipulate dosages. For example, scientists might randomly assign people to low, medium, and high dosages of these substances, and look for degrees of improvement (here’s an example in the journal JAMA).
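To make “degrees of improvement” concrete, here is a minimal Python sketch. It assumes hypothetical improvement scores (points dropped on a depression inventory) for participants randomly assigned to 1 mg, 10 mg, or 25 mg doses; these doses and numbers are invented for illustration and do not come from the JAMA study. The sketch computes the mean improvement at each dose and an ordinary least-squares slope of improvement on dose. A positive slope is the kind of dose-response pattern researchers would look for.

```python
from statistics import mean

# Hypothetical data: dose in mg -> improvement scores (points dropped on a
# depression inventory). Invented for illustration only.
improvement_by_dose = {
    1: [2, 3, 1, 4],
    10: [5, 6, 4, 7],
    25: [9, 8, 10, 11],
}

def group_means(data):
    """Mean improvement at each dose level."""
    return {dose: mean(scores) for dose, scores in data.items()}

def dose_response_slope(data):
    """Ordinary least-squares slope of improvement on dose (points per mg)."""
    xs = [dose for dose, scores in data.items() for _ in scores]
    ys = [y for scores in data.values() for y in scores]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

print(group_means(improvement_by_dose))
print(round(dose_response_slope(improvement_by_dose), 3))
```

With these invented numbers, improvement rises steadily with dose, so the slope is positive (about 0.29 points per mg).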

The dosing strategy is consistent with the discussion of control vs. comparison groups in Research Methods, Chapter 10 (Simple Experiments). There, you learn that while an experiment always needs a comparison group, the comparison group does not necessarily need to be a control group (if we define a control group as one that receives no treatment). For example, an education researcher might compare “math teaching method A” to “math teaching method B.” There’s no control group, because no group gets “no teaching,” but it’s still an experiment. Similarly, a mental health researcher might randomly assign participants with depression to either a new psychotherapy technique or “treatment as usual,” meaning whatever therapy they are currently receiving.

1a. Consider the ethics of this design. In general, why might having a comparison group, rather than a no-treatment control group, be considered more ethical in a study of a serious illness such as depression?

1b. Now imagine a study that compared different dosages of psychedelic medication, such as 5 mg vs. 25 mg. In your own words, what are the advantages of this design? In your view, what might be a potential weakness of this design?

Solution 2: Open labels for all!

There’s a second solution to our inability to create truly masked treatment groups in psychedelic research. Here’s how it is described in the Nautilus article:

…a team of scientists recently devised a workaround. Instead of comparing psychedelics against a placebo, they compared them against so-called open label antidepressants, a trial where everyone knows who’s getting what.

Why is this design effective? Essentially, the psychedelic group is already open label, since everyone knows they are in it; as we already noted, you can’t hide the psychedelic experience. As a result, any improvement in this experimental group’s level of depression is attributable to the psychedelic, to the placebo effect, or to a combination of both. Similarly, in an open-label antidepressant group, any improvement is attributable to the drug, to the placebo effect, or to both. Because participants in both groups know what they are getting, placebo effects operate in both conditions, so a difference between the groups is more likely to reflect a difference between the drugs themselves.
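The reasoning can be written out as simple bookkeeping. This minimal Python sketch assumes that each group’s observed improvement is a drug effect plus an expectancy (placebo) effect, added together; the additive model and every number are hypothetical, chosen only to illustrate the logic. It compares an unblinded psychedelic-vs-niacin contrast with an open-label-vs-open-label contrast.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    """One treatment group. Assumption: improvement = drug effect + expectancy effect."""
    name: str
    drug_effect: float        # improvement caused by the drug itself (hypothetical)
    expectancy_effect: float  # improvement caused by knowing you were treated (hypothetical)

    def observed_improvement(self) -> float:
        return self.drug_effect + self.expectancy_effect

def observed_difference(a: Arm, b: Arm) -> float:
    """What a trial actually measures: the difference in observed improvement."""
    return a.observed_improvement() - b.observed_improvement()

# Hypothetical values, in points on a depression scale.
psychedelic = Arm("psychedelic (participants know)", drug_effect=6.0, expectancy_effect=4.0)
niacin = Arm("niacin placebo (participants guess)", drug_effect=0.0, expectancy_effect=0.5)
open_label_tad = Arm("open-label antidepressant", drug_effect=5.0, expectancy_effect=4.0)

# Placebo-controlled but effectively unblinded: drug and expectancy effects are mixed.
print(observed_difference(psychedelic, niacin))  # 9.5, although the drug effect is 6.0
# Open label vs. open label: with similar expectancy in both arms, it cancels out.
print(observed_difference(psychedelic, open_label_tad))  # 1.0, the drug-vs-drug difference
```

The key assumption is that expectancy is similar in the two open-label groups; if one drug generates much stronger expectations than the other, some confounding remains.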

The Nautilus journalist reported on a meta-analysis that took this approach across 24 studies. Here’s the conclusion:

Across a review of 24 studies, they found that psychedelics were no more effective than open-label antidepressants. They published their results in JAMA Psychiatry.

1c. Remind yourself: What is a meta-analysis?

Next, if you have access through your university library, look at the forest plot (Figure 2) in the meta-analysis paper.

1d. In this forest plot, the yellow curves represent the open-label antidepressant studies and the blue curves represent the psychedelic studies. Why are the blue curves wider? (Hint: it has something to do with the sample size, or N.)
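If you want to check your reasoning, the standard error of a mean is the standard deviation divided by the square root of N, and a 95% confidence interval extends about 1.96 standard errors on each side. This minimal Python sketch assumes a hypothetical standard deviation of 8 points (not a value from the paper) and prints the full interval width for several sample sizes.

```python
import math

Z_95 = 1.96  # normal critical value for a two-sided 95% interval

def ci_width(sd: float, n: int) -> float:
    """Full width of an approximate 95% CI for a mean: 2 * 1.96 * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return 2 * Z_95 * standard_error

SD = 8.0  # hypothetical standard deviation of change scores, in scale points
for n in (25, 100, 400):
    print(n, round(ci_width(SD, n), 2))
```

Quadrupling the sample size halves the width of the interval, so studies with few participants show up as wide curves.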

1e. Each curve represents the difference between people’s depression score at the end of treatment and their score before treatment (end score minus starting score). Fill in the blanks:
A negative score therefore represents _______, which means that all of the curves in this meta-analysis showed that the treatments ___________ [improved/worsened/had no effect on] levels of depression.

1f. The blue vertical line represents the average of the psychedelic studies and the yellow line represents the average of the antidepressant studies. Look at the individual studies and averages on Figure 2. Make some observations about the different studies.
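In a meta-analysis, the “average” of a set of studies is usually a weighted average rather than a simple mean: more precise studies (smaller standard errors, typically larger samples) count more. This minimal Python sketch shows fixed-effect, inverse-variance pooling for three invented studies. The paper may use a different model (for example, random effects), so treat this as an illustration of the weighting idea, not a reproduction of Figure 2.

```python
import math

# Hypothetical studies: (mean change score, standard error). Invented values.
studies = [(-10.0, 1.0), (-6.0, 2.0), (-14.0, 4.0)]

def pooled_fixed_effect(effects_and_ses):
    """Inverse-variance weighted mean and its standard error (fixed-effect model)."""
    weights = [1 / se ** 2 for _, se in effects_and_ses]
    total_weight = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, effects_and_ses)) / total_weight
    return pooled, math.sqrt(1 / total_weight)

simple_mean = sum(effect for effect, _ in studies) / len(studies)
pooled, pooled_se = pooled_fixed_effect(studies)
print(round(simple_mean, 2), round(pooled, 2), round(pooled_se, 2))
```

The imprecise third study (SE = 4) gets one sixteenth of the first study’s weight, so its large effect barely moves the pooled estimate.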

The meta-analysis paper had two main findings. In the quote below, PAT stands for Psychedelic Assisted Treatment and TAD stands for Traditional Anti-Depressants.

… PAT (8 trials; 249 patients) was no more effective than open-label TAD treatment (16 open-label TAD trials; 7,921 patients), with an estimated difference of 0.3 favoring open-label TADs (95% CI [−1.39, 1.98]; p = .73). Open-label TADs were associated with better outcomes than blinded treatment (144 blinded TAD trials; 31,792 patients), with an estimated difference of 1.3 (95% CI [0.07, 2.51]; p = .04), but the same difference was not observed for PAT (0.67; 95% CI [−3.08, 1.73]; p = .58). [Note: quoted statistical copy was edited to meet APA style.]
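A quick way to read results like these: for the usual two-sided tests, a 95% confidence interval for a difference that includes 0 corresponds to p > .05, and one that excludes 0 corresponds to p < .05. This minimal Python sketch applies that check to the three intervals and p values in the quote; the check is a general rule, not a calculation taken from the paper.

```python
# 95% confidence intervals (lower, upper) and p values quoted above.
reported = {
    "PAT vs. open-label TAD": ((-1.39, 1.98), 0.73),
    "open-label vs. blinded TAD": ((0.07, 2.51), 0.04),
    "the same difference for PAT": ((-3.08, 1.73), 0.58),
}

def excludes_zero(lower: float, upper: float) -> bool:
    """True when 0 lies outside the interval, i.e. significant at the .05 level."""
    return lower > 0 or upper < 0

for label, ((lower, upper), p) in reported.items():
    print(f"{label}: excludes 0 = {excludes_zero(lower, upper)}, p = {p}")
```

Only the open-label vs. blinded TAD interval excludes 0, which matches the only p value below .05.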

1g. Read the results sentences above. Then, sketch two bar graphs of the two main findings. The y-axis on each graph should represent “decrease in depression.” One graph should have two bars showing the finding that compares the efficacy of TAD and PAT. The other graph should have four bars, comparing the efficacy of open-label TAD with blinded TAD and the efficacy of open-label PAT with blinded PAT.

For Statistics Students

Research about hallucinogens has become increasingly complicated, because researchers aren’t just comparing a placebo/control group to a hallucinogen/experimental group. They need tools that let them compare multiple experimental conditions simultaneously.

Another concern, especially in studies that compare antidepressants to hallucinogens, is the fact that antidepressants can take several weeks to start having the desired effect. Measuring a dependent variable (such as a score on a depression inventory) only once will probably not provide a full picture of the impact of treatment.

Let’s walk through how the statistical tools used to analyze this data have changed over time to keep up with changing research methodologies.

Initially, researchers would compare a group receiving the experimental hallucinogen to a control group on some outcome variable, such as a score on a depression inventory.

2a. Which statistical test does this mimic? Hint: Think back on one of the most basic tests taught in Introduction to Statistics, in which two groups are compared and evaluated on a single outcome variable.

Good on you if you remembered the independent t test. However, as mentioned, some research looks at multiple doses of a single hallucinogen. That’s more than two groups.
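For readers who want to see the arithmetic, here is a minimal Python sketch of a pooled-variance (Student’s) independent-samples t test using only the standard library. The two groups of end-of-study depression scores are invented for illustration; in practice you would use statistical software (for example, SPSS, R, or scipy.stats.ttest_ind in Python).

```python
import math
from statistics import mean, variance

# Hypothetical end-of-study depression scores (lower = less depressed).
hallucinogen = [12, 15, 10, 14, 9]
control = [18, 20, 16, 19, 17]

def independent_t(group1, group2):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: each sample variance is weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, n1 + n2 - 2

t, df = independent_t(hallucinogen, control)
print(f"t({df}) = {t:.2f}")
```

The negative t value reflects lower (better) depression scores in the hypothetical hallucinogen group.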

2b. Which test would work well to compare one dependent variable among multiple groups, like a “control” group receiving an antidepressant vs. a microdose of a hallucinogen vs. a larger dose of a hallucinogen?

The correct answer here is the one-way ANOVA. However, neither of these tests allows us to study how people react and feel across time. After all, the hope of this research is to provide people living with mental illness long-term relief from their symptoms. How would such a study be different? For instance:

2c. Assuming that a t test and ANOVA would use different participants in different conditions (between-group research design), what sort of research design would be required if we studied people who receive different doses but are measured at multiple points in time?

2d. What is the slightly more complicated version of ANOVA that could study such a question?

Factorial ANOVA, using a mixed design, would allow us to study the question of both a) different doses/control groups AND b) relief from mental illness over time.

To review your understanding of statistical choices made in hallucinogen research, consider this JAMA publication by Yngwe et al. (2026).

2e. What are the two conditions of the study?

2f. What is the dependent variable measured at the end of the study?

2g. At how many points in time were participants asked to complete the DV?

2h. How is this both a within-groups and a between-groups research design that could be studied with a factorial ANOVA?

Ideally, you identified that the two conditions in this study compared psilocybin with niacin, one of the classic placebos used in early research in this field. This research studied major depressive disorder, so it makes sense that the outcome was the score on the Montgomery-Åsberg Depression Rating Scale (MADRS). That score was collected from all research participants at five points in time. The study therefore has between-groups elements, because it has two conditions (psilocybin and niacin), and within-groups elements, because each participant provided data at five time points (depression scores at Time 1, Time 2, Time 3, Time 4, and Time 5).
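To see how a mixed (split-plot) factorial ANOVA separates these two kinds of variation, here is a minimal Python sketch. It assumes a balanced design: two conditions (psilocybin and niacin), the same number of participants in each, and every participant measured at all five time points. The scores are invented, only three participants per condition are used to keep the data short, and the sketch skips the sphericity corrections that real software would report. Condition is tested against variation among participants within each condition; time and the condition × time interaction are tested against within-participant error.

```python
from statistics import mean

# Hypothetical depression scores: condition -> participants, each with 5 time points.
data = {
    "psilocybin": [[30, 22, 18, 15, 14], [28, 20, 16, 14, 12], [32, 25, 20, 18, 17]],
    "niacin": [[31, 29, 27, 26, 25], [29, 27, 26, 24, 24], [33, 30, 29, 28, 26]],
}

def mixed_anova(data):
    """F ratios for a balanced design with one between-subjects factor (condition)
    and one within-subjects factor (time). No sphericity correction."""
    groups = list(data.values())
    a, n, b = len(groups), len(groups[0]), len(groups[0][0])
    all_scores = [y for g in groups for subj in g for y in subj]
    grand = mean(all_scores)
    group_means = [mean([y for subj in g for y in subj]) for g in groups]
    time_means = [mean([subj[t] for g in groups for subj in g]) for t in range(b)]
    cell_means = [[mean([subj[t] for subj in g]) for t in range(b)] for g in groups]

    ss_total = sum((y - grand) ** 2 for y in all_scores)
    ss_subjects = b * sum((mean(subj) - grand) ** 2 for g in groups for subj in g)
    ss_cond = n * b * sum((m - grand) ** 2 for m in group_means)
    ss_subj_within = ss_subjects - ss_cond  # error term for condition
    ss_time = a * n * sum((m - grand) ** 2 for m in time_means)
    ss_inter = n * sum((cell_means[i][t] - group_means[i] - time_means[t] + grand) ** 2
                       for i in range(a) for t in range(b))
    ss_error_within = ss_total - ss_subjects - ss_time - ss_inter  # error for time effects

    df = {"cond": a - 1, "subj": a * (n - 1), "time": b - 1,
          "inter": (a - 1) * (b - 1), "err": a * (n - 1) * (b - 1)}
    ms_subj = ss_subj_within / df["subj"]
    ms_err = ss_error_within / df["err"]
    return {
        "F_condition": (ss_cond / df["cond"]) / ms_subj,
        "F_time": (ss_time / df["time"]) / ms_err,
        "F_interaction": (ss_inter / df["inter"]) / ms_err,
        "ss": (ss_total, ss_cond, ss_subj_within, ss_time, ss_inter, ss_error_within),
        "df": df,
    }

results = mixed_anova(data)
for name in ("F_condition", "F_time", "F_interaction"):
    print(name, round(results[name], 2))
```

In these invented data, the psilocybin scores drop much faster over time than the niacin scores, which is what a large condition × time interaction captures.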

Everyday Research Methods
