For a brief window between roughly 2022 and 2024, a particular narrative dominated the discussion of AI in mental health. The argument ran like this. Therapy is expensive, scarce, and unevenly distributed. Large language models are now fluent enough to hold extended supportive conversations. Therefore, autonomous AI chatbots will, over the next few years, deliver the bulk of low-intensity mental health support, with human therapists displaced upward into more complex cases or out of the workflow entirely.
That narrative has not survived contact with the evidence. The companies pursuing it most aggressively have either pivoted or wound down their consumer offerings. The regulators have converged, with unusual consistency across jurisdictions, on a different model. And the structural reasons why autonomous AI struggles in mental health are now reasonably clear — they are not reasons that will be fixed by larger models.
What happened to Wysa and Woebot
Wysa and Woebot were, for several years, the two most prominent consumer-facing AI mental health chatbots. Both raised substantial venture funding. Both attracted serious clinical advisory boards. Both pursued formal evaluation in healthcare systems rather than relying on direct-to-consumer wellness positioning.
Wysa undertook a pilot within NHS Talking Therapies, the largest publicly-funded psychological therapy service in the United Kingdom and arguably the most useful possible testbed for autonomous AI in routine mental health care. The 2026 evaluation of that pilot reported modest effects on mild-to-moderate depression but did not demonstrate non-inferiority to therapist-delivered low-intensity intervention. In a service whose throughput is constrained by therapist capacity, a chatbot that produces modestly smaller effects than a Psychological Wellbeing Practitioner (PWP) delivering step-2 care does not deliver the workforce relief that the deployment case requires. The pilot's findings did not justify autonomous deployment at scale.
Woebot's trajectory was different but instructive. Its consumer chatbot ceased operations in 2025. The company had built a credible clinical evidence base for specific anxiety and depression applications, but the consumer product proved difficult to sustain commercially without payer reimbursement, and payer reimbursement for fully autonomous AI mental health interventions did not materialise on the expected timeline.
These are not isolated failures. They are consistent signals about a category. The category of fully autonomous AI mental health intervention has not delivered the outcomes that would justify the deployment posture its proponents anticipated. This is not a verdict about whether AI can be useful in mental health. It is a verdict about a specific deployment model.
Why autonomous AI struggles here — structural reasons
The reasons are not about model capability. The current generation of language models can produce fluent, empathic, appropriately-worded supportive responses across an enormous range of presentations. The fluency is not in question. What is in question is whether fluency, on its own, does the work that effective therapy does. The evidence suggests it does not, for four structural reasons.
Formulation does not survive in the model. Effective CBT is built on a coherent case formulation — a working theory of how this particular client's difficulties developed, what maintains them now, and what the therapy is therefore trying to change. The formulation is what gives the therapy direction across sessions. It is what makes the third session's behavioural experiment a logical extension of the first session's psychoeducation rather than a free-standing intervention. Current language models hold conversational context within a session, sometimes across sessions through retrieval mechanisms, but they do not construct or maintain a clinical formulation in the sense that a therapist does. The case structure is absent. What looks like a conversation about the client's difficulties is, mechanically, a sequence of locally-coherent responses that lack the integrating clinical theory.
Safeguarding sits exactly where AI is weakest. The cases that require escalation — disclosure of risk, deteriorating presentation, contraindication for a particular intervention — are exactly the cases where the cost of a wrong call is highest. Autonomous AI handling these cases is being asked to make decisions whose consequences extend beyond the conversation, on the basis of information that is incomplete, on a population where the base rates of serious risk are non-trivial. Even with conservative escalation thresholds, the system either over-escalates (degrading utility) or under-escalates (incurring harm). Therapist-in-the-loop is not a regulatory concession. It is a recognition that this decision class is poorly suited to autonomous handling.
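The over- versus under-escalation trade-off can be made concrete with a small calculation. The sketch below is illustrative only: the base rate, sensitivity, and specificity values are hypothetical numbers chosen for the arithmetic, not figures from any evaluation, and `escalation_counts` and `EscalationCounts` are names invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationCounts:
    true_alerts: float   # risk cases correctly escalated
    false_alerts: float  # non-risk conversations escalated anyway
    missed_cases: float  # risk cases the system failed to escalate


def escalation_counts(n_conversations: int, base_rate: float,
                      sensitivity: float, specificity: float) -> EscalationCounts:
    """Expected escalation outcomes for a risk classifier (hypothetical inputs)."""
    at_risk = n_conversations * base_rate
    not_at_risk = n_conversations - at_risk
    return EscalationCounts(
        true_alerts=at_risk * sensitivity,
        false_alerts=not_at_risk * (1.0 - specificity),
        missed_cases=at_risk * (1.0 - sensitivity),
    )


# Hypothetical scenario: 10,000 conversations, 2% of which involve serious risk.
conservative = escalation_counts(10_000, 0.02, sensitivity=0.95, specificity=0.80)
permissive = escalation_counts(10_000, 0.02, sensitivity=0.70, specificity=0.98)

for name, result in [("conservative", conservative), ("permissive", permissive)]:
    print(f"{name}: {result.false_alerts:.0f} unnecessary escalations, "
          f"{result.missed_cases:.0f} missed cases")
```

Under these assumed numbers, the conservative threshold produces roughly 1,960 unnecessary escalations to catch all but about 10 risk cases, while the permissive threshold cuts unnecessary escalations to roughly 196 but misses about 60. Neither setting removes the trade-off; it only moves the cost from one column to the other.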
Drift runs in the opposite direction with no fidelity instrument. The therapist drift literature (Waller & Turner, 2016) documents the well-known tendency of qualified clinicians to gradually move away from evidence-based protocols. That literature presupposes a human practitioner who can, in principle, be observed, supervised, and brought back into alignment with the protocol. Autonomous AI does not have a supervisor in this sense. It drifts toward whatever its training and reward signals push it toward, which may or may not be a coherent therapeutic approach. There is no chatbot equivalent of the Cognitive Therapy Scale-Revised (CTS-R) and no equivalent of a fidelity audit. The system can produce protocol-incoherent output for thousands of sessions before anyone notices, and when someone does notice, it is because of an external complaint rather than internal quality assurance.
Outcome signal is structurally weak. Routine outcome measurement is already difficult in human-delivered therapy; autonomous AI inherits all of those difficulties and adds new ones. The clients least likely to complete repeated outcome measures are the clients whose engagement is most fragile. Self-report from a conversational AI's users is biased by the same factors that bias self-report generally, plus the absence of a clinician asking the questions in a structured way. The result is that autonomous tools rarely get good signal on whether they are actually helping the clients they are designed to serve.
None of these are technology problems waiting for a larger model. They are problems about what mental health work is and what infrastructure it requires.
The regulators have noticed
The regulators' position is now coherent across jurisdictions in a way it was not three years ago. The FDA in the United States has progressively tightened its stance on autonomous mental health AI as software-as-a-medical-device, particularly where claims of therapeutic effect are made. The MHRA in the United Kingdom has aligned with similar device-classification logic and has been notably cautious about cleared deployment without clinician oversight. The American Psychological Association's guidance documents, which carry weight beyond the United States, converge on the same principle.
The line that has emerged, across regulators and professional bodies with quite different histories and incentives, is consistent: AI in mental health belongs in the workflow as an adjunct to a clinician, not as a replacement. The therapist-in-the-loop model is the consensus position. The autonomous-deployment model is, in the regulators' view, not what the evidence currently supports.
This convergence is not bureaucratic caution. It reflects what the evaluation evidence has actually shown, and it shapes the deployment environment that any AI mental health product now operates in.
What therapist-directed AI does differently
The alternative to autonomous deployment is not "less AI." It is AI that operates under a clinician's direction, within a case the clinician is formulating, with safeguarding decisions retained by the clinician, and with outcomes measured in the clinician's existing routine.
The architectural shift this implies is significant. In an autonomous model, the AI holds the relationship with the client and the clinician (if present) is a backstop. In a therapist-directed model, the clinician holds the relationship and the case structure; the AI is an instrument the clinician deploys for specific tasks — between-session work, homework prompting, structured psychoeducation, capture of self-monitoring data, structured pre-session preparation. The AI does what AI is genuinely good at (consistent, available, fluent prompting and information capture across a defined task) while the clinician does what clinicians are genuinely required for (formulation, judgment, safeguarding, the therapeutic relationship).
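The division of responsibility described above can be sketched as a routing rule. This is a minimal illustration under stated assumptions, not a description of any real product's implementation: `Task`, `CasePlan`, and `route` are hypothetical names, and how a message comes to be marked `risk_flagged` is deliberately left outside the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Task(Enum):
    HOMEWORK_PROMPT = "homework_prompt"
    PSYCHOEDUCATION = "psychoeducation"
    SELF_MONITORING = "self_monitoring"
    PRE_SESSION_PREP = "pre_session_prep"


@dataclass
class CasePlan:
    """What the clinician has authorised for one client (hypothetical structure)."""
    client_id: str
    allowed_tasks: set[Task]
    clinician_queue: list[str] = field(default_factory=list)


def route(plan: CasePlan, requested: Task, risk_flagged: bool) -> str:
    """Decide who handles a between-session interaction.

    The AI acts only on tasks the clinician has assigned. Anything flagged
    as possible risk is handed to the clinician rather than decided here.
    """
    if risk_flagged:
        plan.clinician_queue.append(f"risk review: {plan.client_id}")
        return "clinician"
    if requested in plan.allowed_tasks:
        return "ai"
    plan.clinician_queue.append(
        f"unassigned task {requested.value}: {plan.client_id}"
    )
    return "clinician"


plan = CasePlan("client-001", {Task.HOMEWORK_PROMPT, Task.SELF_MONITORING})
print(route(plan, Task.HOMEWORK_PROMPT, risk_flagged=False))  # assigned task
print(route(plan, Task.PSYCHOEDUCATION, risk_flagged=False))  # not assigned
print(route(plan, Task.HOMEWORK_PROMPT, risk_flagged=True))   # possible risk
```

The design choice worth noting is that the default is the clinician: the AI handles a request only when the clinician has assigned that task and nothing has been flagged for review.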
This is not a hedge. It is a different product category. The instruments needed to make it work — the integration with the clinician's workflow, the structured handoff of homework from session to between-session AI, the surfacing of between-session data into the next session's review — are real engineering problems, but they are problems with a known structure rather than problems whose underlying feasibility is in question.
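To illustrate what a structured handoff and pre-session surfacing might look like, here is a minimal sketch. The schema is an assumption made for illustration: `HomeworkItem`, `LogEntry`, and `pre_session_summary` are hypothetical names, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HomeworkItem:
    """A task the clinician sets in session (hypothetical schema)."""
    item_id: str
    description: str
    due: date


@dataclass(frozen=True)
class LogEntry:
    """One between-session record captured by the AI layer."""
    item_id: str
    logged_on: date
    note: str


def pre_session_summary(items: list[HomeworkItem],
                        logs: list[LogEntry]) -> list[str]:
    """Build the lines a clinician would see before the next session."""
    lines = []
    for item in items:
        entries = [e for e in logs if e.item_id == item.item_id]
        if entries:
            latest = max(entries, key=lambda e: e.logged_on)
            lines.append(
                f"{item.description}: {len(entries)} entries, "
                f"latest {latest.logged_on.isoformat()}"
            )
        else:
            # Missing data is reported explicitly rather than silently dropped.
            lines.append(f"{item.description}: no entries recorded")
    return lines


# Invented sample data for illustration.
items = [
    HomeworkItem("hw1", "Thought record", date(2026, 5, 4)),
    HomeworkItem("hw2", "Behavioural experiment", date(2026, 5, 6)),
]
logs = [
    LogEntry("hw1", date(2026, 5, 2), "Completed after work"),
    LogEntry("hw1", date(2026, 5, 3), "Completed in the morning"),
]
summary = pre_session_summary(items, logs)
for line in summary:
    print(line)
```

The summary reports homework with no entries as plainly as homework that was completed, since non-completion is itself information the clinician needs for the session review.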
It also fits the evidence about what is actually missing in routine practice. The bottleneck in CBT outcomes, as the Kazantzis et al. (2016) homework meta-analysis suggests, is rarely the in-session work. It is the between-session work and its integration with the next session. That is precisely the layer of practice where consistent, available, structured AI prompting genuinely adds capacity, without displacing the clinician's role.
The commercial position is also stronger
There is a secondary observation worth making, because it tends to be left implicit. Therapist-directed AI is not only the position the evidence supports. It is also the commercially stronger position.
Payers prefer it because clinician oversight provides the accountability and audit trail that reimbursement processes are built around. Regulators prefer it because the decision rights are located where the regulatory frameworks expect them. Patients prefer it — repeatedly, in survey data — because they trust a human clinician to be in the loop on the decisions that matter. Clinicians prefer it because their professional role is augmented rather than threatened, and because the AI handles the parts of practice that are genuinely tedious without claiming the parts that constitute their craft.
The autonomous-AI narrative ran on the assumption that the human clinician was a cost centre to be displaced. The evidence and the market have both pushed back. The clinician is a load-bearing element of mental health care, and the useful question is what tools they have to work with — not whether they can be replaced.
Supervisia Companion is the therapist-directed model in practice.
Companion doesn't hold the case. You hold the case. Companion handles the between-session layer — the AI presence your client can engage with, the homework structure you set being prompted at the right moments, the data captured and surfaced back to you before the next session's review. Your formulation, your safeguarding decisions, your judgment all stay where they belong. The AI does what AI is genuinely good at and stops where the clinician's role begins.
References
- Waller, G. & Turner, H. (2016). Therapist drift redux: Why well-meaning clinicians fail to deliver evidence-based therapy, and how to get back on track. Behaviour Research and Therapy, 77, 129–137. DOI: 10.1016/j.brat.2016.01.007. PubMed: 26752326.
- Walfish, S., McAlister, B., O'Donnell, P. & Lambert, M. J. (2012). An investigation of self-assessment bias in mental health providers. Psychological Reports, 110(2), 639–644. DOI: 10.2466/02.07.17.PR0.110.2.639-644.
- Kazantzis, N., Whittington, C., Zelencich, L., Kyrios, M., Norton, P. J. & Hofmann, S. G. (2016). Quantity and quality of homework compliance: A meta-analysis of relations with outcome in cognitive behavior therapy. Behavior Therapy, 47(5), 755–772. DOI: 10.1016/j.beth.2016.05.002. PubMed: 27816086.
- NHS Talking Therapies Wysa pilot evaluation (2026). Publicly reported evaluation of an autonomous AI mental health intervention deployed within NHS Talking Therapies services.
- Woebot Health (2025). Publicly reported wind-down of the consumer Woebot chatbot.
- U.S. Food and Drug Administration; Medicines and Healthcare products Regulatory Agency (UK); American Psychological Association. Guidance documents on AI in mental health care, treated here as policy positions rather than primary research.
Last updated: May 2026
