Comparison

AI Scribes for Psychiatry: How They Compare

The AI scribe category is crowded but uneven for psychiatry. Most products on the market are general-medical ambient scribes that clinicians have adapted to psychiatry by force of habit. A smaller group is psychiatry-specific by design. The two groups produce notes that look superficially similar but behave very differently in the corners that matter: when the note has to capture a mental status exam, justify a controlled-substance prescription, or document a real risk assessment. Those are the moments where it matters whether something was actually in the room, watching and listening. This page lays out the field honestly. It explains how ambient scribes are built, where the general-medical pattern breaks down inside a psychiatric encounter, what the psychiatry-specific tools get right, and how to evaluate any scribe against the kind of work psychiatrists actually do.

How ambient AI scribes work (and where psychiatry breaks them)

The general-medical ambient scribe pattern is straightforward. A microphone captures the encounter. Speech-to-text produces a transcript. A language model summarizes the transcript into a SOAP note. The clinician edits, signs, and moves on. For a primary-care visit where the chief complaint is a sore throat, that pipeline works well, because most of the clinical reasoning is verbalized inside the room.
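
The pipeline described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: `SoapNote`, `run_ambient_pipeline`, and the two stand-in functions are hypothetical names, and the stand-ins replace a real speech-to-text engine and a real language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str


def run_ambient_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    summarize: Callable[[str], SoapNote],
) -> SoapNote:
    """Audio -> transcript -> SOAP note. The note only ever sees the transcript."""
    transcript = transcribe(audio)
    return summarize(transcript)


# Hypothetical stand-ins so the sketch runs without external services.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_summarize(transcript: str) -> SoapNote:
    # A real model would compress and restructure; this stub only copies.
    return SoapNote(subjective=transcript, objective="", assessment="", plan="")


note = run_ambient_pipeline(
    b"Patient reports a sore throat for three days.",
    fake_transcribe,
    fake_summarize,
)
```

The design point is the function signature: `summarize` receives only the transcript, so anything that was never spoken aloud has no path into the note.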

Psychiatry breaks this pipeline at three points.

First, the mental status exam is not in the transcript. The MSE is what the clinician observes (affect, eye contact, psychomotor activity, thought process) and assembles silently. An ambient scribe that only summarizes what was said will produce an MSE that is generic at best and fabricated at worst.

Second, a risk assessment is not a summary; it is a reasoning artifact. Documenting protective factors, dynamic risk factors, and the rationale for outpatient management requires structured clinical reasoning, not narrative compression. A scribe that flattens the assessment into a single sentence such as "the patient denied suicidal ideation" has not documented a risk assessment.

Third, medication management, especially for controlled substances, requires that the evidence behind a dose change, a switch, or an augmentation be visible at the moment of the decision. Ambient scribes produce notes; they do not produce evidence trails. When a stimulant dose increase or a benzodiazepine continuation is audited, the note has to stand on its own.

These breaks produce concrete failure modes: missing MSE elements, vague risk language that fails payer or audit review, and prescribing rationales that read as assertions rather than reasoning.
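
The difference between a one-sentence summary and a structured risk assessment can be made concrete with a small data structure. This is a sketch under stated assumptions: the class and field names are hypothetical, the four components mirror the ones this page names (static, dynamic, and protective factors plus a disposition rationale), and the example strings are placeholders rather than clinical guidance.

```python
from dataclasses import dataclass, field


@dataclass
class RiskAssessment:
    static_factors: list[str] = field(default_factory=list)
    dynamic_factors: list[str] = field(default_factory=list)
    protective_factors: list[str] = field(default_factory=list)
    disposition_rationale: str = ""

    def missing_components(self) -> list[str]:
        """Return the names of components that were left undocumented."""
        missing = []
        if not self.static_factors:
            missing.append("static_factors")
        if not self.dynamic_factors:
            missing.append("dynamic_factors")
        if not self.protective_factors:
            missing.append("protective_factors")
        if not self.disposition_rationale.strip():
            missing.append("disposition_rationale")
        return missing


# What a narrative-only scribe effectively records (illustrative placeholder).
flattened = RiskAssessment(dynamic_factors=["patient denied suicidal ideation"])

# A fully populated frame (placeholder strings, not clinical content).
structured = RiskAssessment(
    static_factors=["example static factor"],
    dynamic_factors=["example dynamic factor"],
    protective_factors=["example protective factor"],
    disposition_rationale="example rationale for outpatient management",
)

print(flattened.missing_components())
print(structured.missing_components())
```

A check like `missing_components` does not judge the quality of the reasoning; it only makes an absent component visible before the note is signed.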

The general-medical scribe pattern

The largest and most polished tools in the category (Abridge, Suki, DAX Copilot, Nabla, Heidi, Freed) sit in this group. Their strengths are real. They cover a wide range of specialties and have large training corpora, mature speech recognition, and well-designed clinician interfaces. For most of medicine, they are a significant upgrade over typing into a template. Several have published time-savings data; Freed users, for instance, frequently report around thirty minutes saved per clinic day, which matches what most of these tools deliver in practice.

The gap, specifically for psychiatry, is a category-fit problem rather than a quality problem. These tools were not built around psychiatric note structure. They do not natively scaffold a full MSE, and the MSE they generate is often a summary of what the patient said rather than what the clinician observed. They do not carry mental-health-specific vocabulary at the depth psychiatry requires — distinguishing hypomanic from manic presentation, parsing affect from mood, handling the language of personality structure. They do not ground medication recommendations in the literature, and they do not produce a citation surface that a clinician or auditor can click through. They do not maintain an audit trail tuned to risk language. None of this means they are bad tools. It means that when a psychiatrist adopts a general-medical ambient scribe, the psychiatric work — the MSE, the risk assessment, the evidence behind a prescribing decision — falls back onto the clinician.

The psychiatry-specific scribe pattern

A smaller group of products targets the psychiatric niche by design. JotPsych and PMHScribe are oriented around the psychiatric encounter. Mentalyc and Twofold lean toward the therapy and behavioral-health side of mental health documentation. These products share an instinct: the note structure should match the discipline.

What this group typically gets right is the structural fit. Notes are organized around psychiatric sections — HPI with psychiatric review of systems, an actual MSE block, an assessment that supports differential reasoning, a plan that distinguishes psychotherapy from pharmacology. Vocabulary is tuned to mental health, so dysphoria, anhedonia, and constricted affect are recognized rather than generalized into "low mood." Some include early scaffolding for risk-assessment language, which is a meaningful upgrade over a generic SOAP template.

What is typically missing across this group is the evidence layer. Most psychiatry-specific scribes do not ground a medication recommendation in the literature. There is no multi-vector retrieval across the psychiatric corpus — population, intervention, outcomes, timing, evidence tier — that would let the scribe surface why a particular agent is appropriate for a particular profile. Validated clinical outcomes data in real psychiatric deployment is rare in this group; most of the validation is internal benchmarking or pilot-scale, not chart-level diagnostic accuracy in a working practice. The category does the structural work well. The evidence work is still open.

A comparison framework (six dimensions)

The six dimensions below are the ones psychiatrists tell us actually predict whether a scribe will survive a year in practice. The table is a snapshot of the field as it stands today; vendor capabilities move, and Sigmund's own column distinguishes what he does in deployment today from what is on the near roadmap.

AI scribes for psychiatry, evaluated on six dimensions.
Dimension	General-medical ambient scribes	Psychiatry-specific scribes	Sigmund
Captures the MSE	Partial: Summarizes what the patient said; observer-side MSE is sparse.	Yes: Psych-structured MSE block, varying depth by product.	Yes: Full observer-side MSE, editable by the clinician before signing.
Risk-language structure	No: Risk language collapsed into narrative.	Partial: Some scaffolding; consistency varies.	Yes: Structured risk frame: static, dynamic, protective, disposition rationale.
Evidence-grounded med recommendations	No	No	Yes: Multi-vector retrieval across the psychiatric literature; ranked by evidence tier.
Citation trail in the note	No	No	Yes: PMIDs and guideline references inline; clickable from the chart.
Validated in a real psychiatry practice	No: Time-savings data; no published diagnostic accuracy in psychiatry.	No: Pilot and internal benchmarks; published psychiatric outcomes data limited.	Yes: N = 124 charts at Integrative Psychiatry Manhattan; 87.4% diagnostic accuracy. NIH R21 + R01 prospective studies underway.
Controlled-substance documentation support	Partial: Captures the discussion; rationale is left to the clinician.	Partial: Template-aware; evidence rationale absent.	Yes: Indication, dose, monitoring plan, and evidence basis surfaced together in the note.

Where Sigmund fits

Sigmund is the psychiatry-specific scribe who sits in on the session with an evidence engine behind him. He drafts the note the way psychiatrists work — HPI, an observer-side MSE, a real assessment, a plan that distinguishes psychotherapy from pharmacology, a structured risk frame when the encounter requires it. The difference from other psychiatry-specific tools is that every clinical recommendation in the plan is grounded in the literature, with citations he renders inline. He ranks treatment options through a multi-vector engine that weighs the patient profile against the published evidence across population, intervention, outcomes, timing, and evidence tier — so a stimulant choice for an eight-year-old with comorbid anxiety surfaces a different ranking than the same chief complaint in an adult.
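
To make the idea of ranking across several evidence dimensions concrete, here is a toy weighted-sum sketch. It is an assumption-laden illustration, not Sigmund's engine: the scoring rule, the equal weights, the option names, and every number are invented for the example. Only the five dimension names come from the text above.

```python
DIMENSIONS = ("population", "intervention", "outcomes", "timing", "evidence_tier")


def rank_options(
    options: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Score each option as a weighted sum over DIMENSIONS, highest first."""
    scored = []
    for name, scores in options.items():
        total = sum(weights[d] * scores[d] for d in DIMENSIONS)
        scored.append((name, total))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical equal weighting across the five dimensions.
weights = {d: 0.2 for d in DIMENSIONS}

# Invented match scores for two placeholder options under two patient profiles.
# Only the population match differs between profiles.
pediatric_profile = {
    "option_a": {"population": 0.9, "intervention": 0.5, "outcomes": 0.5, "timing": 0.5, "evidence_tier": 0.5},
    "option_b": {"population": 0.3, "intervention": 0.6, "outcomes": 0.6, "timing": 0.6, "evidence_tier": 0.6},
}
adult_profile = {
    "option_a": {"population": 0.4, "intervention": 0.5, "outcomes": 0.5, "timing": 0.5, "evidence_tier": 0.5},
    "option_b": {"population": 0.8, "intervention": 0.6, "outcomes": 0.6, "timing": 0.6, "evidence_tier": 0.6},
}

pediatric_ranking = rank_options(pediatric_profile, weights)
adult_ranking = rank_options(adult_profile, weights)
```

In this toy setup `option_a` ranks first for the pediatric profile and `option_b` ranks first for the adult profile, which is the behavior the paragraph describes: the same options reorder when the population match changes.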

Generation 1 has been validated at Integrative Psychiatry Manhattan: N = 124 charts, 87.4% diagnostic accuracy across five condition categories, 100% clinician endorsement of improved work quality, effect sizes r = 0.81–0.88 versus generic AI baselines. NIH R21 (PAR-25-310) and R01 (PAR-25-283) prospective studies are underway across outpatient settings. The full validation picture, including study design and limitations, lives on the evidence-based AI page; the product itself is described on the Sigmund home page.

How to evaluate one in your practice

You do not need a procurement committee to figure out whether a scribe will hold up in psychiatry. Run any candidate through a thirty-minute audit using the checklist below. If a product fails on more than one item, it is the wrong category fit for psychiatric work — not a bad tool, just the wrong tool.

Does it capture a full observer-side MSE, or only what the patient said?
Does it cite the evidence for any medication recommendation it surfaces, with a clickable source trail?
Would the controlled-substance documentation survive a DEA or payer audit on its own, without the clinician filling in the rationale by hand?
Would the risk-assessment language survive a malpractice review — static factors, dynamic factors, protective factors, disposition rationale?
Is there published validation data in a real psychiatric setting — chart-level accuracy, not just time-savings — that you can read?
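
The decision rule for this checklist is simple enough to write down. The sketch below assumes hypothetical short keys for the five questions; the only logic taken from the text is that more than one failed item signals the wrong category fit.

```python
# Hypothetical short keys for the five checklist questions above.
CHECKS = (
    "observer_side_mse",
    "cited_medication_evidence",
    "controlled_substance_documentation",
    "risk_assessment_language",
    "published_psychiatric_validation",
)


def audit_scribe(results: dict[str, bool]) -> tuple[list[str], bool]:
    """Return the failed checks and whether the tool is the wrong category fit.

    A check that is missing from `results` counts as failed.
    """
    failed = [check for check in CHECKS if not results.get(check, False)]
    wrong_category_fit = len(failed) > 1
    return failed, wrong_category_fit


# Example: a candidate that passes everything except two evidence-related checks.
failed, wrong_fit = audit_scribe(
    {
        "observer_side_mse": True,
        "cited_medication_evidence": False,
        "controlled_substance_documentation": True,
        "risk_assessment_language": True,
        "published_psychiatric_validation": False,
    }
)
```

Recording which items failed, rather than only a pass or fail verdict, keeps the audit useful when comparing several candidates side by side.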

The detailed audit and format expectations for each of these checks, which are the conventions Sigmund holds the record against, live on the documentation page. The structural decisions that drive Sigmund's note format are on the decisions page.

Sigmund is built inside the Sultan Lab for Mental Health Informatics at Columbia University Irving Medical Center and the New York State Psychiatric Institute, under the direction of Dr. Ryan Sultan. The lab's work on guideline-concordant prescribing is the empirical foundation underneath the scribe.

Sigmund is investigational and intended to assist — not replace — clinical judgment. Competitor capabilities described on this page reflect public product information at time of writing and may change.