Clinical decisions

Clinical Decisions Where Evidence-Based AI Outperforms a Generic LLM

There are five recurring moments in psychiatric practice where a citation matters more than a fluent sentence. A child with ADHD and anxiety, deciding what to start. An adolescent two years into a stimulant trial, deciding whether to keep going. A 34-year-old in a first depressive episode, picking the SSRI you'll defend at follow-up. A 62-year-old on clonazepam since 2014, asking what a taper actually looks like. A nine-year-old already on risperidone for irritability, with parents who deserve a real answer about long-term safety. This page is about those moments: what guideline bodies actually say, where a generic large language model drifts, and what Sigmund does instead. He sits in, weighs the evidence, and shows his work. The work is by Dr. Ryan Sultan and the Sultan Lab for Mental Health Informatics at Columbia University Irving Medical Center.

Decision one

ADHD medication selection in a child with anxiety

Comorbid anxiety is the rule, not the exception. About a third of children with ADHD meet criteria for an anxiety disorder, and the order of operations in this profile is not the same as in uncomplicated ADHD. AACAP guidance places evidence-based behavioral parent training first when anxiety is moderate or higher. It then places a non-stimulant trial, typically atomoxetine, before a methylphenidate or amphetamine challenge. AAP guidance for school-age children supports behavior therapy and medication together. It also notes explicitly that comorbid anxiety, tics, and irritability call for treatment-sequencing decisions that the clinician documents. The ranking exists for three reasons. Methylphenidate can worsen anxiety in a meaningful minority of children. Atomoxetine targets both attention and anxious arousal. Behavioral parent training has effects that persist into adolescence in ways stimulants alone do not.

A generic LLM will recommend stimulants by default in this case. The reason is mechanical: PubMed abstracts and the older clinical literature overrepresent "stimulants are first line for ADHD," and a model trained on internet text averages toward that consensus without conditioning on the comorbidity. The patient profile gets flattened.

Sigmund, running the CEBA-ADHD engine, hears this profile differently. Behavioral parent training first. Atomoxetine elevated above methylphenidate. The ranking arrives with the studies behind it — the AACAP practice parameter for the specific subsection, the Pelham sequencing RCT, the atomoxetine-in-comorbid-anxiety meta-analysis. The clinician sees the chain and overrides freely. The note that gets signed reflects that decision in the clinician's voice, not the model's average.
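The flattening described above can be shown with a toy counting model. The sketch below is an analogy only. It does not describe how any LLM or Sigmund works internally. The mention counts are invented, and `rank_first_line` is a hypothetical helper. Pooling every ADHD case lets the majority (uncomplicated ADHD) set the order. Filtering to the comorbid-anxiety profile reverses it.

```python
from collections import defaultdict

# Invented counts for illustration only: how often each option is named as
# first-line in a hypothetical corpus, split by whether the case has anxiety.
MENTIONS = [
    # (comorbid_anxiety, option, n_mentions)
    (False, "methylphenidate", 80),
    (False, "atomoxetine", 10),
    (False, "parent_training", 10),
    (True, "methylphenidate", 5),
    (True, "atomoxetine", 8),
    (True, "parent_training", 12),
]


def rank_first_line(mentions, comorbid_anxiety=None):
    """Rank options by mention count.

    comorbid_anxiety=None pools every case (the "averaged" answer);
    True or False conditions on the profile before counting.
    """
    totals = defaultdict(int)
    for anxious, option, n in mentions:
        if comorbid_anxiety is None or anxious == comorbid_anxiety:
            totals[option] += n
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(totals, key=lambda opt: (-totals[opt], opt))


print(rank_first_line(MENTIONS))
# ['methylphenidate', 'parent_training', 'atomoxetine']
print(rank_first_line(MENTIONS, comorbid_anxiety=True))
# ['parent_training', 'atomoxetine', 'methylphenidate']
```

The conditioned order happens to match the sequence described in the text. The only claim the sketch supports is that the answer depends on which population you count over.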

Decision two

Stimulant titration and the cumulative-dose question

The titration question every clinician runs into is not "what dose today" but "what total exposure over years." Pediatric and adolescent psychiatry needs a stimulant-years construct, analogous to pack-years for tobacco. A child started on 5 mg of methylphenidate at age seven and stepped up to 36 mg by age fourteen has accumulated a measurable exposure that interacts with substance use risk, motor-vehicle outcomes, ED utilization, and cardiometabolic monitoring needs. The dose-response curves are not flat. They bend.
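The text does not define units for the construct. The sketch below is one hypothetical way to compute it, by direct analogy to pack-years (packs per day × years smoked). Each daily dose is divided by a reference daily dose and multiplied by the years held at that dose. Several elements are assumptions for illustration: the intermediate steps between 5 mg and 36 mg, the 36 mg reference dose, and the names `DoseStep` and `stimulant_years`. The sketch ignores drug holidays, adherence, and formulation. A single number also cannot capture the bend in the dose-response curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoseStep:
    start_age: float  # age (years) at which this daily dose began
    daily_mg: float   # methylphenidate daily dose in mg


def stimulant_years(steps, end_age, reference_daily_mg):
    """Sum of (daily dose / reference dose) x years held, like packs/day x years."""
    steps = sorted(steps, key=lambda s: s.start_age)
    total = 0.0
    for current, nxt in zip(steps, steps[1:] + [None]):
        # Each dose is held until the next step begins, capped at end_age.
        stop = end_age if nxt is None else nxt.start_age
        years = max(0.0, min(stop, end_age) - current.start_age)
        total += (current.daily_mg / reference_daily_mg) * years
    return total


# Hypothetical trajectory: 5 mg at age 7, stepped up to 36 mg by age 14.
TRAJECTORY = [
    DoseStep(7, 5), DoseStep(8, 10), DoseStep(9, 10), DoseStep(10, 18),
    DoseStep(11, 18), DoseStep(12, 27), DoseStep(13, 27), DoseStep(14, 36),
]

print(stimulant_years(TRAJECTORY, end_age=14, reference_daily_mg=1))  # 115.0 mg-years
print(round(stimulant_years(TRAJECTORY, end_age=14, reference_daily_mg=36), 2))  # 3.19
```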

Randomized trials cannot answer the cumulative-exposure question. Phase III stimulant trials run for weeks to months. The outcomes that matter (substance use disorder, motor vehicle crashes, emergency department visits, conviction rates) emerge over years. That evidence comes from observational datasets: MarketScan-style commercial claims, Medicaid analytic files, Scandinavian registry linkages. The 2025 JAMA Psychiatry analysis from the Sultan Lab found that ADHD medication exposure was associated with 30–42% reductions in criminal convictions, substance-related ED visits, and motor vehicle crashes. The analysis covered a five-year follow-up window in a U.S. commercial-claims cohort.

That work, and other work like it from the New York State Psychiatric Institute, gives the clinician a defensible threshold to titrate against. A generic LLM cannot reproduce it. The data is not in its training corpus, and the inference requires longitudinal claims-linkage work that does not live in PubMed abstracts. A citation-grounded system can retrieve that work and cite it.

Decision three

Antidepressant selection in a treatment-naive adult

STAR*D shaped a generation of antidepressant practice. It also left the field treating "any SSRI" as a defensible opening move, largely because the trial designs of the early 2000s could not distinguish between agents at the population level. The clinical reality at the chair is different. The 34-year-old in front of you has fragmented sleep, a partner who matters, a deskbound job with chronic low back pain, and a sister who responded to escitalopram. The opening SSRI in that profile is not a coin flip.

Sequential trial logic is what STAR*D actually demonstrated: response is achievable, but it often requires two to three sequenced attempts, and the second agent benefits from being chosen against the failure profile of the first. APA practice guidance is permissive at the front end and specific at the switch point. The clinician's job is to set up trial one so that trial two, if needed, has the most informative starting point.

Where a generic LLM averages — "an SSRI such as sertraline or escitalopram is first-line" — an evidence-grounded engine tiers. Sleep complaint with morning fatigue tilts toward mirtazapine or paroxetine in some profiles, away from fluoxetine in others. Sexual side effect concern tilts toward bupropion-augmentation logic at the front, not the back. Comorbid chronic pain raises duloxetine. A family member's response is a real signal in pharmacogenetic-naive practice and deserves to be weighted. The ranked output, with the citations, becomes the substrate the clinician documents and defends.
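Here is a minimal sketch of what "tiering" could mean mechanically. Every candidate starts level. Each profile feature named in the text then shifts specific agents, and the reasons are kept so the ranking stays explainable. The agent list, the flag names, and every weight are hypothetical placeholders, chosen so the arithmetic is easy to follow. This is not Sigmund's scoring and not a prescribing rule.

```python
# Hypothetical placeholder weights; not clinical guidance.
CANDIDATES = ("sertraline", "escitalopram", "mirtazapine", "bupropion", "duloxetine")

# Profile feature -> {agent: score adjustment}, mirroring the tilts in the text.
TILTS = {
    "sleep_complaint": {"mirtazapine": 0.5},
    "sexual_side_effect_concern": {"bupropion": 0.5},
    "chronic_pain": {"duloxetine": 0.5},
}
FAMILY_RESPONSE_BONUS = 0.75  # weight for a first-degree relative's response


def tier(profile_flags, family_responder=None):
    """Return (agent, score, reasons) tuples, best first."""
    scores = {agent: 1.0 for agent in CANDIDATES}  # flat start: no profile, no preference
    reasons = {agent: [] for agent in CANDIDATES}
    for flag in sorted(profile_flags):
        for agent, delta in TILTS.get(flag, {}).items():
            scores[agent] += delta
            reasons[agent].append(flag)
    if family_responder in scores:
        scores[family_responder] += FAMILY_RESPONSE_BONUS
        reasons[family_responder].append("family_response")
    ranked = sorted(CANDIDATES, key=lambda a: (-scores[a], a))
    return [(a, scores[a], reasons[a]) for a in ranked]


# The 34-year-old from the text: fragmented sleep, chronic low back pain,
# and a sister who responded to escitalopram.
for agent, score, why in tier({"sleep_complaint", "chronic_pain"}, "escitalopram"):
    print(f"{agent:<13} {score:.2f} {why}")
```

With no flags, every agent ties at 1.0, which is the "any of these is first-line" answer. An order emerges only once the profile is applied, and each score carries the features that produced it.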

For the 34-year-old above, Sigmund ranks the opening SSRI against the profile — sleep, comorbid pain, family response — and attaches the chain:

Escitalopram · rank #1 for this profile

PMID 29477251 · Comparative efficacy and acceptability of 21 antidepressants for major depressive disorder: a network meta-analysis · Network meta-analysis · 522 RCTs · Lancet 2018

PMID 17074942 · Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps (STAR*D) · Sequenced-treatment trial · N=4,041 · Am J Psychiatry 2006

APA MDD · Practice Guideline for the Treatment of Patients With Major Depressive Disorder · Guideline · American Psychiatric Association

Decision four

Benzodiazepine deprescribing

The taper line is the highest-risk sentence in the chart for a meaningful slice of outpatient psychiatry. "Switch to a longer-half-life agent, then taper by 5–10% every one to two weeks" reads as one clean instruction. It is not. The conversion math from alprazolam or lorazepam to diazepam or clonazepam is approximate. Patients dose-stack, hold doses for sleep, and rebound. The 5–10% step is taken from the current dose, not the original dose. That changes the absolute milligram drop at every step, and it changes what the patient experiences. Long-term users develop interdose withdrawal that mimics the original anxiety, and the distinction between relapse and withdrawal is not always clear at the chair.
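The arithmetic behind "5–10% of the current dose" is easy to check. The sketch below compares a 10% cut of the current dose with a 10% cut of the original dose, starting from a hypothetical 2 mg/day. The starting dose, step size, and step count are illustrative assumptions, not a dosing recommendation. Real schedules also round to available tablet strengths, and they need an explicit endpoint, because a percentage-of-current taper never reaches zero on its own.

```python
def taper_of_current(start_mg, fraction, steps):
    """Each step removes `fraction` of the dose in effect at that step."""
    doses = [start_mg]
    for _ in range(steps):
        doses.append(doses[-1] * (1 - fraction))
    return doses


def taper_of_original(start_mg, fraction, steps):
    """Each step removes the same absolute amount: `fraction` of the starting dose."""
    cut = start_mg * fraction
    return [max(0.0, start_mg - cut * k) for k in range(steps + 1)]


def drops(doses):
    """Absolute milligram reduction between consecutive steps."""
    return [a - b for a, b in zip(doses, doses[1:])]


current = taper_of_current(2.0, 0.10, 5)
original = taper_of_original(2.0, 0.10, 5)
print([round(d, 3) for d in drops(current)])   # [0.2, 0.18, 0.162, 0.146, 0.131]
print([round(d, 3) for d in drops(original)])  # [0.2, 0.2, 0.2, 0.2, 0.2]
```

Under the percentage-of-current rule the absolute drop shrinks at every step. The instruction reads the same throughout, but the late steps are gentler than the early ones.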

The audit surface is high. CDC and FDA messaging on benzodiazepine prescribing has tightened since 2020. Malpractice review boards look for documented rationale, documented patient-facing education, documented contingency plans for withdrawal symptoms, and a defensible pace. Every one of those is a citation hook — a place where a clinician benefits from the relevant guideline language and recent comparative-effectiveness data being surfaced as the note is written, not searched for after-hours.
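The four documentation elements listed above map naturally onto a pre-signature check. The sketch below is hypothetical: `TaperNote`, the element keys, and `audit_gaps` are illustrative names, not a Sigmund or EHR API. It flags any element that is missing and any element that is present but has no supporting citation.

```python
from dataclasses import dataclass, field

# The four elements named in the text; keys are illustrative.
REQUIRED_ELEMENTS = (
    "rationale",
    "patient_education",
    "withdrawal_contingency",
    "taper_pace",
)


@dataclass
class TaperNote:
    sections: dict = field(default_factory=dict)   # element -> note text
    citations: dict = field(default_factory=dict)  # element -> list of references


def audit_gaps(note):
    """List elements that are missing or lack a citation, in a fixed order."""
    gaps = []
    for element in REQUIRED_ELEMENTS:
        if not note.sections.get(element, "").strip():
            gaps.append(f"{element}: missing")
        elif not note.citations.get(element):
            gaps.append(f"{element}: no supporting citation")
    return gaps


draft = TaperNote(
    sections={
        "rationale": "Long-term use; patient requests discontinuation.",
        "patient_education": "Discussed interdose and withdrawal symptoms.",
    },
    citations={"rationale": ["FDA 2020 boxed-warning update"]},
)
for gap in audit_gaps(draft):
    print(gap)
```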

This is where citation-grounded support most directly reduces drift. A generic LLM produces fluent taper plans that sound correct and often are not specific. An engine that retrieves against the patient profile — duration of use, agent, comorbid SUD, age, hepatic function — and weighs the evidence tier surfaces a defensible plan with the references the clinician will document.

For the long-term benzodiazepine patient, Sigmund ranks the benzo-sparing path for the underlying anxiety and surfaces the taper evidence — chain attached:

SSRI/SNRI first-line, structured taper · rank #1 for this profile

PMID 30712879 · Pharmacological treatments for generalised anxiety disorder: a systematic review and network meta-analysis · Network meta-analysis · Lancet 2019

FDA 2020 · Boxed-warning update: benzodiazepine dependence, withdrawal, and taper guidance · Regulatory · U.S. FDA

CANMAT 2014 · Clinical practice guidelines for the management of anxiety disorders: SSRIs/SNRIs first-line, benzodiazepines time-limited · Guideline

Decision five

Pediatric psychotropic prescribing safety

The pediatric evidence base is thinner than the adult literature, the off-label rate is higher, and the public scrutiny is greater. None of that is an argument against careful prescribing. It is an argument for explicit citation. AACAP and AAP have moved in step on most stimulant questions and parted on others, the FDA carries its own black-box landscape for SSRIs and antipsychotics in youth, and the empirical literature has shifted faster than the package inserts on second-generation antipsychotic monitoring in children.

Antipsychotic prescribing in children is the most-watched area in pediatric psychopharmacology. The 2019 JAMA Network Open work from the Columbia Department of Psychiatry on antipsychotic prescribing trends in U.S. youth has been cited several hundred times and is taught in residency programs. It described both growth in prescribing and the persistent gap between guideline-recommended metabolic monitoring and what shows up in the chart. A generic LLM will produce a reasonable-sounding monitoring schedule without knowing whether it matches the AACAP recommendation, the FDA labeling, or the most recent comparative-effectiveness review.

Off-label prescribing in pediatrics is common and frequently appropriate. Documentation of the rationale is what separates defensible practice from indefensible practice. Sigmund's role here is to surface, in the moment, the guideline subsection and the strongest underlying studies, so the clinician's note carries the rationale on its face. Integrative Psychiatry Manhattan has built its practice around exactly that workflow.

In Sigmund

How Sigmund handles these moments

Sigmund reads every indexed paper across five dimensions: population, intervention, outcomes, timing, and overview. It matches the patient profile against each one rather than collapsing them into a single fuzzy topic score. Evidence weighting tiers RCT over meta-analysis over cohort over case series, with recency and effect-size adjustments. The output is a ranked list of treatment options for the profile, traceable to PMIDs and guideline subsections. The clinician sees the chain, overrides freely, and signs the note. The architecture lives in detail on the evidence-based AI page and the product surface on the home page. How the citation chain renders inside the chart lives in the documentation, and the head-to-head against generic AI scribes lives on the Compare page.
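A minimal sketch of this kind of scoring pipeline follows, under stated assumptions. The five dimensions and the tier order come from the text. Everything else is hypothetical: the numeric tier weights, the per-dimension gate, the recency half-life, the effect-size adjustment, all names (`Evidence`, `evidence_score`, `rank_options`), and the `PMID-A` style identifiers. The point is structural. Each dimension is checked on its own, so a strong intervention match cannot mask a population mismatch, and every option's score stays traceable to the identifiers that produced it.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five dimensions named in the text.
DIMENSIONS = ("population", "intervention", "outcomes", "timing", "overview")

# Tier order follows the text (RCT > meta-analysis > cohort > case series);
# the numeric weights themselves are hypothetical placeholders.
TIER_WEIGHT = {"rct": 1.0, "meta_analysis": 0.8, "cohort": 0.5, "case_series": 0.25}


@dataclass(frozen=True)
class Evidence:
    pmid: str           # hypothetical identifiers in the example below
    option: str         # treatment option the study supports
    design: str         # key into TIER_WEIGHT
    year: int
    effect_size: float  # e.g. a standardized mean difference
    match: dict         # per-dimension match against the patient profile, 0..1


def evidence_score(ev, current_year=2025, half_life_years=10.0, min_match=0.5):
    """Score one study, or return None if any single dimension fails to match."""
    per_dim = [ev.match.get(d, 0.0) for d in DIMENSIONS]
    # Gate each dimension separately instead of blending them into one topic score.
    if min(per_dim) < min_match:
        return None
    relevance = sum(per_dim) / len(per_dim)
    recency = 0.5 ** (max(0, current_year - ev.year) / half_life_years)
    effect = 0.5 + 0.5 * min(abs(ev.effect_size), 1.0)
    return TIER_WEIGHT[ev.design] * recency * effect * relevance


def rank_options(evidence, **kwargs):
    """Aggregate study scores per option, keeping the IDs behind each score."""
    totals = defaultdict(float)
    chains = defaultdict(list)
    for ev in evidence:
        score = evidence_score(ev, **kwargs)
        if score is None:
            continue
        totals[ev.option] += score
        chains[ev.option].append(ev.pmid)
    order = sorted(totals, key=lambda opt: (-totals[opt], opt))
    return [(opt, round(totals[opt], 3), chains[opt]) for opt in order]


strong = dict.fromkeys(DIMENSIONS, 0.9)
wrong_population = {**strong, "population": 0.2}

EVIDENCE = [
    Evidence("PMID-A", "option_x", "rct", 2020, 0.4, strong),
    Evidence("PMID-B", "option_x", "cohort", 2015, 0.3, strong),
    Evidence("PMID-C", "option_y", "rct", 2024, 0.8, wrong_population),  # gated out
    Evidence("PMID-D", "option_y", "case_series", 2010, 0.5, strong),
]

for option, score, chain in rank_options(EVIDENCE):
    print(option, score, chain)
```

PMID-C is the newest, largest-effect RCT in the list, yet it drops out because its population does not match the profile. A single blended topic score could have let it carry option_y.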

Sigmund is investigational and is intended to assist, not replace, clinical judgment. Patient data handling is reviewed by the Sultan Lab for Mental Health Informatics at Columbia University Irving Medical Center.