Suicide-Related-Chatbot-Response-Evaluation

A multidimensional analysis of how prompt-component design affects chatbot responses in suicide-related conversations — focusing on empathy, safety, and overreliance risk.

🔗 Live Dashboard

👉 Open the Interactive Dashboard

Explore 180 chatbot responses scored across 12 prompt configurations × 5 user messages of graded risk (Risk A to Risk C) × 3 runs.

📊 Features

Five-dimension evaluation framework: Empathy (D1), Risk Monitoring (D2), Harm Reframe (D3), Overreliance/Empowerment (D4), Continuity (D5)
Cross-condition pattern matrix comparing prompt configurations across risk levels
Trajectory analysis showing how each prompt component shifts response patterns
D4 Overreliance × Empowerment tradeoff: this project's original contribution to the framework
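
As a rough illustration of how the five dimensions and the cross-condition pattern matrix could fit together, here is a minimal Python sketch. The dimension labels come from the list above; everything else (the `ScoredResponse` record, the `pattern_matrix` helper, the condition labels, and the numeric score scale) is an assumption made for illustration and may not match the project's actual code or rubric.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Dimension codes and names as listed in the Features section.
DIMENSIONS = {
    "D1": "Empathy",
    "D2": "Risk Monitoring",
    "D3": "Harm Reframe",
    "D4": "Overreliance/Empowerment",
    "D5": "Continuity",
}


@dataclass
class ScoredResponse:
    """One scored chatbot response (hypothetical record layout)."""
    config: str   # prompt-configuration label (assumed naming)
    message: str  # user-message label (assumed naming)
    run: int      # run index within the condition
    scores: dict  # dimension code -> numeric score (scale is an assumption)


def pattern_matrix(responses, dimension):
    """Average one dimension over runs, keyed by (config, message)."""
    if dimension not in DIMENSIONS:
        raise KeyError(f"unknown dimension: {dimension}")
    cells = defaultdict(list)
    for r in responses:
        cells[(r.config, r.message)].append(r.scores[dimension])
    return {key: mean(values) for key, values in cells.items()}


# Made-up scores, purely to show the call shape.
demo = [
    ScoredResponse("config-1", "message-1", 1, {"D1": 2, "D4": 1}),
    ScoredResponse("config-1", "message-1", 2, {"D1": 3, "D4": 1}),
    ScoredResponse("config-1", "message-1", 3, {"D1": 1, "D4": 1}),
]
matrix = pattern_matrix(demo, "D1")
```

Keying the matrix by `(config, message)` keeps each condition's three runs together, so a per-dimension table of configurations against risk levels can be read straight off the dictionary.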

🗂 Dataset

180 chatbot responses generated via the Anthropic API
12 system prompt configurations (6 isolated + 6 cumulative)
5 user messages spanning Risk A (low stress) to Risk C (active ideation)
3 runs per condition for inter-run reliability analysis
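
The design above is a full factorial grid, and the arithmetic (12 × 5 × 3 = 180) can be checked with a short Python sketch. The labels (`isolated-1`, `message-1`, and so on) are placeholders, since only the counts are stated here. The `run_spread` helper is likewise an assumption: it is a crude max-minus-min stand-in for an inter-run reliability statistic, not necessarily the measure used in this project.

```python
from itertools import product

# Placeholder labels: only the counts (6 + 6, 5, 3) come from the dataset description.
CONFIGS = [f"isolated-{i}" for i in range(1, 7)] + [
    f"cumulative-{i}" for i in range(1, 7)
]
MESSAGES = [f"message-{i}" for i in range(1, 6)]
RUNS = [1, 2, 3]


def design_grid():
    """Return every (config, message, run) cell of the factorial design."""
    return list(product(CONFIGS, MESSAGES, RUNS))


def run_spread(scores_by_cell):
    """Max minus min score across runs for each (config, message) condition.

    A simple stand-in for a reliability statistic, chosen for brevity.
    """
    by_condition = {}
    for (config, message, _run), score in scores_by_cell.items():
        by_condition.setdefault((config, message), []).append(score)
    return {cond: max(vals) - min(vals) for cond, vals in by_condition.items()}


grid = design_grid()
print(len(grid))  # 12 * 5 * 3 = 180
```

A spread of 0 for a condition would mean all three runs received the same score on that dimension; larger values flag conditions where the responses varied between runs.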

📚 Framework References

EPITOME: Sharma et al. (2020)
Arnaiz-Rodriguez et al. (2025)
CAPE-II: Linardon et al. (2024)
Bansal et al. (2021); Dzindolet et al. (2003)
Self-Determination Theory: Deci & Ryan (1985)
McBain et al. (2025)

👤 Author

Yihan Yao · QMSS 5053 Practicum · Columbia University · April 2026