
A study shows the potential of large language models to detect signs of depression and suicide in patients

Large language models, artificial intelligence systems based on deep learning, could be useful for detecting mental health risks such as depression and suicide risk in the narrative texts of patients undergoing psychiatric treatment. This is one of the conclusions of research published in JAMA Network Open, which also shows the potential of embeddings, a natural language processing technique that converts human language into mathematical vectors, to achieve the same end.
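To make the embedding approach concrete, here is a minimal sketch, not taken from the study: it uses an illustrative open-source sentence encoder and made-up example responses to show how short narrative answers can be turned into numerical vectors and fed to a simple classifier.

```python
# Minimal sketch of an embedding-based classifier (illustrative only;
# the encoder, data and labels below are assumptions, not the study's).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Hypothetical sentence-completion responses with self-reported labels
# (1 = depressive symptoms reported, 0 = none). A real dataset would be
# anonymised clinical text with many more examples.
responses = [
    "I am someone who rarely enjoys anything anymore.",
    "I am looking forward to starting my new job.",
    "Someday I will be unable to cope with all of this.",
    "Someday I will be travelling with my friends.",
]
labels = [1, 0, 1, 0]

# Convert each response into a fixed-length numerical vector (embedding).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = encoder.encode(responses)

# Fit a simple classifier on the embedding vectors.
classifier = LogisticRegression().fit(vectors, labels)

# Score a new, unseen response.
new_vector = encoder.encode(["To be really happy I need things to change."])
print(classifier.predict_proba(new_vector))
```

The embedding models and classifiers used in the published study differ; the point of the sketch is only the general pipeline: text in, vector out, prediction from the vector.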

23/05/2025 - 17:00 CEST
Expert reactions


Alberto Ortiz Lobo

Doctor of Medicine and Psychiatrist at the Carlos III Day Hospital - La Paz University Hospital (Madrid)

Science Media Centre Spain

The study aims to measure the ability of artificial intelligence (AI) language models to detect depression and suicide risk. The data analysed come from a sentence completion test, a semi-projective test in which people have to finish sentences that are presented to them, providing subjective information about, for example, their self-concept, family, gender perception and interpersonal relationships. The study was conducted on patients who were already undergoing psychiatric treatment, so its results cannot yet be generalised to using this methodology for risk detection in the general population.

The assessment of mental health problems lacks objective measures, laboratory data or imaging tests. The possible application of AI in mental health will, in any case, have to focus on people's subjective narratives, as is done in this research. However, it is one thing to detect risks and carry out screening, and quite another to treat people with mental suffering, a task that goes beyond applying a technological solution and in which the subjectivity of the professional is essential for developing the therapeutic bond.

The author has not responded to our request to declare conflicts of interest


Gerard Anmella

Psychiatrist and researcher at the Depressive and Bipolar Disorders Unit of the Hospital Clínic de Barcelona

Science Media Centre Spain

“This is a study conducted in South Korea in which the responses of 1,064 people (aged 18-39) to a sentence completion test (called SCT) were analyzed. This test asks about four categories: self-concept (e.g., “I am...” and the person completes the sentence), family (“Compared to other families, mine is...”), gender perception (“My wife (or husband) is...”), and interpersonal relationships (“The people I work with are usually...”).

LLMs (large language models) such as GPT-3.5 were used to analyze the responses and detect two aspects: 

  • Whether there were depressive symptoms (yes or no). 
  • If there was a risk of suicide (yes or no). 

The results were good: the LLMs were able to correctly identify (more than 70% of the time) those with depression or suicide risk. 
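As an illustration of how an LLM can be asked for this kind of yes/no judgement, here is a minimal sketch assuming an OpenAI-style chat API; the model name, prompt wording and example response are assumptions for illustration, not the prompts used by the authors.

```python
# Minimal sketch: asking an LLM whether a sentence-completion response
# suggests depressive symptoms. Illustrative only; not the study's prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical, anonymised response from the self-concept category.
response_text = "I am a burden to the people around me."

prompt = (
    "The following is a patient's answer on a sentence completion test.\n"
    f'Answer: "{response_text}"\n'
    "Does this answer suggest depressive symptoms? Reply with 'yes' or 'no' "
    "and a one-sentence explanation."
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

print(completion.choices[0].message.content)
```

The same pattern, with a different question, would cover the suicide-risk label; asking the model to explain its answer is also what lets researchers inspect why a given case was flagged or missed.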

These results were especially strong for texts related to self-concept (e.g., “I am...”, “Someday I will be...”, “To be really happy I need...”), indicating that these fragments are especially relevant for detecting depression-related aspects (more so than the other categories: context matters).

In addition, the reasons why the models gave one answer or the other were analyzed, with interesting findings: some people masked or minimized their symptoms in their answers, which caused the LLM to fail to detect self-reported depression. This is similar to what can happen in a consultation: if a patient does not open up and is not honest, it is difficult even for a professional to identify what is going on.

In summary, this is an important study because it explores the use of a novel technology whose applicability in mental health is still unknown in its entirety, but which will probably play a key role in the coming years, since: 

  • Mental health assessments are fundamentally language-based. 
  • LLMs seem to be able to analyze subtle aspects of language, similar to what happens in a clinical interview. 
  • LLMs are increasingly trained with more data (they know more), so they can operate with greater precision. This was demonstrated in the study, as the results did not vary when the model was given cues (examples of depression) versus when it was not. 
  • The performance of LLMs is constantly improving, generating more elaborate answers. In addition, in newer models it is possible to access the chain of reasoning (the process behind the response) and ask why the model responds in one way or another. Although we may not know in a fully transparent way the inner workings of these models, we can thus infer why they give the answers they do.”  

How does it fit with the existing evidence and what new developments does it bring?  

“For about 15 years, natural language processing techniques have been applied in mental health to try to find linguistic markers to detect psychological problems and help professionals identify and treat them. 

This study follows that line, applying new language analysis technology, which provides us with different and complementary data. It is another step on the road to personalized, democratized mental health, based on objective markers. It is hoped that these tools, in the future, can complement the work of professionals and help people better understand their own mental health problems.” 

Are there important limitations to be aware of? 

“Like any study, it has limitations that the authors acknowledge. First, it is based on self-reports (people declare whether they have depressive symptoms or not). This can be complicated to assess because someone may confuse an expected suffering in the face of a stressor (such as a breakup) with a depressive syndrome. They are not the same: the former is a natural reaction, while the latter is a potentially treatable pathology. Therefore, it is always preferable to have this corroborated by a mental health professional. 

Another fundamental aspect is data confidentiality: before uploading the data to the cloud-based LLM, the authors anonymize the information to protect patients' privacy. This step is key and opens the door for future studies to use similar methodologies.

Finally, and very importantly, there is the question of clinical applicability: a professional basing a decision on the recommendation of an LLM, or an automatic suicide risk alert being triggered by a model of this type, is still far from reality. On the one hand, better results, validated across different countries, cultures and population types, are needed. On the other hand, careful thought must be given to the risks and benefits: who will be responsible for a clinical decision based on an LLM, the model, the practitioner, or both?”

 

Gerard Anmella has received fees related to continuing medical education or consulting fees from Abartis Pharma, Adamed, Angelini, Casen Recordati, Johnson & Johnson, Lundbeck, Lundbeck/Otsuka, Rovi, and Viatris, with no financial or other relationship relevant to the subject of this article. 

Publications
Journal: JAMA Network Open
Authors: Silvia Kyungjin Lho et al.
Study types:
  • Research article
  • Peer reviewed
  • People