Alba María Mármol Romero
PhD and researcher in the SINAI research group at the University of Jaén
The press release opens with a rather misleading headline, claiming that LLMs can “replicate human emotions.” It is very important to clarify that there is a vast difference between replicating (feeling) and simulating (calculating). The study’s authors themselves reject this reading, clarifying that the use of emotional language in reference to machines is strictly metaphorical. The rest of the text accurately describes the experiment’s results.
The study follows a methodologically sound process, testing six language models from various families and sizes. To ensure reliability, the researchers did not rely on a single test but repeated each experimental condition across five independent runs. Given the stochastic nature of LLMs, this is strictly necessary, especially since the temperature was set to 0.5, which introduces variability in the responses from one run to the next. However, in my opinion, the choice of models is only vaguely justified in terms of representativeness. We must bear in mind that the commercial models used are not transparent: we do not know the exact data on which they were trained or their inherent biases, which contaminate and influence the results.
Although the conclusions are clear and the data confirm what the article sets out to demonstrate, many studies dispute the claim that an LLM can replicate human emotions. The researchers have made the code and instructions used available (I have not been able to access them), but the scientific literature shows that the behavior of these models is extremely fragile: subtly varying the words, their order, the tone, or the position of the given options can produce completely different responses. LLMs tend to infer the response the evaluator desires and exhibit marked sycophantic behavior, a limitation the authors themselves acknowledge in the text. Furthermore, these systems exhibit behaviors not found in most humans, such as data “hallucinations” or a concerning lack of “epistemic humility” when they categorically invent information.
In summary, this work is a good and very interesting starting point, but we are far from being able to claim that machines replicate human affective complexity. So far, the role of AI is merely to adapt to a given task, simulating emotion if the instruction requires it. There is a long road of independent research ahead before these types of methodologies can have reliable and safe real-world implications.