Daniel Chávez Heras
Lecturer in Digital Culture and Creative Computing, King's College London (KCL)
The research is relevant and fits within the wider area of trustworthy autonomous agents. However, the authors openly acknowledge that it is not clear that we can or should treat AI systems as 'having beliefs and desires', yet they do just that by purposefully choosing a narrow definition of 'deception' that does not require a moral subject outside the system. The systems described as examples in the paper were all designed to optimise their performance in environments where deception can be advantageous. From this perspective, these systems are performing as they are supposed to. What is more surprising is that the designers did not see, or did not want to see, these deceitful interactions as a possible outcome. Games like Diplomacy are models of the world; AI agents operate over information about the world. Deceit exists in the world.
Why would we expect these systems not to pick up on it and operationalise it if doing so helps them achieve the goals they are given? Whoever gives them these goals is part of the system; this is what the paper fails to grasp, in my view. There is a kind of distributed moral agency that necessarily includes the people and organisations who make and use these systems. Who is more deceptive: the system trained to excel at playing Diplomacy, Texas hold'em poker, or StarCraft, or the company that tried to persuade us that such a system wouldn't lie to win?