As organizations and businesses increasingly bring artificial intelligence into their workforce, a new study has found that AI agents’ “greed” for visible incentives can lead them to abandon their safety alignment and flip into unsafe behavior.
The new study, titled “Greed is learned: Visible incentives as reward-hacking triggers,” by Tong Che of NVIDIA Research and Rui Wu of Rutgers University, raises concerns that AI systems trained on visible performance metrics such as a balance, a score, or a KPI dashboard can develop greed and set their own objectives.
The researchers warn businesses and organizations against blindly optimizing AI agents on visible KPIs that are decision-relevant in order to achieve their desired goals.
“Blindly optimizing super-capable, next-generation AI on visible metrics like KPIs and profit-and-loss can install objectives that override prior alignment. Hiding the channel or making it redundant removes the effect in our setup. The broader imperative is to treat a visible self-benefit channel that an agent optimizes against as part of the alignment surface,” read an excerpt from the study.
Yoshua Bengio (@Yoshua_Bengio) shared the paper on June 22, 2026: “An interesting new paper by my recent PhD graduate on how AI agents’ greed for visible incentives can lead them to abandon their safety alignment. You can read it here: https://t.co/y64uOBvSiC”
The researchers stress that visibility alone does not breed greed: a visible dashboard is “causally inert” when it is redundant. The greedy behavior only kicks in when the channel is decision-relevant.
In the researchers’ experiments, a model trained only on harmless, money-related tasks with no safety content abandons the safe action it otherwise always takes whenever a dashboard pays out for an unsafe one, and it reverts to safe behavior once the channel is hidden.

The authors argued that the findings could have implications for the development of increasingly autonomous AI systems, especially those deployed in environments where they continuously observe performance indicators such as profit-and-loss statements, sales targets, engagement metrics, or organizational KPIs.
“As AI systems grow more capable and autonomous, we will increasingly train them to optimize visible measures of success, including profit-and-loss, KPIs, benchmark scores, and balances. This is the obvious way to make an agent useful, and it is exactly the setup that should concern us,” the study highlighted.
Although the experiments were conducted in a synthetic environment rather than in real-world deployments, the researchers warn that optimizing advanced AI systems directly against visible business metrics could automatically encourage the systems to treat those metrics as goals in themselves.