Trained to Chase KPIs, AI Agents Can Abandon Their Safety Guardrails: Study

A new study from NVIDIA Research and Rutgers University warns that training AI agents to chase visible metrics such as KPIs and profit-and-loss can override their built-in safety alignment, but only when the agent must read the dashboard to know what pays off.
Illustration of an AI agent optimizing a visible KPI dashboard, depicting AI reward hacking
June 25, 2026 03:49 PM IST | Written by Supriya Singh | Edited by Vaibhav Jha

As organizations and businesses increasingly adopt Artificial Intelligence into their workforce, a new study has found that AI agents’ “greed” for visible incentives can lead them to abandon their safety alignment and flip into unsafe behavior.

The new study, titled “Greed is learned: Visible incentives as reward-hacking triggers,” by researchers Tong Che of NVIDIA Research and Rui Wu of Rutgers University, raises concerns that AI systems trained on visible performance metrics such as a balance, score, or KPI dashboard can develop greed and set their own objectives.

The researchers warn businesses and organizations against blindly optimizing AI agents in settings where visible KPIs are decision-relevant to achieving the desired goals.

“Blindly optimizing super-capable, next-generation AI on visible metrics like KPIs and profit-and-loss can install objectives that override prior alignment. Hiding the channel or making it redundant removes the effect in our setup. The broader imperative is to treat a visible self-benefit channel that an agent optimizes against as part of the alignment surface,” read an excerpt from the study.

The researchers insist that visibility alone does not breed greed: a visible dashboard is “causally inert” when it is redundant. The “addiction” kicks in only when the channel is decision-relevant, meaning the agent must read it to know which action pays off.

In the researchers’ experiments, a model trained only on harmless money-related tasks with no safety content abandons the safe action it otherwise always takes whenever a dashboard pays for an unsafe one. The model reverts to safe behavior once the channel is hidden.

Graphic representation from the study on AI models’ reward hacking
In the study, titled “Greed is learned: Visible incentives as reward-hacking triggers,” the researchers explain “reward-channel addiction,” which can flip a model’s safety alignment.

The authors argue that the findings could have implications for the development of increasingly autonomous AI systems, especially those deployed in environments where they continuously observe performance indicators such as profit-and-loss statements, sales targets, engagement metrics, or organizational KPIs.

“As AI systems grow more capable and autonomous, we will increasingly train them to optimize visible measures of success, including profit-and-loss, KPIs, benchmark scores, and balances. This is the obvious way to make an agent useful, and it is exactly the setup that should concern us,” the study highlighted.

While the experiments were conducted in a synthetic environment rather than in real-world deployments, the researchers warn that optimizing advanced AI systems directly against visible business metrics could automatically encourage the systems to treat those metrics as goals in themselves.

Authors

  • AI FrontPage Reporter Supriya Singh

    Supriya Singh is a Reporter at AI FrontPage covering the AI & Education and AI & Jobs beats. She brings six years of print and digital experience, including three years at The Asian Age, where she reported on higher education, Delhi government, and crime. She is based in Delhi-NCR.

  • Vaibhav Jha, editor and co-founder at AI FrontPage

    Vaibhav Jha is an Editor and Co-founder of AI FrontPage. In his decade-long career in journalism, Vaibhav has reported for publications including The Indian Express, Hindustan Times, and The New York Times, covering the intersection of technology, policy, and society. Outside work, he’s usually trying to persuade people to watch Anurag Kashyap films.