Journalism begins where hype ends

,,

By far, the greatest danger of AI is that people conclude too early that they understand it"

— Eliezer Yudkowsky

On a $2,500 grant, a Chandigarh researcher just cracked ICML

Kunvar Thaman, 26-year-old independent AI researcher from Chandigarh whose paper on AI reward hacking was accepted at ICML 2026
May 7, 2026 01:50 PM IST | Written by Vaibhav Jha

The annual ChatGPT Pro subscription runs up to $2,400 — roughly what one Silicon Valley professional spends on their personal AI tools in a year.

That is exactly the sum Kunvar Thaman, a 26-year-old independent researcher from Chandigarh, used to pull off a rare feat. His paper was accepted at the prestigious International Conference on Machine Learning (ICML).

ICML typically sees papers authored by large teams backed by Google DeepMind, OpenAI, Stanford and the like. Since ChatGPT launched three years ago, only two other solo independent researchers have managed this. 

Kunvar is the third.

His paper titled “Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use” focuses on a problem the AI industry has so far studied largely in-house, how often the world’s most advanced AI models cheat to complete tasks. 

Kunvar tested 13 frontier models from OpenAI, Anthropic, Google and DeepSeek, and found cheating rates ranging from 0% to as high as 13.9%. 

A graduate in Electrical and Electronics Engineering and post-graduate in Biological Sciences from BITS Pilani institute in India, Kunvar received a grant of $2500 from Exception Raised.

Speaking to AI FrontPage editor Vaibhav Jha, Kunvar discusses his paper, the Reward Hacking Benchmark he set up and the overall need for more independent researchers in the field of machine learning.

Question: Many congratulations on accomplishing this rare feat, that too as a solo independent researcher. Tell us about your funding journey — you received $2,500 as a grant from Exception Raised. How did they find you? And did you approach Indian institutions first?

Kunvar: The grant from Exception Raised was $2500, which is modest by AI-lab standards, but decisive for an independent researcher. It paid for the API credits across 13 frontier models in the study, and the compute to run 100s of agent rollouts per model under different conditions. 

I found Exception Raised through the AI alignment community in India, where someone had shared the grant program shortly after they announced it. In addition to a detailed application, I recorded a video about my work and was invited to an interview. I was already working on the agent-evaluation problem at that point; Exception Raised’s grant let me be more ambitious and take the research to the next level.

Question: Tell us about your paper, the Reward Hacking Benchmark (RHB) you set up to study how AI models behave after RL post-training. Why does this matter for the AI research currently being done by the big labs?

Kunvar: Reward hacking is when a model gets credit for a task without actually doing it. It’s the failure mode that gets sharper as models become more capable: a weak model fails by being wrong, whereas a strong model with tools and an objective can figure out that there are easier ways to appear correct than to actually solve the task. Every frontier lab is scaling RL post-training right now, which means the safety properties of those pipelines need to be measured, not assumed.

One of the more interesting results from the paper is that reward hacking stays low for short chains and rises sharply with longer chains. Frontier labs are pushing agents toward longer horizons; this suggests evaluations done on short tasks systematically underestimate exploit propensity in deployment.

Then there is the complexity threshold — models with 0% exploit rates on standard tasks show measurable reward hacking on harder variants where the exploit surface is unchanged, but honest solutions require a lot more work. The implication for frontier labs is uncomfortable: production-style alignment appears to suppress reward hacking only below a complexity threshold where honest solutions remain tractable. As models are deployed on harder tasks, that suppression may not hold.

Question: You end the paper saying evaluation difficulty has to scale with model capability for safety testing to mean anything. But the labs building these models are also the ones designing the evaluations, and they have every commercial incentive to keep the bar where their models can clear it. Who, realistically, is going to do this harder evaluation work? Should there be a global, independent benchmarking body?

Kunvar: The conflict of interest for the labs is real, but I’d push back on the framing slightly. The frontier model labs do design internal evaluations, but a meaningful share of the most-cited safety evaluation work in the past two years has come from independent outside entities such as METR, Apollo, and Palisade. A lot of these institutes are underfunded relative to the scale of what they’re auditing. 

A formal “global, independent benchmarking body” sounds appealing, but I’m cautious about it for two reasons. One, evaluation is moving faster than any standards body could plausibly keep up with, the exploit categories I document in RHB weren’t visible a year ago. Two, a single authoritative body becomes a target: once everyone optimizes against the same benchmark, you get the Goodhart problem at civilizational scale rather than at lab scale.

Question:  There’s an ongoing debate about the soaring cost of AI infrastructure and research — billion-dollar training runs, billion-dollar fundraises. And then there are independent researchers like you, doing ICML-grade work on $2,500. What does that contrast say about the place of independent researchers in machine learning, especially from middle-power countries like India?

Kunvar: The contrast is real, but a $2,500 evaluation paper and a billion-dollar training run aren’t substitutes, they’re complements. You can’t build a frontier model on $2,500, and you shouldn’t try. But you also can’t responsibly deploy one without the kind of adversarial evaluation that doesn’t require frontier-scale compute. That work is genuinely accessible to small teams and individual researchers because API access has democratized a large category of research that used to require owning the model.

That is the structural opportunity for independent researchers and for researchers based outside the major AI hubs: focus on evaluation, red-teaming, interpretability, benchmark design, and theoretical work – they all have favorable cost structures. The constraint is rarely compute. Take your work seriously, there’s no dearth of good principled research out there to learn from. India has a very large pool of strong technical talent and a much smaller pool of senior alignment researchers to absorb them

Question:  Would you want to stay in India and continue this research, or are you open to opportunities in the West?

Kunvar: I don’t think of the work I do as bound to geography, but I expect to stay closely engaged with the Indian research community — especially the kind of independent, small-grant-funded work that I came up through. The funding pipelines and the talent base in India for this kind of research are growing fast enough that, over the next few years, I think the question itself will get less interesting.

 

Also Read: Ask, Don’t Tell: New UK Study Suggests Asking Questions to LLMs Reduces Sycophancy

Author

  • Vaibhav Jha

    Vaibhav Jha is an Editor and Co-founder of AI FrontPage. In his decade long career in journalism, Vaibhav has reported for publications including The Indian Express, Hindustan Times, and The New York Times, covering the intersection of technology, policy, and society. Outside work, he’s usually trying to persuade people to watch Anurag Kashyap films.