Sep 15, 2025 · Company

The Economy Will Become an RL Environment Machine

Brendan Foody, Co-founder / CEO

Every technological revolution has sparked fears of job loss. The industrial revolution displaced domestic producers with machines. The computer revolution displaced manual clerical work with spreadsheets and databases.

And yet, unemployment rates today are lower than they were before or during either of these revolutions, each of which yielded entirely new categories of work. The vast majority of job categories recognized by the Bureau of Labor Statistics didn’t exist before the industrial revolution.

Humans Will Do Things Once

The history of technology is a story of democratizing access: the printing press spread ideas, industry scaled labor, and computers digitized knowledge. Each revolution forged entire industries around it. Today, AI makes human capability itself sharable.

“If you wish to achieve some kind of intellectual immortality, writing for the AIs is probably your best chance.” - Tyler Cowen

The value of human work will shift. Think about the difference between filing taxes once and teaching an AI model how to file taxes for you forever. The first is a variable cost, paid millions of times over by individuals and businesses. The second is a fixed cost; once we encode that knowledge, it can be applied an unlimited number of times.
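To make the fixed-versus-variable distinction concrete, here is a rough sketch of the break-even arithmetic. The function name, its parameters, and the dollar figures in the usage note are all hypothetical illustrations, not data from any real project; the only assumption is that encoding a task costs a one-time amount and that each later use by the model costs less than paying a person to do it again.

```python
import math


def break_even_uses(fixed_cost: float, human_cost_per_use: float,
                    model_cost_per_use: float = 0.0) -> int:
    """Smallest number of uses at which teaching the model once is
    strictly cheaper than paying a human every time.

    All three inputs are hypothetical, illustrative quantities.
    """
    saving_per_use = human_cost_per_use - model_cost_per_use
    if saving_per_use <= 0:
        raise ValueError("the model must be cheaper per use for encoding to pay off")
    # We want the smallest n with:
    #   fixed_cost + n * model_cost_per_use < n * human_cost_per_use
    # which rearranges to n > fixed_cost / saving_per_use.
    return math.floor(fixed_cost / saving_per_use) + 1


def total_cost_human(uses: int, human_cost_per_use: float) -> float:
    """Variable cost: paid again on every use."""
    return uses * human_cost_per_use


def total_cost_encoded(uses: int, fixed_cost: float,
                       model_cost_per_use: float = 0.0) -> float:
    """Fixed cost paid once, plus a (smaller) per-use cost."""
    return fixed_cost + uses * model_cost_per_use
```

With made-up numbers, say a one-time encoding cost of 1000 and a human cost of 10 per filing, `break_even_uses(1000.0, 10.0)` gives the number of filings after which the encoded version comes out ahead. Every use beyond that point widens the gap, which is the sense in which the fixed cost is amortized.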

Real-World Environments

Reinforcement learning (RL) is becoming so effective that it can saturate any eval, but academic benchmarks aren't reflective of the outcomes that consumers and enterprises care about. There is a sim-to-real gap in our benchmarks. Did the tax filing minimize liability? Did the medical advice improve patient outcomes? Did the lesson plan help students actually learn?

The real world has richer data rooms, more complex environments of applications and tools, and requests from both programmers and accountants. The frontier of model evaluation now lies in building richer environments: data rooms that mirror your Google Drive workspace, scaffolding that mimics the many applications you have on your laptop or phone, and reward functions that can assess the near-infinite number of actions you can take in the real world.
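As a toy illustration of those three ingredients (a data room, a set of tools, and a reward function), here is a minimal sketch in Python. Everything in it is a hypothetical stand-in invented for this example: the class names `RubricItem` and `TaskEnvironment`, the two tools, the file names, and the rubric weights. Real environments are far richer, and this is not a description of any production system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Workspace = Dict[str, str]  # file name -> contents; a stand-in for a data room


@dataclass
class RubricItem:
    """One expert-written criterion, checked against the final workspace."""
    description: str
    weight: float
    check: Callable[[Workspace], bool]


@dataclass
class TaskEnvironment:
    files: Workspace
    tools: Dict[str, Callable[[Workspace, str], str]]
    rubric: List[RubricItem]

    def step(self, tool_name: str, argument: str) -> str:
        """Run one tool call and return its observation."""
        if tool_name not in self.tools:
            return f"error: unknown tool {tool_name!r}"
        return self.tools[tool_name](self.files, argument)

    def reward(self) -> float:
        """Weighted fraction of rubric items the workspace satisfies."""
        total = sum(item.weight for item in self.rubric)
        earned = sum(item.weight for item in self.rubric if item.check(self.files))
        return earned / total if total else 0.0


def read_file(files: Workspace, name: str) -> str:
    return files.get(name, "error: no such file")


def write_file(files: Workspace, argument: str) -> str:
    # Hypothetical argument format for this sketch: "name=content".
    name, _, content = argument.partition("=")
    files[name] = content
    return f"wrote {name}"


def build_demo_env() -> TaskEnvironment:
    """A tiny, made-up tax task with a two-item rubric."""
    return TaskEnvironment(
        files={"income.txt": "50000"},
        tools={"read_file": read_file, "write_file": write_file},
        rubric=[
            RubricItem("a return was drafted", 1.0,
                       lambda f: "return.txt" in f),
            RubricItem("the return reports the stated income", 3.0,
                       lambda f: "50000" in f.get("return.txt", "")),
        ],
    )
```

An agent would call `step` repeatedly and be scored by `reward` at the end. In this toy version a draft that exists but omits the income earns only partial credit, which hints at why rubric design, rather than a single pass/fail check, is where human judgment enters.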

Models also need to be evaluated on longer-horizon tasks and collaborative environments: longitudinal patient cases assessed by boards of physicians, multi-party negotiations in M&A deals, and risk-hedging as markets move through cycles.

An Expanding Frontier

The market for humans teaching models depends on the number of tasks that humans can do but agents can’t. Many researchers who believe in the inevitability of ASI downplay the role of human data. Once AI exceeds humans in every task, they ask, why would human data matter? Will the pool of people able to contribute to model improvement shrink substantially?

We worked on a project in which a team of 100 people looked for mistakes that a frontier agent made while using a tool. They created rubrics to evaluate the model’s mistakes. At first, everyone easily stumped the model because it failed frequently. Six months later, only 20 people could still stump the model, reinforcing the case made by skeptics of human data.

We then added more tools that the agent could access and started pushing for longer-horizon tasks that would take humans over ten hours to complete. Suddenly, the model began failing across these challenges, and all 100 participants were able to once again contribute meaningfully to the project. As long as there are tasks in the economy that humans can perform but agents cannot, we will continue to need humans to create evaluations and train agents.

The Long-Term Outlook

Everyone is focused on the jobs AI might eliminate, such as copywriting, paralegal work, and medical billing. But not nearly enough attention is dedicated to the industry it will create, driven by people who will shape AI’s judgment, design its training environments, and ensure its outputs meet human standards.

We are entering the era of experience, with models learning to optimize for rewards in the real world. Just as humans learn through the guidance of others, AI will require robust feedback. Professors create tests and rubrics to help us improve, while managers provide us with performance reviews to track how we’re doing in our jobs. The same type of scaffolding will be needed by the next generation of AI models.

The industrial revolution created a new class of workers who designed machines and kept them running. Similarly, the AI revolution will create a new class of workers tasked with guiding machines and democratizing access to their abilities. This is the great paradox: the future of AI is human.