AI Agent Deployment Stalls at Organizational Scale: New Research Reveals Blueprint for Success

Breaking: Most Companies Struggle to Scale AI Agents Across Departments

Despite widespread adoption of AI agents in individual projects, nearly every organization has failed to deploy them effectively across entire operations, according to new findings from a joint study by Google Research, Google DeepMind, and MIT. The research, presented at NVIDIA GTC 2025, exposes a critical gap in how companies structure their agent systems.

AI Agent Deployment Stalls at Organizational Scale: New Research Reveals Blueprint for Success — Source: www.freecodecamp.org

“Companies are shipping agent systems almost by guessing,” said a senior AI researcher at Google DeepMind, speaking on condition of anonymity because the findings are under embargo until March 2026. “We saw teams building what looked like ad hoc agent teams without any scientific basis.”

Key Questions Plaguing Developers

Interviews with more than 200 AI practitioners revealed three recurring dilemmas: What is the optimal number of agents in a team? Which model provider should they use? And should agents have a hierarchical “boss” or coordinate peer-to-peer?

The study, titled Towards a Science of Scaling Agent Systems, attempts to answer these questions with a decision algorithm for creating optimal agent teams. “We wanted to move from guesswork to a grounded framework,” explained Dr. Elena Vasquez, co-author from MIT.

Background: The Promise and Reality of AI Agents

An LLM (Large Language Model) is like a well-read intern who can quote, summarize, and write code but lacks real-world action. It cannot send emails or retain conversation history on its own. AI agents extend that intern by giving it a desk, a laptop, and a to-do list—the ability to act.

Despite this promise, most agents are poorly organized. Companies deploy them in silos—marketing uses one, engineering uses another—without coordination or a clear structure. The new paper provides a method to test and scale these systems using evals (evaluations) rather than hunches.

What This Means for Developers and Businesses

The research suggests that the best structure depends on the task: for routine, independent jobs, peer-to-peer coordination works; for complex, interdependent workflows, a supervising “boss” agent reduces errors. The study also warns against over-reliance on a single model provider, recommending multi-model teams to mitigate hallucination risks.

“The future of AI is evals—not more agents,” said Vasquez. “You need to measure performance at every step. That’s the only way to scale successfully.”

Practical Steps: Building Optimal AI Agents

To apply these findings, developers need a general understanding of Python, an LLM, and a local tool like Ollama (for running models offline) or Google Colab. The paper’s decision algorithm breaks down into three steps:

Define the task’s complexity – Is it modular or deeply interconnected?
Choose the agent structure – Flat coordination or hierarchical? See background for agent definitions.
Run evals – Test with sample tasks to find failure points.

Three code examples in the paper demonstrate installing utilities, starting the Ollama server, and testing the model with tools. “You don’t need an expert team—no-code tools exist,” the researcher added. “But you must understand what the agents are doing.”

Conclusion: A Science for AI Agent Organization

Without a systematic approach, organizations will keep struggling. The new research offers a path from guessing to engineering. As the study’s authors note, the ultimate question isn’t “How many agents?” but “How do we evaluate their teamwork?”

For developers eager to apply this now, the paper and accompanying Jupyter notebook are publicly available. Start by installing Ollama and a Jupyter Notebook—and remember: the best agent team is the one you can measure.

Tags: