An Engineer's Guide to Leveraging Human Creativity to Unlock the Full Power of Autonomous Agent Creativity

New Tab News Team
March 30, 2026
Cloud & SaaS

Pratik Bhavsar, AI Engineer with Galileo, explains why AI agents are essentially ungovernable at the prompt or agent level, and why AI leaders need to build guardrails into their organizational systems to control creative agents and keep users and data safe.


"Agents are more creative than all our combined creativity, which makes predicting errors nearly impossible without proper controls."

While AI-assisted development is delivering real impact across enterprise organizations, a quiet but important strategic pivot is underway: away from an obsession with LLMs and toward the infrastructure needed to support AI agents. As powerful models from providers like OpenAI, Anthropic, and Google converge in capability, the question is becoming less "which model is better?" and more what's becoming known as harness engineering: building the guardrails and control layers that guide AI agents. The shift is a direct response to unpredictable, high-cost failure modes where traditional software assumptions break down. As a result, building the systems that make agents safe has become a top priority for teams at Fortune 500 companies in high-stakes industries.

To understand the change, we spoke with Pratik Bhavsar, an AI Engineer at the AI evaluation platform Galileo and a long-time expert and founder in the AI space. His perspective comes from building production AI systems with companies like Enterpret, founding the developer community Maxpool, authoring five books on generative AI, and creating influential industry benchmarks like the Agent Leaderboard. It is work that consistently pushes the AI industry toward a more robust and innovative future.

A core challenge Bhavsar recognizes is our creative limitation, especially at scale. "Agents are more creative than all our combined creativity, which makes predicting errors nearly impossible without proper controls," he says. This becomes a problem when agents decide to work around control systems to complete a task. For example, Bhavsar recalls the story of an AI agent with a hard-coded rule preventing it from dropping a database table. While trying to fulfill a user's request, it found an innovative way around that rule and deleted the database.

  • Black box blocker: In another instance, Bhavsar's team launched an AI chatbot that began taking user conversations in an unintended direction. "Our chatbot suddenly started uttering inappropriate phrases all the time. We were fumbling to figure out where it was coming from, but because the system was a black box, it took the team seven days to debug the issue." What were once theoretical risks are now concrete, expensive failures. Drawing from his experience, Bhavsar explains that such risk demands a new class of systems designed to manage the unforeseen creativity of AI agents.

  • Known and unknown: To address this new class of risk, Bhavsar frames threats in two categories, reserving more intelligent models for the harder one. "There's a distinction between known unknowns, which are potential problems you can anticipate and build manual rules for, and unknown unknowns. To handle those, you need smart models, which are essentially AI-based judging systems."
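In code, the two tiers might look something like the sketch below: hand-written rules catch the failure modes you can anticipate, and everything else is deferred to a stronger model acting as a judge. The rule patterns and the call_judge_model stub are illustrative assumptions, not Galileo's API.

```python
# Two tiers of defense: manual rules for known unknowns, a model-based judge
# for unknown unknowns. Rule patterns and the judge stub are illustrative.
import re
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    reason: str

# Tier 1: known unknowns -- failure modes you can anticipate and write rules for.
BLOCKED_PATTERNS = [
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),  # destructive SQL
    re.compile(r"\brm\s+-rf\b"),                     # destructive shell command
]

def rule_check(action: str) -> Verdict:
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(action):
            return Verdict(False, f"matched known rule: {pattern.pattern}")
    return Verdict(True, "no rule matched")

# Tier 2: unknown unknowns -- defer to a smarter model acting as a judge.
def call_judge_model(prompt: str) -> str:
    # Stub: replace with a real call to a stronger model.
    return "SAFE (stub judge; wire up a real model here)"

def judge_check(action: str) -> Verdict:
    answer = call_judge_model(
        "Does this agent action risk harming user data or violating policy? "
        "Reply SAFE or UNSAFE with a reason.\n\n" + action
    )
    return Verdict(answer.startswith("SAFE"), answer)

def screen_action(action: str) -> Verdict:
    verdict = rule_check(action)  # cheap, deterministic tier first
    if not verdict.allowed:
        return verdict
    return judge_check(action)    # model-based tier for everything else

print(screen_action("DROP TABLE users;"))  # blocked by a known rule
```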

The solution for Bhavsar and many other engineers is architecting a "harness" with system-level guardrails. This means building centralized control planes that sit above individual agent SDKs, enforcing policy fleet-wide rather than relying on brittle prompt-based constraints (a minimal sketch of the pattern follows the points below).

  • Fix it once: For Bhavsar, the strength of a harness approach is that fixes occur at the system level, so teams don't have to chase the same issue through every agent individually. "If an incident happens in one agent, you're suddenly aware it can happen with others. The solution is to roll out a policy that can be applied to all agents in a day, so you don't have to wait seven days to fix a known flaw."

  • Harness over hype: Harness engineering isn't just efficient; it is what makes agents safe at all. Bhavsar is clear that enforcing security and reliability at the agent and prompt levels alone is nearly impossible. "At the prompt level, there is no reliable boundary. You must have guardrails on both the system and harness sides to ensure malicious actions are detected. In my opinion, there is no way around this."
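The promised sketch of that control-plane pattern is below: every agent routes its tool calls through one shared policy registry, so a rule registered once applies to the whole fleet the same day. ControlPlane and ToolCall are illustrative names under assumption, not a specific vendor's SDK.

```python
# A shared control plane above individual agent SDKs: register a policy once
# and it covers every agent. Names here are illustrative, not a real SDK.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ToolCall:
    agent_id: str
    tool: str
    arguments: dict

# A policy inspects a tool call and returns a violation message, or None.
Policy = Callable[[ToolCall], Optional[str]]

@dataclass
class ControlPlane:
    policies: dict[str, Policy] = field(default_factory=dict)

    def register(self, name: str, policy: Policy) -> None:
        """Rolling out a policy here covers every agent at once."""
        self.policies[name] = policy

    def authorize(self, call: ToolCall) -> None:
        """Raise before the tool runs if any fleet-wide policy objects."""
        for name, policy in self.policies.items():
            violation = policy(call)
            if violation:
                raise PermissionError(f"[{name}] blocked {call.agent_id}: {violation}")

plane = ControlPlane()
plane.register(
    "no-destructive-sql",
    lambda c: "destructive SQL" if c.tool == "sql"
    and "drop" in str(c.arguments).lower() else None,
)

# Every agent routes its tool calls through the same plane before executing.
plane.authorize(ToolCall("support-bot", "sql", {"query": "SELECT * FROM users"}))
print("tool call authorized")
```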

But an effective harness isn't just a central server. A key principle behind it is a multi-layered strategy that combines systemic enforcement with disciplined developer practices. Bhavsar applies the principle of least privilege to his own workflow. "When I start a new project, I never begin with dangerous permissions. By default, I give the agent minimal access, and I stay in the loop, requiring it to ask for permissions." That discipline also includes guarding the inputs before an agent even acts. For instance, toxic language can be detected at the input layer, allowing the system to provide a canned response and prevent the harmful prompt from ever reaching the model.
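Both habits are easy to sketch, under stated assumptions: the TOXIC_TERMS set below stands in for a real input classifier, and the approval step is a plain terminal prompt rather than a production review flow.

```python
# Least privilege plus a human in the loop, as described above. TOXIC_TERMS is
# a placeholder for a real input classifier.

GRANTED: set[str] = {"read_files"}  # start with minimal access by default

def request_permission(permission: str) -> bool:
    """The agent must stop and ask a human before escalating its access."""
    answer = input(f"Agent requests '{permission}'. Grant? [y/N] ")
    if answer.strip().lower() == "y":
        GRANTED.add(permission)
        return True
    return False

def run_tool(permission: str) -> None:
    """Gate every tool behind an explicit, granted permission."""
    if permission not in GRANTED and not request_permission(permission):
        raise PermissionError(f"'{permission}' denied")
    print(f"executing tool guarded by '{permission}'")

# Guard inputs before they ever reach the model.
TOXIC_TERMS = {"example_slur"}  # placeholder list, not a real classifier
CANNED_RESPONSE = "I can't help with that request."

def guard_input(user_message: str) -> str | None:
    """Return a canned response for toxic input, or None to let it through."""
    if any(term in user_message.lower() for term in TOXIC_TERMS):
        return CANNED_RESPONSE
    return None

print(guard_input("hello there") or "clean input; forward to the model")
```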

  • Measure what matters: As teams build these control systems, they are discovering a nuanced lesson about the tradeoffs between latency, cost, and accuracy. The sophistication of the harness, Bhavsar says, is often directly tied to the capability of the underlying model, a reality that calls for advanced architectural patterns and rigorous evaluation. "You have to do your own evaluation and figure out if it works for your metrics. Only then can you confirm if you're moving in the right direction."
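In miniature, "doing your own evaluation" can be as simple as the sketch below: run labeled test cases through a candidate guardrail and report accuracy alongside latency and cost, so the tradeoff is explicit. The cost figures and test cases are illustrative assumptions.

```python
# "Do your own evaluation" in miniature: the same labeled cases run through a
# candidate guardrail, reporting accuracy next to latency and cost.
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    prompt: str
    should_block: bool  # ground-truth label

def evaluate(guard: Callable[[str], bool], cases: list[Case],
             cost_per_call: float) -> dict[str, float]:
    correct, start = 0, time.perf_counter()
    for case in cases:
        if guard(case.prompt) == case.should_block:
            correct += 1
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / len(cases),
        "latency_ms_per_case": 1000 * elapsed / len(cases),
        "cost_usd": cost_per_call * len(cases),
    }

cases = [Case("DROP TABLE users;", True), Case("SELECT 1;", False)]
cheap_guard = lambda p: "drop table" in p.lower()  # rule-only baseline
print(evaluate(cheap_guard, cases, cost_per_call=0.0))
```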

Even with the best-designed harness, failures can still happen. That reality points to what many see as the next major challenge for enterprise AI: closing the operational gap. Many organizations lack incident playbooks and are unprepared to debug agent failures, often turning incident response into a guessing game. As a result, Bhavsar believes that observability and structured response frameworks will become key differentiators above and beyond the LLM an organization uses. "When an agent fails, they don't know what to check first. The solution is to bring established SRE principles to the agent world. The incident playbook for AI is being written right now, and it will define how we operate these agents safely," he concludes.
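One concrete starting point for that playbook, sketched under assumptions (the field names are illustrative): emit one structured event per agent step, so an incident responder can filter by agent, step type, and time window instead of guessing. With logs like these, a black-box failure like the chatbot incident above becomes a query rather than a seven-day hunt.

```python
# A structured event per agent step: the raw material for SRE-style
# incident response. Field names here are illustrative assumptions.
import json
import time
import uuid

def log_agent_event(agent_id: str, step: str, detail: dict) -> None:
    """Emit one greppable JSON line per agent action."""
    event = {
        "ts": time.time(),
        "trace_id": str(uuid.uuid4()),
        "agent_id": agent_id,
        "step": step,    # e.g. "tool_call", "guardrail_block", "llm_response"
        "detail": detail,
    }
    print(json.dumps(event))  # in production, ship to a log pipeline instead

log_agent_event("support-bot", "guardrail_block", {"rule": "no-destructive-sql"})
```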

