2026 To Mark A Turning Point For AI Agent Observability As Enterprises Scale Production
Enterprise Security
As AI agents enter production in 2026, Doneyli De Jesus, Solutions Architect at ClickHouse, shows why observability and audit trails now define trust, scale, and control.

The first step is collecting the traces and the activity the agents are doing. If you can’t reproduce the output or understand how it was produced, you can’t improve it or trust it.
2025 was the year enterprises experimented with AI agents. 2026 is when they put them to work. As these systems move from pilots into real operations, a hard problem is surfacing fast: no one can fully explain what the agents are doing, why they produced a specific output, or whether that result can be reproduced. It's an issue not of capability, but of verifiability. Without a clear audit trail, organizations are deploying digital labor they can’t inspect, trust, or govern at production scale.
Doneyli De Jesus is a Solutions Architect at ClickHouse, where he works with enterprises to turn AI and data strategy into production-ready systems. With more than two decades of experience across roles at Snowflake and Elastic, he has spent his career translating executive goals into technical execution. A TEDx speaker and the Montreal Chair of the AI Circle, De Jesus is a familiar voice on AI infrastructure, known for focusing on what happens when experimental systems collide with real operational demands.
"The first step is collecting the traces and the activity the agents are doing. If you can’t reproduce the output or understand how it was produced, you can’t improve it or trust it," says De Jesus. To solve the black-box problem, organizations must build a reproducible audit trail for their digital labor, he explains.
The reliability hurdle: But building that audit trail is made difficult by the technology's inherently non-deterministic nature. "The biggest hurdle is how you turn something that is inherently non-deterministic and probabilistic into something that you can actually rely on time and time again," notes De Jesus.
A framework for reliability: To make these systems reliable, De Jesus points to a practical architectural approach that contains the LLM’s probabilistic behavior rather than trying to eliminate it. Instead of allowing models to act freely, organizations can route critical actions through deterministic programs and controlled functions that behave the same way every time. "You give the LLM access to tools, which are little programs or specific functions that will give you a deterministic output every time," he explains. "But then you have to make sure that the LLM has access to those tools, knows when to use them, and knows how to use them effectively."
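A small sketch can make the pattern concrete: the model proposes which tool to call, but the tool itself is ordinary code that returns the same answer for the same input every time. The `days_until_expiry` function, the `TOOLS` registry, and the proposed call below are illustrative assumptions, not part of any particular agent framework.

```python
from datetime import date

# Deterministic tool: a plain function whose output depends only on its inputs.
def days_until_expiry(expiry_iso: str, today_iso: str) -> int:
    """Contract-expiry arithmetic done in code, not left to the model."""
    return (date.fromisoformat(expiry_iso) - date.fromisoformat(today_iso)).days

TOOLS = {"days_until_expiry": days_until_expiry}

def execute_tool(call: dict):
    """Run a tool call proposed by the model. Only registered tools are allowed."""
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"Tool '{name}' is not registered")
    return TOOLS[name](**args)

# The LLM decides *when* to call the tool; the tool's behavior is deterministic.
proposed = {"name": "days_until_expiry",
            "arguments": {"expiry_iso": "2026-09-30", "today_iso": "2026-06-15"}}
print(execute_tool(proposed))  # always 107 for these inputs
```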
Show your work: But he cautions that tools are only half the battle. Success demands a parallel "organizational change," coupling the right tech stack with the right internal metrics. "The agent needs to present the traces. These are the steps it took and its reasoning for a specific output. That could involve multiple LLM calls and all the technical steps that ultimately lead to the output it's giving you."
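As a rough illustration of what such a presented trace could look like, the structure below lists each step, the reasoning behind it, and the final output. The field names and the contract-review scenario are invented for the example and do not represent a standard format.

```python
# Illustrative shape of a presented trace: every step, its reasoning, and the final answer.
presented_trace = {
    "run_id": "9f1c...",
    "task": "Review contract for non-standard liability terms",
    "steps": [
        {"step": 1, "kind": "llm_call",
         "reasoning": "Need the liability clause text first",
         "output": "Extracted clause 7.3"},
        {"step": 2, "kind": "tool_call", "name": "days_until_expiry",
         "reasoning": "Check whether the term is still in force",
         "output": "107"},
        {"step": 3, "kind": "llm_call",
         "reasoning": "Compare clause 7.3 against the standard playbook",
         "output": "Clause deviates from the standard cap; flag for legal review"},
    ],
    "final_output": "Flagged: non-standard liability cap in clause 7.3",
}

# Render the steps in the order they led to the output.
for s in presented_trace["steps"]:
    print(f'step {s["step"]}: {s["reasoning"]} -> {s["output"]}')
```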
With a reliable data trail in place, the problem shifts from technical collection to management. De Jesus frames it as a classic industrial problem: applying manufacturing-grade process controls to digital labor, especially as agents take on demanding operational roles in areas like manufacturing and contract management.
Beating the human benchmark: To justify deployment, an agent must meet two clear benchmarks. First, "it has to achieve a success rate that is equal to or higher than what a human does. Otherwise, it's not worth implementing," says De Jesus. Second, it must perform the work at a much greater scale. "Instead of reviewing 10 contracts a day, maybe it can review a hundred. Those are the two scales: the error rate and the scale at which the agent can actually succeed."
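Expressed as a simple go/no-go check, the two benchmarks might look like the sketch below. The function name, threshold logic, and numbers are made up for illustration and do not come from the article.

```python
# Two benchmarks: accuracy at least as good as the human baseline,
# and meaningfully higher throughput.
def worth_deploying(agent_success_rate: float, human_success_rate: float,
                    agent_items_per_day: int, human_items_per_day: int) -> bool:
    matches_quality = agent_success_rate >= human_success_rate
    adds_scale = agent_items_per_day > human_items_per_day
    return matches_quality and adds_scale

# e.g. 96% vs. 95% accuracy, 100 vs. 10 contracts reviewed per day
print(worth_deploying(0.96, 0.95, 100, 10))  # True
```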
Setting the ground rules: But giving agents this much autonomy creates an urgent need for a clear governance framework to manage their actions and permissions. The risk of agents "running amok" is precisely why formal guidelines, such as the NIST AI Risk Management Framework, are emerging to help organizations make sure that an agent's autonomous "shadow actions" are subject to effective oversight. "You have to define controls, rules, guidelines, and guardrails," De Jesus continues. "As they scale this out into production, it's becoming more important to figure out what these things are actually doing, how you govern them, and how you track them."
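One way to picture such guardrails is a thin authorization layer in front of every agent action, as in the hypothetical sketch below. The action names, allow-list, and approval rule are assumptions for illustration; this is not an implementation of the NIST framework itself.

```python
# Minimal guardrail sketch: low-risk actions are allowed, high-risk actions need
# human sign-off, and anything unlisted is denied by default.
ALLOWED_ACTIONS = {"read_contract", "summarize", "flag_clause"}
REQUIRES_APPROVAL = {"send_counterparty_email", "update_contract_record"}

def authorize(action: str, approved_by_human: bool = False) -> bool:
    if action in ALLOWED_ACTIONS:
        return True
    if action in REQUIRES_APPROVAL:
        return approved_by_human
    return False  # deny by default

print(authorize("flag_clause"))                    # True
print(authorize("update_contract_record"))         # False until a human approves
print(authorize("update_contract_record", True))   # True
```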
For any leader wondering when to act, De Jesus offers a candid assessment of the market: it's still in its infancy. That reality presents early-adopting leaders with a rare chance to help define industry standards themselves. "It's still a very nascent space. I don't think companies are even at the stage that they can start observing anything because they don't have anything in production yet. I think mid-2026 is when we're going to start seeing an uptick in this."
While 2025 was about identifying high-value use cases for AI agents, success in 2026 and beyond will be defined by something more fundamental. For enterprises looking to scale from promising pilots to enterprise-wide impact, agent observability is becoming a primary gating factor.

