More open and autonomous observability: the new standard in business

  • OpenTelemetry consolidates a common telemetry language that frees organizations from vendor lock-in and facilitates the integration of AI into observability.
  • Observability ceases to be merely operational and now connects with business metrics, user experience, and real economic impact.
  • Agent Observability drives AI agents that detect, analyze, and remedy problems with increasing autonomy, supported by reliable data.
  • Security, governance, and Zero Trust become essential to controlling the expansion of agentic AI and autonomous systems in critical environments.


Observability has gone from being a niche technical topic to a strategic pillar. For any organization that relies on software, which is practically all of them, simply “monitoring servers” or looking at isolated dashboards is no longer enough. Companies need to understand what’s happening within their systems in real time, connect that data to the business, and react quickly when something goes wrong. And, to top it all off, they must do so in an environment increasingly shaped by agentic AI, open standards, and distributed architectures.

In this scenario, the trend is clearly towards observability that is more open, more closely linked to business results, and much more autonomous. OpenTelemetry is becoming established as the common language for telemetry, AI is moving beyond experimentation to become integrated into the core of observability platforms, and ITOps teams are transforming into orchestrators of intelligent systems that detect, analyze, and even correct problems on their own. Let's break down how this change is happening and what implications it has for technology, business, security, and data governance.

From classical monitoring to the era of observability

The evolution from traditional monitoring towards modern observability goes way back. When pioneering APM tools emerged, such as those popularized by Lew Cirne with New Relic, the big news was being able to see in detail what the code of a monolithic application was doing in a company-owned data center. That was a revolution: for the first time, teams could observe the performance of their production applications with very fine granularity.

With the advent of cloud computing, microservices, containers, serverless computing, and DevOps and SRE practices, the landscape changed completely. The shift from monolithic to distributed systems meant that point-in-time visibility was no longer sufficient. A service is no longer a single application, but a swarm of ephemeral microservices, orchestrated on platforms like Kubernetes, deployed dozens of times a day, and running on hybrid infrastructures with multiple cloud providers.

In that environment, traditional monitoring, focused on predefined metrics and static alerts, falls short. Observability introduces a different approach: collecting and correlating metrics, logs, traces, and events to deduce the internal state of the system from its external outputs. It's not just about knowing that something has failed, but about understanding why it happened and what impact it has on the user and the business.
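
To make the idea of correlating signals concrete, here is a minimal sketch in Python. It assumes that spans and log records share a trace_id field; the records, field names, and the explain_failures helper are all hypothetical and written by hand for illustration, not taken from any particular platform.

```python
from collections import defaultdict

# Hypothetical, hand-written telemetry; real systems would receive these
# records from instrumented services.
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 120, "error": False},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 2300, "error": True},
    {"trace_id": "t2", "service": "payments-db", "duration_ms": 2100, "error": True},
]
logs = [
    {"trace_id": "t2", "service": "payments-db", "message": "connection pool exhausted"},
    {"trace_id": "t1", "service": "checkout", "message": "order created"},
]


def explain_failures(spans, logs):
    """Attach log messages to each failed trace so the 'why' sits next to the 'what'."""
    logs_by_trace = defaultdict(list)
    for record in logs:
        logs_by_trace[record["trace_id"]].append(
            f'{record["service"]}: {record["message"]}'
        )

    failed = {}
    for span in spans:
        if span["error"]:
            entry = failed.setdefault(
                span["trace_id"],
                {"services": [], "logs": logs_by_trace[span["trace_id"]]},
            )
            entry["services"].append(span["service"])
    return failed


report = explain_failures(spans, logs)
print(report)
```

Only trace t2 appears in the report, together with the two services it touched and the log line that points at the likely cause.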

Authors like Yuri Shkuro summarize this difference well: monitoring measures what has been decided beforehand as important, while observability allows you to formulate new questions about the system without having prepared all the indicators in advance. In other words, observability turns telemetry data into actionable context for development, operations, and business.

This transition is also driven by very specific factors: intense pressure to innovate fast, increasingly demanding customers who abandon an app at the slightest flaw, an almost infinite range of technologies and managed services, and growing automation of the entire software lifecycle. All that automation is also software that can fail, and it needs its own observability.

Complexity, risk, and too many tools: why observability is critical


Modern architecture imposes four major headaches that make observability practically mandatory if you want to maintain control:

First, complexity has skyrocketed. A container can live for minutes or seconds, a microservice can change versions several times a day, and the components multiply. What was once a monolithic application becomes a constellation of interconnected services. Operations teams find themselves dealing with hundreds or thousands of constantly changing entities, many of which they didn't develop themselves.

Added to this is a clear increase in risk. Deploying multiple times a day means continuously introducing changes, and potential rollbacks. Agile practices and continuous delivery add more tools, pipelines, and automations that also need to be considered. The ability to quickly detect a problem, identify the root cause, and revert or remedy it in a matter of minutes is no longer merely desirable but a requirement.

In parallel, there is a skills gap. The technology stack is so vast that it's impossible for a single person to master databases, networks, APIs, security, containers, orchestration platforms, and CI/CD tools. Mechanisms are needed to help understand how everything fits together, what depends on what, and where to look when something goes wrong. Without this connected view, the time wasted jumping between tools can be enormous.

And, to top it all off, there is the problem of “tool sprawl”, or an excess of tools. Each layer of the stack typically has its own monitoring solution: one for the database, another for the infrastructure, another for the front end, another for logs, another for traces… Correlating data between them involves continuous context switching, manual searches, and longer incident resolution times. This is the exact opposite of what's needed when the application is down and users are complaining.

The answer to all this lies in a unified observability platform that collects all relevant telemetry, connects it to the entities that generate it, and allows any team—development, operations, security, business—to explore and leverage that data from a single location. This includes not only performance metrics but also business events and signals that reveal the economic impact of each incident.

OpenTelemetry as a common language of observability

One of the clearest trends is the consolidation of OpenTelemetry (OTel) as an open telemetry standard. It is an open-source framework that defines APIs, SDKs, and components to collect metrics, logs, and traces in a homogeneous way, without being tied to a specific observability vendor.

In the coming years, companies are expected to demand OpenTelemetry compatibility from their vendors. The reason is simple: by using a “universal language” to describe telemetry, an organization can switch observability platforms without having to rewrite or re-instrument all of its code. This reduces the risk of vendor lock-in and provides the flexibility to evolve the stack as needed.

In contrast to fully proprietary solutions, where each new integration depends on the vendor's roadmap, OTel allows integrations to survive technological changes. As new cloud services, frameworks, or runtimes emerge, they simply need to emit telemetry in the standard format to be able to send it to any compatible backend.
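
The decoupling described here can be illustrated with a toy sketch. Note that this is deliberately not the OpenTelemetry API: the Exporter interface, the two backend classes, and the record_request function are hypothetical names invented for this example, using only the Python standard library. The point is simply that the instrumentation code builds one standard record and never mentions a vendor.

```python
import json
from typing import Protocol


class Exporter(Protocol):
    """Hypothetical backend interface; not part of the OpenTelemetry API."""

    def export(self, record: dict) -> str: ...


class JsonBackend:
    """Stands in for a backend that ingests JSON documents."""

    def export(self, record: dict) -> str:
        return json.dumps(record, sort_keys=True)


class KeyValueBackend:
    """Stands in for a different backend that ingests key=value lines."""

    def export(self, record: dict) -> str:
        return " ".join(f"{key}={value}" for key, value in sorted(record.items()))


def record_request(exporter: Exporter, route: str, duration_ms: int) -> str:
    """Instrumentation code: builds one standard record and never names a vendor."""
    record = {"name": "http.request", "route": route, "duration_ms": duration_ms}
    return exporter.export(record)


# Switching backend means passing a different exporter; record_request is untouched.
print(record_request(JsonBackend(), "/checkout", 87))
print(record_request(KeyValueBackend(), "/checkout", 87))
```

In a real deployment, the OpenTelemetry SDK and its exporters would play the role that these hand-written classes play here.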

Furthermore, the use of OpenTelemetry is key to properly feeding Artificial Intelligence. AI models, whether traditional machine learning, anomaly detection, or generative AI, work best when the data is clean, structured, and consistent. OTel provides precisely that uniform framework for generating and labeling the telemetry that the algorithms will then process.

Recent studies suggest that organizations that already use OpenTelemetry, even if only partially implemented, perceive a positive impact on indicators such as revenue growth, improved operating margins, and brand reputation. It's not magic: having a consistent and portable observability base makes it easier to detect problems before they affect the customer and optimize the performance of key services.

The three pillars of a modern observability practice

Beyond adopting a standard like OTel, a sound observability practice relies on three basic components that reinforce each other: open instrumentation, connected entities (or data), and programmability.

Open instrumentation involves collecting telemetry from both proprietary and open-source agents. Applications, services, hosts, containers, serverless functions, mobile apps, managed cloud services: everything must be able to emit metrics, events, logs, and traces in formats that can be standardized. This is where agents from traditional vendors come into play, but also exporters and libraries from OpenTelemetry and other open-source projects.

The second block is connected entities and metadata. Simply accumulating metrics and logs isn't enough; you need to understand who generates them and how they relate to each other. This requires identifying services, databases, queues, functions, pods, clusters, and cloud accounts, and linking their telemetry and dependencies. With this context, the platform can automatically render architecture maps, call flows, and incident timelines without the team having to configure everything manually.
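
As a rough illustration of how an architecture map can be derived from telemetry rather than configured by hand, the following Python sketch builds a service dependency map from parent/child span relationships. The span records and the service_map helper are hypothetical; a real platform would work from much richer metadata.

```python
from collections import defaultdict

# Hypothetical spans: each one names its own service and the span that called it.
spans = [
    {"span_id": "a", "parent_id": None, "service": "web"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "payments-db"},
    {"span_id": "d", "parent_id": "b", "service": "queue"},
]


def service_map(spans):
    """Return {caller_service: sorted list of services it calls}."""
    service_of = {span["span_id"]: span["service"] for span in spans}
    edges = defaultdict(set)
    for span in spans:
        parent = span["parent_id"]
        # Skip root spans and calls that stay inside the same service.
        if parent in service_of and service_of[parent] != span["service"]:
            edges[service_of[parent]].add(span["service"])
    return {caller: sorted(callees) for caller, callees in edges.items()}


dependencies = service_map(spans)
print(dependencies)
```

Here the map shows that web calls checkout, and checkout in turn depends on payments-db and queue.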

On that basis, intelligence and advanced analytics can be applied. By identifying patterns, anomalies, and correlations within the dataset, observability platforms can help prioritize alerts, reduce noise, detect complex incidents, and accelerate root cause analysis. This is the natural path toward increasingly proactive observability and, as we will see later, toward agentic autonomy.
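
One of the simplest forms of anomaly detection is a z-score check against a baseline. The sketch below is only an illustration of that technique with made-up latency samples; the find_anomalies function, the baseline size, and the threshold of 3 standard deviations are assumptions chosen for the example, and production platforms typically use far more sophisticated models.

```python
from statistics import mean, stdev


def find_anomalies(samples, baseline_size=5, threshold=3.0):
    """Return indexes of samples more than `threshold` standard deviations
    away from the mean of the first `baseline_size` samples."""
    baseline = samples[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return []  # A flat baseline gives no scale to measure deviation against.
    return [
        index
        for index, value in enumerate(samples)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical latency samples in milliseconds; the seventh one is a spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 350, 101]
anomalies = find_anomalies(latencies_ms)
print(anomalies)
```

With these samples, only the spike at index 6 is flagged, which is the kind of filtering that helps reduce alert noise.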

Finally, there is programmability. Every business has specific needs: its own KPIs, different critical processes, and unique cost models. A modern observability platform must allow for building custom applications and views on top of all the telemetry: dashboards that blend technical data with business metrics, economic impact analysis of outages or degradations, or internal applications to investigate complex incidents according to the company's workflow.

This ability to "program" on observability data opens the door to use cases such as quantifying the real cost of an error in a payment process, relating it to the technical cause (for example, a regression in a checkout microservice), and thus prioritizing correction efforts purely by economic impact.
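
A back-of-the-envelope version of that calculation might look like the following Python sketch. The formula (failed checkouts multiplied by the usual conversion rate and the average order value) is one simple assumption among many possible cost models, and the incidents, figures, and function names are invented for illustration.

```python
def estimated_lost_revenue(failed_checkouts, baseline_conversion, average_order_value):
    """Rough estimate: failures that would normally have converted, times order value."""
    return failed_checkouts * baseline_conversion * average_order_value


def rank_by_cost(incidents, baseline_conversion, average_order_value):
    """Return (cause, estimated cost) pairs, most expensive first."""
    costed = [
        (
            incident["cause"],
            estimated_lost_revenue(
                incident["failed_checkouts"], baseline_conversion, average_order_value
            ),
        )
        for incident in incidents
    ]
    return sorted(costed, key=lambda pair: pair[1], reverse=True)


# Hypothetical incidents already linked to a technical cause.
incidents = [
    {"cause": "slow search index", "failed_checkouts": 50},
    {"cause": "checkout regression", "failed_checkouts": 400},
]
ranking = rank_by_cost(incidents, baseline_conversion=0.5, average_order_value=40.0)
print(ranking)
```

With these numbers the checkout regression comes out at 8000.0 and the slow search index at 1000.0, so the regression would be fixed first.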

Business-oriented observability: from console to outcome

One of the major transformations anticipated is the shift from observability focused on technical operations to observability that is clearly business-oriented. The same data (logs, traces, metrics, events) begins to be used not only to maintain the infrastructure, but also to answer key questions about revenue, costs, and user experience.

In industrial sectors, for example, the observability of IoT sensors makes it possible to anticipate machinery failures and optimize maintenance plans. If abnormal vibration patterns or out-of-range temperatures are detected, intervention can be scheduled before the production line stops, preventing unplanned downtime and its economic consequences.

In the financial sector, analyzing transaction logs in real time helps identify suspicious transactions that could be related to fraud. When the system detects atypical event sequences, unusual geolocations, or amounts that break with usual patterns, it can trigger automatic blocking mechanisms or manual review before an attack is successful.

In marketing and sales, correlating application traces with campaign metrics allows you to answer very direct questions: Is website latency affecting click-through rate or conversion? Which version of a feature best improves navigation and dwell time? If performance drops during a campaign, observability helps identify how many potential sales have been lost and at what exact point in the funnel the problem occurred.

All of this involves translating technical telemetry into actionable knowledge for business leaders. It's not about showing a sales director a CPU graph, but about showing them how many transactions failed to complete due to service degradation and what the estimated cost was. And to achieve this, observability must link technical data, user events, and business metrics within the same model.

Consultancies specializing in observability, such as Nettaro, are already helping companies and institutions make this leap from a purely operational vision to a strategic one, designing models that connect business KPIs with real-time telemetry signals.

From AIOps to Agent Observability

The adoption of Artificial Intelligence in observability platforms is already a reality. Most ITOps teams have incorporated AIOps components (algorithms that analyze large volumes of operational data to detect anomalies, group events, or predict problems) into their workflows.

In many cases, generative AI is also being integrated to interact with telemetry using natural language: asking conversational questions like "why did 500 errors increase in Europe 20 minutes ago?" and getting an explanation based on logs, metrics, and traces without having to build complex queries.

However, today most AI-based decisions continue to be reviewed by people. Algorithms help filter out noise and identify potential causes, but operations teams maintain control, validate recommendations, and manually execute many remediation actions. Complete trust in automated decisions is still limited.

This is where Agent Observability comes in. This is an approach in which AI agents assume a much more autonomous role: they not only detect patterns and explain what is happening, but also manage complete workflows, from identifying the fault to implementing the appropriate solution.

In this model, an agent can, for example, detect an anomalous increase in the latency of a critical service, correlate it with a specific deployment, check the history of similar incidents, and decide for itself whether to launch a rollback, scale capacity, or apply an alternative configuration. All of this is recorded in detail for auditing and potential subsequent human review.
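
The decision step in such a workflow can be sketched, in a very simplified form, as a set of rules that also writes an audit record. Everything in this Python example is an assumption made for illustration: the Signal fields, the thresholds (1.5 times baseline latency, 30 minutes since deployment, 85% CPU), and the action names. A real agent would rely on learned models and incident history, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signal:
    service: str
    p95_latency_ms: float
    baseline_ms: float
    minutes_since_deploy: Optional[float]  # None if there was no recent deployment
    cpu_utilization: float  # Fraction between 0.0 and 1.0


def choose_action(signal: Signal, audit_log: list) -> str:
    """Pick a remediation using fixed, illustrative thresholds and record why."""
    if signal.p95_latency_ms <= 1.5 * signal.baseline_ms:
        action, reason = "none", "latency within 1.5x of baseline"
    elif signal.minutes_since_deploy is not None and signal.minutes_since_deploy <= 30:
        action, reason = "rollback", "latency rose shortly after a deployment"
    elif signal.cpu_utilization > 0.85:
        action, reason = "scale_out", "service is CPU-saturated"
    else:
        action, reason = "escalate_to_human", "no known pattern matched"
    # Every decision is logged, including the decision to do nothing.
    audit_log.append({"service": signal.service, "action": action, "reason": reason})
    return action


audit_log = []
print(choose_action(Signal("checkout", 900.0, 200.0, 12.0, 0.40), audit_log))
print(choose_action(Signal("search", 900.0, 200.0, None, 0.93), audit_log))
```

The first call chooses a rollback because latency rose 12 minutes after a deployment; the second chooses to scale out because there was no deployment but CPU is saturated. Falling back to human escalation when no rule matches keeps the agent within known territory.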

Currently, only a minority of companies use this active form of Agent Observability, with automated remediation and advanced problem prediction. But forecasts indicate that its adoption will grow significantly, driven by the search for greater productivity in IT teams and the need to reduce the time they spend on repetitive maintenance tasks.

Limitations of manual supervision and the need for autonomy

The demand for autonomous agents is better understood if we look at extreme cases such as the observability of large language models (LLMs). Manually monitoring these types of systems is a near-impossible task: the data volumes are gigantic, the architectures combine multiple distributed components, and the need for real-time monitoring is constant.

The abundance of logs and metrics makes identifying problems manually very slow. Any delay in detecting a change in behavior, an increase in errors, or a degradation in the quality of responses can have serious consequences in production environments, in terms of user experience, reputation, and regulatory compliance.

Furthermore, manual observation consumes many human resources, is prone to errors, and does not scale well. As the number of models, instances, or integrations with business applications grows, what might work in a pilot with a few users becomes a bottleneck when the system is rolled out across the entire organization.

Therefore, in complex environments such as those involving LLMs or highly distributed architectures, the need for autonomous observability solutions becomes clear. We are talking about systems capable of continuously analyzing telemetry, detecting deviations, proposing or executing corrective actions, and learning from each intervention to improve their effectiveness over time.

Vision-action agents and automation on interfaces

The advancement of AI is not limited to the realm of "classical" observability. Research by companies like NVIDIA, with projects such as Nitrogen, is driving models that combine vision and action capabilities: agents that observe a screen, infer the state of the environment, and decide what to do next, without specific integrations with the system they are controlling.

Technically, this involves training a model on large corpora of videos of games or interactions so that it learns to relate what it sees to the actions an expert would take. These models work with time sequences, motion discretization, long-term goals, and optimization under multiple constraints such as latency or stability.

Although the most visible example is gaming, this vision-action approach has enormous potential in business: it allows for the creation of agents that operate on conventional graphical interfaces, navigating complex applications, running repetitive flows, validating processes, or performing end-to-end tests without the need for specific APIs.

This represents a kind of natural evolution of traditional RPA towards smarter, more contextual automation. Typical use cases include automated software testing that simulates real user behavior, guided support that replicates click-by-click what an employee should do, synthetic data generation for QA, or "digital twins" that replicate human activity in corporate systems.

For all of this to be viable, a robust framework for cybersecurity, governance, and observability is required. Agents interacting with critical interfaces and systems must adhere to access policies, avoid dangerous actions, log every step for auditing purposes, and operate within clearly defined boundaries. Observability here acts as both a "black box" and a "toolbox": it records what the agent does and provides data to calibrate and improve its behavior.

Security, governance, and Zero Trust in the era of AI agents

The expansion of agentic AI and autonomous systems brings with it new risks that must be managed carefully. One of the most discussed is the so-called "shadow AI": agents, models, or integrations that are launched outside the organization's official channels, without adequate security or regulatory compliance controls.

There is also the danger of double agents or malicious agents. This can occur either by malicious design (external attacks, prompt manipulation, instruction injection) or due to configuration errors that allow a well-intentioned system to perform unintended actions. To minimize these risks, it is important to apply Zero Trust principles specifically to Artificial Intelligence.

Zero Trust in this context means that no AI agent or component is considered trustworthy by default. Every action must be explicitly authorized, permissions must be limited to the minimum necessary (principle of least privilege), and all interactions must be logged for later auditing. Observability thus becomes a key element of AI governance.

Having good observability allows for real-time monitoring of what agents are doing, detection of anomalous behavior, validation of access policies, and the availability of complete evidence in case of incidents. Tools such as lists of permitted actions, human reviews of critical loops, sanitization of sensitive data, and controls over the location of computing (on-premises, public cloud, sovereign cloud) are essential elements of effective AI governance.
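
A list of permitted actions combined with default denial and full logging can be sketched in a few lines of Python. The agent names, action names, and the authorize function below are hypothetical, and a real system would back this with proper identity, policy, and tamper-resistant audit storage.

```python
# Hypothetical allow-list: anything not listed here is denied by default.
ALLOWED_ACTIONS = {
    "remediation-agent": {"restart_pod", "scale_out"},
    "reporting-agent": {"read_metrics"},
}


def authorize(agent: str, action: str, audit_log: list) -> bool:
    """Allow only explicitly listed actions and log every attempt, allowed or not."""
    allowed = action in ALLOWED_ACTIONS.get(agent, set())
    audit_log.append({"agent": agent, "action": action, "allowed": allowed})
    return allowed


audit_log = []
print(authorize("remediation-agent", "scale_out", audit_log))        # Listed action
print(authorize("remediation-agent", "delete_database", audit_log))  # Not listed
print(authorize("unknown-agent", "read_metrics", audit_log))         # Unknown agent
```

Only the first request is allowed. The two denied attempts still end up in the audit log, which is what makes anomalous agent behavior visible afterwards.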

In this scenario, it is vital to find the balance between innovation and control. Organizations want to fully exploit the potential of agentic AI to gain productivity and competitiveness, but without sacrificing security, regulatory compliance, or transparency in automated decision-making.

Data, infrastructure, and AI as the foundational layer of the business

Looking at the big picture, AI is evolving from an additional tool into a structural layer on which economic competitiveness is based. Everything revolves around that transformation: data strategies, cloud architecture, hardware design, workforce models, and even national policies on digital infrastructure.

On the one hand, data is consolidating as the main competitive differentiator. As computing and modeling become more commoditized, what makes the difference is having your own high-quality, well-governed data. Observability, by capturing rich and contextual telemetry, becomes one of the most valuable sources of data to power AI systems and improve processes.

On the other, AI infrastructure is beginning to be seen as a strategic national asset. The rise of sovereign clouds responds to the need to control where sensitive data is stored and processed, how models are trained, and under what regulatory frameworks they operate. Countries are investing in data centers that are optimized for AI workloads, energy-efficient, and aligned with compliance requirements.

All of this coincides with an accelerated modernization of data centers. Under pressure from the energy and cooling demands of AI workloads and agent systems, energy efficiency is no longer simply an operational issue but has become a limiting factor for innovation and an environmental compliance requirement.

In parallel, companies are forced to retrain their workforce. The goal is not to turn everyone into a programmer, but to train professionals capable of orchestrating and leveraging these autonomous systems: AI-powered business experts, engineers who can translate operational needs into observability and security policies, and hybrid roles that understand both the technical and economic impact of decisions.

Taken together, this evolution leads to a scenario in which more open and autonomous observability becomes the glue that links technology, business, and regulation: standards like OpenTelemetry guarantee data portability and quality, AI and Agent Observability reduce operational complexity and accelerate incident response, and governance and Zero Trust practices ensure that all of this happens under control, securely, and with real auditability.

Organizations that manage to articulate this combination – standardized telemetry, unified platforms, a focus on business results, and AI agents governed with good observability – will be best positioned to compete in an environment where digital systems are increasingly critical, complex, and autonomous, but also more capable of generating tangible value when managed with the right visibility.
