Reliability Engineering

Observability: The Three Pillars and What Comes Next

Metrics, logs, and traces are table stakes. The real shift is toward correlated, vendor-neutral, high-cardinality data you can actually query.

UVExcel Tech | 25 Jan 2026 | 9 min read

The 'three pillars' framing — metrics, logs, and traces — has done a lot of good and is now slightly in the way. It is good because it names the distinct kinds of telemetry a system emits. It is in the way because it encourages teams to treat them as three separate tools with three separate bills, when the value of observability comes almost entirely from connecting them. Knowing that something failed (a log), how often (a metric), and along which exact request path (a trace) is far more than the sum of the three viewed in isolation.

What each signal is actually for

Metrics are cheap, aggregatable numbers over time — the right tool for dashboards, trends, and alerting on the golden signals of latency, traffic, errors, and saturation. Logs are timestamped records of discrete events, indispensable for the detail of what happened but expensive and noisy at volume. Traces follow a single request as it propagates across services, which is the only practical way to understand latency and failure in a distributed system where no single service has the whole story. Each answers a different question; none answers all of them.
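As a rough illustration of why metrics are cheap, the sketch below collapses raw request records into three of the four golden signals. The `Request` fields, the sample data, and the 10-second window are assumptions made for the example; saturation is left out because it comes from resource gauges (CPU, queue depth) rather than from request records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One completed request; the field names are illustrative assumptions."""
    duration_ms: float
    status: int


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = -(-len(ordered) * pct // 100)  # ceiling division, no float rounding
    return ordered[rank - 1]


def golden_signals(requests, window_seconds):
    """Aggregate raw request records into latency, traffic, and error rate."""
    if not requests:
        raise ValueError("no requests in window")
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": percentile(durations, 95),
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
    }


# Made-up sample: 18 fast successes, one slower success, one slow failure.
sample = [Request(20.0, 200)] * 18 + [Request(40.0, 200), Request(900.0, 500)]
signals = golden_signals(sample, window_seconds=10)
```

Twenty records become three numbers, which is what makes metrics cheap to store and alert on. It is also why they cannot say which request failed: that detail was discarded during aggregation.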

Correlation is the whole game

The shift that matters is from collecting these signals separately to connecting them. A log entry on its own tells you something broke; the same log linked to the trace that produced it shows you the exact request and execution path, and the metrics around that request tell you whether the problem is isolated or systemic. This correlation is what collapses mean-time-to-resolution. It depends on shared context — consistent identifiers like trace and span IDs threaded through every signal, and consistent naming so attributes mean the same thing across services. Without that discipline, you have three data silos and a lot of manual cross-referencing during an incident.
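One way to picture the shared context is a log pipeline that stamps every record with the active trace ID. The sketch below uses only the Python standard library; the `current_trace_id` variable, the `checkout` logger name, and the in-memory stream are assumptions standing in for a real tracing SDK and log sink.

```python
import contextvars
import io
import json
import logging
import secrets

# Trace ID of the request currently being handled (hypothetical name).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so trace_id is a queryable field."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
        })


stream = io.StringIO()  # stands in for a real log sink
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request():
    """Simulate one request: set a trace ID, then log under it."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    token = current_trace_id.set(trace_id)
    try:
        logger.info("charging card")
        logger.error("payment gateway timeout")
    finally:
        current_trace_id.reset(token)
    return trace_id


first = handle_request()
second = handle_request()

records = [json.loads(line) for line in stream.getvalue().splitlines()]
# Every log line from the first request, recovered by trace ID alone.
first_logs = [r["message"] for r in records if r["trace_id"] == first]
```

Because the ID is attached in one place (the filter) rather than at each call site, individual log statements cannot forget it. That is the kind of discipline the correlation argument depends on.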

AI-assisted operations cannot correlate what isn't consistently named and connected. The payoff from any 'AIOps' layer is bounded by the quality and consistency of the telemetry beneath it.

OpenTelemetry and the end of lock-in

The most consequential development in observability is OpenTelemetry — a vendor-neutral, open-source standard for instrumenting, collecting, and exporting telemetry. Its premise is simple and powerful: you instrument your code and infrastructure once, against a single set of APIs and semantic conventions, and you can send that data to any compatible backend without re-instrumenting. The data you generate is yours, and switching analysis vendors stops meaning re-wiring every service. With broad industry support and SDKs across the major languages, OpenTelemetry has effectively become the default substrate for new observability work, with the Collector acting as the vendor-agnostic pipeline that receives, processes, and routes everything.
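The "instrument once, route anywhere" premise can be sketched without any SDK. The code below is not the OpenTelemetry API: `Tracer`, `Exporter`, and both exporter classes are hypothetical stand-ins written with the standard library, meant only to show application code that stays unchanged while the backend is swapped.

```python
import time
from contextlib import contextmanager
from typing import Protocol


class Exporter(Protocol):
    """Anything that can receive a finished span (hypothetical interface)."""

    def export(self, span: dict) -> None: ...


class InMemoryExporter:
    """Stand-in for one backend: keeps spans in a list."""

    def __init__(self):
        self.spans = []

    def export(self, span):
        self.spans.append(span)


class LineExporter:
    """Stand-in for a different backend: renders spans as text lines."""

    def __init__(self):
        self.lines = []

    def export(self, span):
        self.lines.append(f"{span['name']} {span['attributes']}")


class Tracer:
    """Minimal tracer that hands finished spans to whichever exporter it was given."""

    def __init__(self, exporter: Exporter):
        self._exporter = exporter

    @contextmanager
    def span(self, name, **attributes):
        start = time.perf_counter()
        try:
            yield attributes  # callers may add attributes while the span is open
        finally:
            self._exporter.export({
                "name": name,
                "attributes": attributes,
                "duration_s": time.perf_counter() - start,
            })


def place_order(tracer, order_id):
    """Application code: instrumented once, unaware of the backend."""
    with tracer.span("place_order", order_id=order_id) as attrs:
        attrs["items"] = 3
        return "ok"


memory, lines = InMemoryExporter(), LineExporter()
place_order(Tracer(memory), "A-1")  # same call site...
place_order(Tracer(lines), "A-1")   # ...different destination
```

Only the object passed to `Tracer` changes between the two calls. In a real deployment that choice would typically live in SDK or Collector configuration rather than in application code.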

High cardinality and the fourth signal

Two frontiers define what comes next. The first is high-cardinality data: the ability to slice telemetry by attributes with enormous numbers of distinct values, like user ID, request ID, or build SHA. Traditional metrics systems buckle under cardinality because every distinct combination of label values becomes its own time series to store and index. Modern observability instead treats wide, richly-attributed events as first-class, so you can ask questions you did not anticipate when you instrumented, which is the real definition of an observable system. The second frontier is continuous profiling, increasingly treated as a fourth signal alongside metrics, logs, and traces. Profiling answers the question the other three cannot: not just which request was slow, but which lines of code and which resource consumption made it slow.
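To make the cardinality problem concrete, here is a minimal sketch with made-up numbers. The first half estimates the worst-case number of time series a single metric can produce as labels are added; the second half shows wide events being grouped by an attribute chosen at query time. The label names, counts, and event values are all illustrative assumptions.

```python
from collections import Counter
from math import prod

# Distinct values per label; the numbers are made up for illustration.
label_cardinality = {"endpoint": 50, "status": 5, "region": 4, "user_id": 100_000}


def worst_case_series(cardinalities):
    """Upper bound on time series for one metric: product of label cardinalities."""
    return prod(cardinalities.values())


low_cardinality = {k: v for k, v in label_cardinality.items() if k != "user_id"}
without_user = worst_case_series(low_cardinality)
with_user = worst_case_series(label_cardinality)

# Wide events keep every attribute on the raw record instead of pre-aggregating.
events = [
    {"endpoint": "/cart", "user_id": "u1", "build_sha": "abc123", "duration_ms": 35},
    {"endpoint": "/cart", "user_id": "u2", "build_sha": "abc123", "duration_ms": 40},
    {"endpoint": "/checkout", "user_id": "u1", "build_sha": "def456", "duration_ms": 900},
    {"endpoint": "/checkout", "user_id": "u3", "build_sha": "def456", "duration_ms": 870},
    {"endpoint": "/cart", "user_id": "u3", "build_sha": "def456", "duration_ms": 45},
]


def slowest_by(events, attribute):
    """Group by any attribute chosen at query time, ranked by total duration."""
    totals = Counter()
    for event in events:
        totals[event[attribute]] += event["duration_ms"]
    return totals.most_common()


by_build = slowest_by(events, "build_sha")
by_user = slowest_by(events, "user_id")
```

Adding `user_id` as a metric label multiplies the worst case from 1,000 series to 100 million, while the same attribute on a wide event is just another field. Nothing about `slowest_by` had to be decided at instrumentation time, which is the property the paragraph above describes.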

Where to start

1. Decide what questions you need to answer (your SLIs and golden signals) before deciding how much data to collect.
2. Instrument with OpenTelemetry so your telemetry is portable and consistently named from day one.
3. Thread trace context through every service so logs, metrics, and traces can be correlated automatically.
4. Invest in high-cardinality, wide events for the systems where you most need to ask unanticipated questions.
5. Add continuous profiling where latency or cost problems resist explanation from the other three signals.
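The third step, threading trace context through every service, usually means forwarding a header such as the W3C Trace Context `traceparent`. The sketch below parses an incoming header and builds the one a service would send downstream. It handles only version `00`, ignores `tracestate`, and assumes headers arrive as a dict with lowercase keys; in practice an instrumentation library normally does this for you.

```python
import re
import secrets

# W3C Trace Context `traceparent`: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT.fullmatch(header or "")
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero IDs are invalid per the spec
    return trace_id, span_id, flags


def outgoing_headers(incoming_headers):
    """Continue the caller's trace if present, otherwise start a new one."""
    parsed = parse_traceparent(incoming_headers.get("traceparent"))
    if parsed:
        trace_id, _, flags = parsed
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # new trace, marked sampled
    span_id = secrets.token_hex(8)  # this service's own span ID
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
downstream = outgoing_headers(incoming)
fresh = outgoing_headers({})
```

The trace ID stays constant across hops while each service contributes its own span ID. That shared ID is what lets a backend stitch spans, and any logs stamped with it, into one request path.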

The three pillars are still the foundation; they are simply no longer the destination. The teams getting the most from observability are the ones who stopped buying three disconnected tools and started building one correlated, vendor-neutral, query-first picture of their systems — and who treat the goal as reducing time-to-understanding, not accumulating data.

Key takeaways

The value is in correlating metrics, logs, and traces — not collecting them as three silos.
Instrument with OpenTelemetry to make telemetry portable, consistent, and free of vendor lock-in.
High-cardinality wide events let you ask questions you didn't anticipate — the real test of observability.
Continuous profiling is emerging as a fourth signal for problems the other three can't explain.
