What is OpenTelemetry?

The Problem

Modern distributed systems are complex: a single application can span hundreds of microservices, which makes traditional monitoring insufficient. The industry was also fragmented across competing instrumentation standards.

The Solution

OpenTelemetry was formed in 2019 by merging OpenTracing and OpenCensus, creating a single, unified standard for collecting telemetry data.

The Benefit

Vendor neutrality, consistent instrumentation across languages, and freedom to choose any backend for analysis.

Monitoring vs Observability

Traditional Monitoring

  • Answers known questions
  • Predefined metrics and alerts
  • Works for predictable systems
  • "What is the current CPU usage?"

Modern Observability

  • Answers unknown questions
  • Infers internal system state from external outputs
  • Essential for distributed systems
  • "Why are EU mobile users seeing payment delays?"

The Three Pillars of Observability

Distributed Traces

End-to-end narratives of requests as they flow through distributed systems. Each trace is composed of spans representing individual operations.

Example spans within a single trace:
  HTTP GET /api/users    245ms
  SELECT FROM users      180ms
  Database Query         155ms
Use Case: Deep contextual debugging, understanding request flow
Key Components: SpanID, TraceID, Timestamps, Attributes, Events
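
A minimal sketch of how spans like these are created in code (Python, assuming a configured tracer provider; the tracer name and attributes are illustrative):

from opentelemetry import trace

tracer = trace.get_tracer("user-service")

# Parent span: the inbound HTTP request
with tracer.start_as_current_span("HTTP GET /api/users") as request_span:
    request_span.set_attribute("http.method", "GET")

    # Child span: the database call, automatically nested under the request
    with tracer.start_as_current_span("SELECT FROM users") as db_span:
        db_span.set_attribute("db.system", "postgresql")
        rows = []  # placeholder for the actual query

Both spans share the same TraceID, and each records its own SpanID, timestamps, attributes, and events.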

Metrics

Numerical measurements aggregated over time, forming time-series data. Efficient for dashboards, alerting, and trend analysis.

  • Counter: always-increasing values (requests, errors, sales)
  • Gauge: point-in-time values (CPU usage, queue size)
  • Histogram: distribution of values (latency percentiles)

Use Case: Proactive monitoring, alerting, capacity planning
Benefits: Compact storage, fast queries, real-time dashboards
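
A rough sketch of the three instrument types with the Python metrics API (assumes a configured meter provider; instrument names and attributes are illustrative):

from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation

meter = metrics.get_meter("checkout-service")

# Counter: monotonically increasing (e.g. processed orders)
orders_processed = meter.create_counter("orders.processed")

# Histogram: distribution of values (e.g. request latency)
request_duration = meter.create_histogram("http.server.duration", unit="ms")

# Gauge: point-in-time value, reported via a callback
def observe_queue_size(options: CallbackOptions):
    yield Observation(42, {"queue.name": "payments"})  # illustrative reading

queue_size = meter.create_observable_gauge("queue.size", callbacks=[observe_queue_size])

orders_processed.add(1, {"payment.method": "card"})
request_duration.record(245, {"http.route": "/api/users"})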

Logs

Timestamped records of discrete events. OpenTelemetry focuses on correlation with existing logging frameworks rather than replacing them.

2024-01-15 10:30:45 ERROR TraceID: 1a2b3c4d Payment processing failed: connection timeout
2024-01-15 10:30:44 INFO TraceID: 1a2b3c4d Initiating payment for user 12345
Use Case: Root cause analysis, detailed error context
Key Feature: Automatic correlation with traces via TraceID injection
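
One way this correlation looks in practice with Python's standard logging (a sketch, assuming the opentelemetry-instrumentation-logging package):

import logging

from opentelemetry import trace
from opentelemetry.instrumentation.logging import LoggingInstrumentor

# Injects otelTraceID / otelSpanID fields into standard log records
LoggingInstrumentor().instrument(set_logging_format=True)

tracer = trace.get_tracer("payment-service")
with tracer.start_as_current_span("process_payment"):
    # This log line now carries the same TraceID as the active span
    logging.getLogger(__name__).info("Initiating payment for user 12345")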

The Power of Correlation

1. Alert fires: a metric shows an error-rate spike
2. Examine traces: find the failing request patterns
3. Check logs: get the detailed error context

OpenTelemetry Architecture

API

Vendor-neutral interfaces for creating telemetry. Stable, minimal dependency that libraries can safely embed.

SDK

Concrete implementation of the API. Handles sampling, batching, and exporting telemetry data.
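
A minimal Python sketch of the split: library code depends only on the vendor-neutral API, while the application installs a concrete SDK at startup (a console exporter is used here purely for illustration):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Application startup: wire the SDK behind the API
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Library code: touches only the API; without an SDK these calls are no-ops
tracer = trace.get_tracer("my-library")
with tracer.start_as_current_span("do_work"):
    pass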

OTLP

Native wire protocol for efficient, high-fidelity transmission of telemetry data.
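
Continuing the sketch above, switching the console exporter for OTLP is a small change (assumes the opentelemetry-exporter-otlp package and a Collector listening on the default gRPC port 4317):

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Send spans to a local Collector over OTLP/gRPC instead of printing them
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)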

Collector

Vendor-agnostic proxy for receiving, processing, and exporting telemetry data.

Instrumentation Strategies

Automatic Instrumentation

Advantages

  • Zero-code instrumentation
  • Quick deployment
  • Broad baseline coverage
  • Easy maintenance

Disadvantages

  • Limited customization
  • No business context
  • Language-dependent maturity
  • Higher performance overhead
Example: Java Auto-Instrumentation
java -javaagent:opentelemetry-javaagent.jar \
     -Dotel.service.name=my-service \
     -jar my-application.jar

Manual Instrumentation

Advantages

  • Complete control
  • Rich business context
  • Custom metrics
  • Optimized performance

Disadvantages

  • Development overhead
  • Maintenance burden
  • Requires expertise
  • Slower initial deployment
Example: Manual Span Creation
const opentelemetry = require('@opentelemetry/api');
const { SpanStatusCode } = opentelemetry;

const tracer = opentelemetry.trace.getTracer('payment-service');

async function processPayment(userId, amount) {
    const span = tracer.startSpan('process_payment');
    span.setAttributes({
        'user.id': userId,
        'payment.amount': amount,
        'payment.currency': 'USD'
    });
    
    try {
        // Payment processing logic (paymentGateway is assumed to be provided by the application)
        const result = await paymentGateway.charge(amount);
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
    } catch (error) {
        span.setStatus({ 
            code: SpanStatusCode.ERROR,
            message: error.message 
        });
        throw error;
    } finally {
        span.end();
    }
}

Best Practice: Combine Both Approaches

1. Start with Auto-Instrumentation: get immediate coverage of HTTP calls, database queries, and framework operations.

2. Add Strategic Manual Instrumentation: instrument critical business transactions and add valuable domain context.

3. Continuous Refinement: iteratively improve based on observability insights and business needs.

OpenTelemetry Collector: The Central Nervous System

Data flows through a pipeline of Receivers → Processors → Exporters:

  • Receivers: entry points for data (OTLP, Prometheus, etc.)
  • Processors: transform data (batch, sample, enrich)
  • Exporters: send data to backends (Jaeger, Prometheus, etc.)
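
A minimal Collector configuration sketch wiring the three stages into a traces pipeline (exporter names vary by Collector version; recent releases use debug where older ones used logging):

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]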

Deployment Patterns

Agent

Deployed alongside applications for local collection and basic processing

Gateway

Centralized service for advanced processing and multi-backend routing
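
As an illustrative sketch of the two patterns working together, an agent Collector can simply forward everything it receives to a central gateway over OTLP (the gateway hostname is hypothetical):

receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  otlp:
    endpoint: otel-gateway:4317   # hypothetical gateway address
    tls:
      insecure: true              # assumes plaintext traffic inside the cluster

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]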

The OpenTelemetry Ecosystem

Important: OpenTelemetry ≠ Complete Solution

OpenTelemetry is the plumbing for telemetry data - it generates, collects, and transports data. You still need a backend for storage, querying, and visualization.

Commercial Backend Support

Major Observability Platforms

Datadog, New Relic, Splunk, Dynatrace, Honeycomb, Elastic

All provide native OTLP ingestion endpoints

Cloud Providers

AWS X-Ray, Google Cloud Trace, Azure Monitor

Integrated OpenTelemetry support in native services

Open-Source Alternative Stack

  • Visualization: Grafana
  • Traces: Jaeger, Zipkin
  • Metrics: Prometheus
  • Logs: Elasticsearch, OpenSearch

Strategic Decision: OpenTelemetry vs. Proprietary Agents

Pure OpenTelemetry

Advantages
  • Complete vendor independence
  • Unified standard across all services
  • Maximum control and customization
  • Future-proof investment

Proprietary Agents

Advantages
  • Out-of-the-box experience
  • Tightly integrated features
  • Mature enterprise support
  • Advanced APM capabilities

Project Maturity & Roadmap

Status legend: Stable (production-ready), Beta (feature complete), Development (in progress)

Core Specification Status

Tracing

Stable

Fully mature and production-ready. The most battle-tested component.

Metrics

API: Stable; SDK: Mixed

Core functionality stable, advanced features still evolving.

Logs

Bridge: Stable; Native: Evolving

Integration with existing frameworks stable, native logging pathway developing.

Language SDK Status Matrix

Language      Traces   Metrics   Logs
Java          Stable   Stable    Stable
C#/.NET       Stable   Stable    Stable
Go            Stable   Stable    Beta
Python        Stable   Stable    Development
JavaScript    Stable   Stable    Development
Rust          Beta     Beta      Beta

Future Roadmap

Current Focus (P0-P1)

  • Logs General Availability
  • Stabilizing Semantic Conventions
  • SDK Performance & Robustness

New Signals (P2)

  • Continuous Profiling: CPU/memory profiles linked to traces
  • Real User Monitoring (RUM): Client-side instrumentation
  • eBPF Integration: Kernel-level observability

Long-term Vision

Complete end-to-end observability from user devices through backend services down to kernel-level performance - all unified under a single, open standard.

Getting Started with OpenTelemetry

5-Step Implementation Guide

1. Start with Auto-Instrumentation

Choose your language SDK and enable zero-code instrumentation for immediate visibility.

pip install opentelemetry-distro[otlp]
opentelemetry-bootstrap -a install

2. Configure a Local Collector

Set up a Collector with a console exporter to inspect your telemetry data.

docker run -p 4317:4317 otel/opentelemetry-collector:latest

3. Add Manual Instrumentation

Instrument critical business transactions with custom spans and attributes.

span.set_attribute("user.id", user_id)
span.set_attribute("order.value", order_total)

4. Choose a Backend

Select a visualization platform; start with Jaeger for traces or your preferred vendor.

docker run -p 16686:16686 jaegertracing/all-in-one:latest

5. Scale with a Production Setup

Deploy gateway Collectors with sampling for cost-effective production observability.

probabilistic_sampler:
  sampling_percentage: 1.0   # value is a percentage, i.e. keep roughly 1% of traces

Best Practices for Success

Establish Semantic Conventions

Define consistent attribute names for business concepts across all services (e.g., customer.id, tenant.name).
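
One lightweight way to keep these names consistent (a sketch; the module and constant names are hypothetical) is to define attribute keys once in a shared module instead of scattering string literals across services:

# shared module, e.g. myorg_telemetry/attributes.py (hypothetical)
CUSTOMER_ID = "customer.id"
TENANT_NAME = "tenant.name"

# service code
span.set_attribute(CUSTOMER_ID, customer_id)
span.set_attribute(TENANT_NAME, tenant_name)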

Implement Smart Sampling

Use tail-based sampling to retain 100% of errors while sampling successful traces for cost efficiency.
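
A sketch of what this can look like with the Collector's tail_sampling processor (available in the contrib distribution; exact fields may differ by version): keep every trace containing an error and a fixed percentage of the rest.

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-all-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10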

Configuration as Code

Manage collector configurations in version control with CI/CD deployment for reliability.

Start Small, Scale Gradually

Begin with one non-critical service, prove value, then incrementally roll out organization-wide.

The Inevitable Standard

OpenTelemetry has evolved from an emerging trend to the foundational standard for modern observability. Its vendor neutrality, comprehensive coverage, and strong community backing make it the inevitable choice for organizations building distributed systems.

Key Takeaways:

  • Vendor Independence: Instrument once, use anywhere
  • Unified Standard: Consistent approach across languages and services
  • Future-Proof: Backed by CNCF and industry leaders
  • Comprehensive: Growing from three pillars to full-stack observability

For any organization operating modern software, adopting OpenTelemetry is no longer a question of if, but when and how.