Structured Logging and OpenTelemetry for Modern Hosting Stacks
Unstructured Logs Are Just Expensive Noise at Scale
When your hosting stack is a single application on a single server, reading log files works. When your stack grows to multiple services, multiple servers, background workers, databases, and caches — all handling concurrent requests — plain-text log files become a labyrinth. Searching for a specific request's journey across five services means grepping through five different log files with five different formats, hoping the timestamps align closely enough to correlate events. This does not scale.
Structured logging and OpenTelemetry solve this by transforming logs from free-text diaries into queryable, correlated data — and by connecting logs with metrics and traces into a unified observability system. This guide covers the practical implementation: structured JSON logging, OpenTelemetry instrumentation, trace context propagation, and building dashboards that answer operational questions across your entire hosting stack.
Structured Logging: What and Why
From Text to Data
An unstructured log line looks like this: 2026-01-15 14:32:01 ERROR Failed to process order 4521 for user john@example.com - connection timeout after 5000ms. A human can read it. A machine cannot reliably parse it. The timestamp format, the log level, the order ID, the user, the error type, and the duration are all embedded in free text with no consistent structure.
A structured log line for the same event is a JSON object with explicit fields: timestamp, level, message, orderId, userId, errorType, durationMs, service, traceId. Every field is individually queryable. You can search for all errors with durationMs > 3000, filter by userId, aggregate error counts by errorType, and correlate with other services by traceId — without writing fragile regex patterns.
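To make this concrete, here is the same event as a JSON log entry, with a filter that would be a fragile regex against free text. Field values are illustrative:

```javascript
// The error from the unstructured example, as a structured log entry.
const entry = {
  timestamp: "2026-01-15T14:32:01.000Z",
  level: "error",
  message: "Failed to process order",
  orderId: 4521,
  userId: "john@example.com",
  errorType: "ConnectionTimeout",
  durationMs: 5000,
  service: "order-service",
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
};

// Emit as one JSON line; log shippers ingest this without regex parsing.
console.log(JSON.stringify(entry));

// Querying becomes a filter, not a regex: e.g. all errors slower than 3s.
const lines = [entry]; // in practice, parsed from the log stream
const slowErrors = lines.filter(
  (e) => e.level === "error" && e.durationMs > 3000
);
```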
Key Fields to Include
- timestamp: ISO 8601 format with timezone. Consistent timestamps are essential for correlating events across services.
- level: debug, info, warn, error, fatal. Machine-readable severity enables filtering and alerting.
- message: A human-readable description of the event. Keep it concise and consistent for the same event type.
- service: Which service emitted the log. Critical for multi-service architectures.
- traceId and spanId: OpenTelemetry context identifiers that link this log entry to a distributed trace. This is the key to correlating logs across services.
- Contextual fields: Request ID, user ID, session ID, HTTP method and path, response status code, duration — any field that helps with diagnosis and querying.
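A small helper can enforce this schema so every service emits the same shape. A minimal sketch; the service name and the getTraceContext() hook are assumptions, with the hook to be wired to your OpenTelemetry SDK in a real setup:

```javascript
// Minimal structured logger: stamps the shared schema fields and merges
// per-event context. One JSON object per line on stdout.
const SERVICE = "order-service"; // example name; set per service

function getTraceContext() {
  // Placeholder: in production, read the active span's context here.
  return {};
}

function log(level, message, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(), // ISO 8601, UTC
    level,
    message,
    service: SERVICE,
    ...getTraceContext(), // traceId / spanId when available
    ...fields, // contextual fields: requestId, userId, durationMs, ...
  };
  process.stdout.write(JSON.stringify(entry) + "\n");
  return entry;
}

log("info", "order processed", { orderId: 4521, durationMs: 182 });
```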
OpenTelemetry: The Observability Standard
OpenTelemetry (OTel) is a vendor-neutral framework for collecting and exporting telemetry data — traces, metrics, and logs — from your applications. It provides SDKs for every major language, automatic instrumentation for popular frameworks (Express, NestJS, Django, Spring), and a standardised data format that works with any observability backend (Grafana, Datadog, Jaeger, Elasticsearch).
Why Vendor-Neutral Matters
Instrumenting your code with a vendor-specific SDK locks you into that vendor. Switching observability platforms means re-instrumenting your entire application. OpenTelemetry instruments your code once, and you route the telemetry data to any backend — or multiple backends simultaneously. Switching vendors means changing a configuration, not rewriting instrumentation.
The Three Pillars
- Traces: Follow a single request as it travels across services. A trace shows the full journey — the API gateway, the application service, the database query, the cache lookup, and the response — with timing for each step. When a request is slow, the trace shows exactly where the time was spent.
- Metrics: Numerical measurements over time — request count, error rate, response latency percentiles, CPU usage, queue depth. Metrics tell you about system behaviour in aggregate: is the error rate increasing? Is latency degrading?
- Logs: Detailed event records. With trace context attached, logs are linked to the specific request they belong to. Instead of searching all logs for error messages, you find the trace for the problematic request and see its logs in context.
Trace Context Propagation
The magic of distributed tracing is context propagation. When Service A calls Service B, it passes the trace ID and span ID in HTTP headers (the W3C Trace Context standard). Service B's logs and spans are linked to the same trace. Every service in the request chain inherits the context, and the entire journey is reconstructable from a single trace ID.
OpenTelemetry SDKs handle propagation automatically for HTTP calls, gRPC calls, and message queue consumers. For custom communication channels (WebSockets, raw TCP, batch processing), you may need to propagate context manually — but the pattern is simple: inject context into the outgoing message, extract it on the receiving end.
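For a custom channel, the inject/extract pattern can be sketched with the W3C traceparent header format directly (version "00", a 32-hex-digit trace ID, a 16-hex-digit span ID, and 2 hex flag digits). The ID values below are illustrative:

```javascript
// Build a traceparent header for an outgoing message.
function injectTraceparent(traceId, spanId, sampled = true) {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

// Parse a traceparent header on the receiving end.
function extractTraceparent(header) {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null; // malformed or unsupported version: start a new trace
  return { traceId: m[1], spanId: m[2], sampled: m[3] === "01" };
}

// Sender attaches the header to the outgoing message...
const header = injectTraceparent(
  "4bf92f3577b34da6a3ce929d0e0e4736",
  "00f067aa0ba902b7"
);
// ...and the receiver extracts it to continue the same trace.
const ctx = extractTraceparent(header);
```

In a real service you would hand the extracted IDs to your tracing SDK rather than use them directly, but the wire format is exactly this.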
Implementing OpenTelemetry in a Hosting Stack
Step 1: Add the SDK
Install the OpenTelemetry SDK for your language. For Node.js applications (common in hosting platforms), the @opentelemetry/sdk-node package provides automatic instrumentation for HTTP, Express, NestJS, database clients, and more. A single initialisation file enables tracing across your entire application.
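For a Node.js service, that initialisation file can be as small as the sketch below. It assumes @opentelemetry/sdk-node and @opentelemetry/auto-instrumentations-node are installed; the service name is an example:

```javascript
// tracing.js - load before the application:
//   node --require ./tracing.js app.js
const { NodeSDK } = require("@opentelemetry/sdk-node");
const {
  getNodeAutoInstrumentations,
} = require("@opentelemetry/auto-instrumentations-node");

const sdk = new NodeSDK({
  serviceName: "order-service",
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start(); // HTTP, Express, database, and Redis calls are now traced
process.on("SIGTERM", () => sdk.shutdown()); // flush telemetry on exit
```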
Step 2: Configure Auto-Instrumentation
Auto-instrumentation detects the libraries your application uses and adds tracing automatically. HTTP requests, database queries, Redis commands, and message queue operations are traced without code changes. The result is immediate visibility into the performance characteristics of your application.
Step 3: Add Custom Spans
Auto-instrumentation covers framework and library operations. For application-specific logic (business rule processing, complex calculations, external API calls to services without auto-instrumentation), add custom spans that measure the duration and capture relevant attributes.
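A custom span sketch using the @opentelemetry/api package (assumed installed); processOrder and applyBusinessRules are hypothetical application functions:

```javascript
const { trace, SpanStatusCode } = require("@opentelemetry/api");

const tracer = trace.getTracer("order-service");

async function processOrder(order) {
  return tracer.startActiveSpan("process-order", async (span) => {
    span.setAttribute("order.id", order.id); // queryable span attribute
    try {
      return await applyBusinessRules(order);
    } catch (err) {
      span.recordException(err); // attach the error to the trace
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end(); // always end the span, or it never exports
    }
  });
}
```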
Step 4: Connect Logs to Traces
Configure your logging library to include the OpenTelemetry trace ID and span ID in every log entry. When you view a trace in your observability dashboard, the associated logs appear in context. When you find an error in your logs, the trace ID links you to the full request journey.
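One way to do this in Node.js is a pino mixin (pino is one common choice, not prescribed by the source; both pino and @opentelemetry/api are assumed installed). The mixin runs on every log call and reads the currently active span, if any:

```javascript
const pino = require("pino");
const { trace } = require("@opentelemetry/api");

const logger = pino({
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {}; // no active trace: log without IDs
    const { traceId, spanId } = span.spanContext();
    return { traceId, spanId };
  },
});

// Inside a traced request, this line carries traceId and spanId.
logger.info({ orderId: 4521 }, "order processed");
```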
Step 5: Export to Your Backend
Configure the OpenTelemetry SDK to export telemetry data using the OTLP (OpenTelemetry Protocol) exporter. Point it either at an OpenTelemetry Collector (a standalone proxy that receives, processes, and routes telemetry) or directly at your backend's OTLP endpoint. The Collector adds flexibility: you can filter, sample, and route telemetry data before it reaches your backend, reducing storage costs and noise.
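A minimal export sketch, sending traces over OTLP/HTTP to a local Collector (4318 is the default OTLP/HTTP port; adjust the URL for your deployment, and @opentelemetry/exporter-trace-otlp-http is assumed installed):

```javascript
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-http");

// Collector's OTLP/HTTP trace endpoint.
const traceExporter = new OTLPTraceExporter({
  url: "http://localhost:4318/v1/traces",
});
// Pass traceExporter to the NodeSDK constructor alongside the
// instrumentations; the Collector then filters, samples, and routes
// the data to one or more backends.
```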
Building Useful Dashboards
Telemetry data is only valuable if it answers operational questions. Design dashboards that address these needs:
- Service health overview: Request rate, error rate, and latency percentiles (p50, p95, p99) for each service. The RED method (Rate, Errors, Duration) provides immediate visibility into service health.
- Request tracing: Ability to search for a specific trace by ID, by user, or by endpoint, and view the full journey with timing for each span.
- Error investigation: Aggregated error logs by type, service, and endpoint, with links to the traces that contain those errors.
- Dependency map: A visual representation of which services communicate with which, derived from trace data. This reveals unexpected dependencies and highlights which services are on the critical path.
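The RED numbers behind a service-health panel reduce to a small aggregation over a window of request records. A sketch, assuming records shaped like { status, durationMs } and nearest-rank percentiles:

```javascript
// Compute Rate, Errors, Duration (p50/p95/p99) for one service's window.
function redMetrics(requests, windowSeconds) {
  if (requests.length === 0) return null; // nothing observed this window
  const sorted = requests.map((r) => r.durationMs).sort((a, b) => a - b);
  // Nearest-rank percentile over the sorted durations.
  const pct = (p) =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  const errors = requests.filter((r) => r.status >= 500).length;
  return {
    ratePerSec: requests.length / windowSeconds,
    errorRate: errors / requests.length,
    p50: pct(50),
    p95: pct(95),
    p99: pct(99),
  };
}
```

In practice a metrics backend computes these from histograms rather than raw records, but the definitions are the same.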
Practical Tips
- Sample traces in production: Tracing every request generates significant data volume. Sample a small percentage (1-10%) of normal traffic, and use tail-based sampling — where the keep-or-drop decision is made after a trace completes — so errors and slow requests are always retained. This controls costs while ensuring you capture the traces that matter.
- Standardise log formats across services: All services should use the same structured log schema. Consistent field names enable cross-service querying without per-service query adjustments.
- Use semantic conventions: OpenTelemetry defines semantic conventions for common attributes (HTTP status codes, database types, messaging systems). Following these conventions ensures your telemetry data is compatible with standard dashboards and analysis tools.
- Start with auto-instrumentation: Get immediate value by enabling auto-instrumentation across all services. Add custom spans and enriched logging incrementally as you identify gaps.
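The sampling tip above boils down to a simple keep-or-drop rule evaluated once a trace is complete. A sketch; the thresholds and the injectable rand parameter are assumptions:

```javascript
// Tail-based sampling decision: always keep errors and slow traces,
// sample the rest at a base rate (~5% here).
function shouldKeepTrace(
  completedTrace,
  { baseRate = 0.05, slowMs = 2000, rand = Math.random } = {}
) {
  if (completedTrace.hasError) return true; // always keep errors
  if (completedTrace.durationMs > slowMs) return true; // always keep slow traces
  return rand() < baseRate; // probabilistic sample for normal traffic
}
```

In production this decision usually runs in the OpenTelemetry Collector's tail-sampling processor rather than in application code, but the policy is the same shape.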
The Bottom Line
Structured logging and OpenTelemetry transform observability from a collection of disconnected tools into a unified system where logs, metrics, and traces are correlated by request context. When a user reports a slow page, you find the trace, see which service was slow, read the logs for that specific request, and identify the root cause — in minutes, not hours. Instrument once with OpenTelemetry, export to any backend, and build dashboards that answer the questions your on-call team actually asks at two in the morning.