Observability
ZinTrust ships best-effort observability primitives that you can adopt incrementally:
- Request-scoped correlation via RequestContext
- Structured logging (with redaction)
- Health endpoints
- Prometheus metrics (/metrics)
- Optional OpenTelemetry tracing
The core idea is: observability should help you in production without becoming a production dependency.
Design principles
Observability must not break requests
ZinTrust treats metrics/tracing as “nice to have”. If a metrics client or tracing export is misconfigured, requests should still succeed.
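One way to picture this principle is a small wrapper that isolates observability side effects from the request path. This is a minimal sketch, not ZinTrust's implementation: `safely`, `handleRequest`, and the metric name are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper (not a ZinTrust API): run an observability side effect
// and make sure any failure stays out of the request path.
function safely(
  label: string,
  effect: () => void,
  onError: (label: string, err: unknown) => void = () => {},
): void {
  try {
    effect();
  } catch (err) {
    try {
      onError(label, err); // report the failure, e.g. to a logger
    } catch {
      // Even the error reporter must not be able to fail the request.
    }
  }
}

// Example handler: the metric is recorded on a best-effort basis.
function handleRequest(recordMetric: () => void, failures: string[]): number {
  safely("http_requests_total", recordMetric, (label) => failures.push(label));
  return 200;
}

const failures: string[] = [];
const responseStatus = handleRequest(() => {
  throw new Error("metrics backend unreachable");
}, failures);
```

Here the handler still returns 200 even though the metrics call throws. The wrapper only covers synchronous failures; an asynchronous exporter would also need its promise rejections handled.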
Correlate using IDs
ZinTrust maintains a request-scoped context that includes:
- requestId (from x-request-id or generated)
- traceId (from W3C traceparent when present, otherwise from x-trace-id)
- optional userId and tenantId (set by auth middleware)
This enables consistent correlation across logs and traces.
See: request-context.
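The header precedence described above can be sketched as follows. This is an illustration under stated assumptions, not the RequestContext source: `resolveCorrelationIds`, `traceIdFromTraceparent`, and `nextLocalId` are hypothetical names, the regular expression covers the four-field version 00 traceparent layout, and falling back to x-trace-id when traceparent is malformed is an assumption of this sketch.

```typescript
type HeaderMap = Record<string, string | undefined>;

interface CorrelationIds {
  requestId: string;
  traceId: string | undefined;
}

// W3C traceparent (version 00 layout): version-traceid-parentid-flags, lowercase hex.
const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZEROS = /^0+$/;

function traceIdFromTraceparent(value: string | undefined): string | undefined {
  const match = TRACEPARENT_PATTERN.exec((value ?? "").trim());
  if (!match) return undefined;
  const version = match[1] ?? "";
  const traceId = match[2] ?? "";
  const parentId = match[3] ?? "";
  // The spec forbids version "ff" and all-zero trace or parent IDs.
  if (version === "ff" || ALL_ZEROS.test(traceId) || ALL_ZEROS.test(parentId)) {
    return undefined;
  }
  return traceId;
}

// Placeholder ID generator for the sketch; a real service would likely use a UUID.
let localIdCounter = 0;
function nextLocalId(): string {
  localIdCounter += 1;
  return `req-${Date.now().toString(36)}-${localIdCounter}`;
}

function resolveCorrelationIds(
  headers: HeaderMap,
  generateId: () => string = nextLocalId,
): CorrelationIds {
  // Reuse the caller's request ID when present, otherwise generate one.
  const requestId = headers["x-request-id"]?.trim() || generateId();
  // Prefer the W3C header; fall back to x-trace-id (assumption: also when malformed).
  const traceId =
    traceIdFromTraceparent(headers["traceparent"]) ??
    (headers["x-trace-id"]?.trim() || undefined);
  return { requestId, traceId };
}
```

The same parser is also a quick way to check whether an incoming traceparent value is well formed when trace IDs are missing from logs.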
Metrics should be low-cardinality
Metrics are for aggregates and dashboards.
- Use stable route templates (e.g. /api/users/:id) as the route label
- Avoid raw paths (/api/users/123) and query strings as labels
If you need per-request details, use logs/tracing instead.
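To make the cardinality argument concrete, the sketch below derives metric labels from a route template and counts how many distinct series a batch of requests would create. `MatchedRequest`, `metricLabels`, `seriesCount`, and the "unmatched" bucket are hypothetical names for illustration; how ZinTrust exposes the matched template is not shown here.

```typescript
interface MatchedRequest {
  method: string;
  path: string;           // raw path, e.g. /api/users/123
  routeTemplate?: string; // template from the router, e.g. /api/users/:id
}

function metricLabels(req: MatchedRequest): { method: string; route: string } {
  return {
    method: req.method.toUpperCase(),
    // Never fall back to req.path: unmatched requests share one bucket.
    route: req.routeTemplate ?? "unmatched",
  };
}

// Count how many distinct label sets (time series) a batch of requests creates.
function seriesCount(requests: MatchedRequest[]): number {
  const keys = new Set(
    requests.map((r) => {
      const labels = metricLabels(r);
      return `${labels.method} ${labels.route}`;
    }),
  );
  return keys.size;
}

// 1000 requests for different users still map to a single series.
const sampleRequests: MatchedRequest[] = Array.from({ length: 1000 }, (_, i) => ({
  method: "get",
  path: `/api/users/${i}`,
  routeTemplate: "/api/users/:id",
}));
const distinctSeries = seriesCount(sampleRequests);
```

Labeling by the raw path instead would create one series per user ID, which is the usual source of unbounded metric growth.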
Logs
Logging is the baseline observability layer.
Recommended patterns:
- Emit a request completion log (method, route, status, duration)
- Include requestId always
- Include traceId when present
- Consider including tenantId if it’s non-sensitive and useful
Avoid:
- Secrets, tokens, passwords
- High-volume logs in tight loops
See: log-correlation.
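A request completion log following these patterns might look like the sketch below. It is not ZinTrust's logger: `formatCompletionLog`, `redact`, and the list of sensitive keys are assumptions for illustration, and the key match is exact (case-insensitive), so a real redaction list would need to reflect your own field names.

```typescript
// Assumed list of sensitive field names; matching is exact and case-insensitive.
const SENSITIVE_KEYS = new Set(["password", "token", "secret", "authorization", "apikey"]);

// Recursively replace values of sensitive keys before anything is serialized.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

interface CompletionLog {
  method: string;
  route: string; // route template, not the raw path
  status: number;
  durationMs: number;
  requestId: string;
  traceId?: string;
  tenantId?: string;
}

function formatCompletionLog(
  entry: CompletionLog,
  extra: Record<string, unknown> = {},
): string {
  // Core fields are spread last so extra data cannot overwrite them.
  // JSON.stringify omits undefined values, so absent IDs are left out.
  return JSON.stringify({
    message: "request completed",
    ...(redact(extra) as Record<string, unknown>),
    ...entry,
  });
}

const completionLine = formatCompletionLog(
  { method: "GET", route: "/api/users/:id", status: 200, durationMs: 12, requestId: "req-1" },
  { body: { email: "a@example.com", password: "hunter2" } },
);
```

Emitting one such line per request keeps log volume proportional to traffic and gives every line the IDs needed to join it with traces.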
Health checks
Health endpoints allow orchestration systems to understand process state.
Typical usage:
- Liveness: “is the process up?”
- Readiness: “can it serve traffic?” (e.g. DB reachable)
See: health-checks.
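The difference between the two probes can be sketched as below. The function names, the report shape, and the 200/503 status codes are assumptions for illustration rather than ZinTrust's health endpoint contract.

```typescript
// A readiness check resolves when the dependency is healthy and rejects otherwise.
type ReadinessCheck = () => Promise<void>;

interface HealthReport {
  httpStatus: number;
  body: { status: "ok" | "unavailable"; failed: string[] };
}

// Liveness: answers only "is the process up?", so it checks no dependencies.
function liveness(): HealthReport {
  return { httpStatus: 200, body: { status: "ok", failed: [] } };
}

// Readiness: answers "can it serve traffic?" by probing each dependency.
async function readiness(checks: Record<string, ReadinessCheck>): Promise<HealthReport> {
  const failed: string[] = [];
  for (const [name, check] of Object.entries(checks)) {
    try {
      await check();
    } catch {
      failed.push(name);
    }
  }
  return failed.length === 0
    ? { httpStatus: 200, body: { status: "ok", failed } }
    : { httpStatus: 503, body: { status: "unavailable", failed } };
}
```

Keeping liveness free of dependency checks avoids restarting a healthy process just because a database is briefly unreachable; readiness is the probe that should take it out of rotation.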
Metrics (Prometheus)
ZinTrust exposes Prometheus-compatible metrics when enabled.
Highlights:
- Enable with METRICS_ENABLED=true
- Optional path override via METRICS_PATH (default /metrics)
- HTTP counters/histograms and DB counters/histograms
See: metrics.
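As a rough illustration of how these two settings combine, the sketch below reads them from an environment-like object. `readMetricsConfig` is a hypothetical helper; treating only the literal string "true" as enabled and normalizing a missing leading slash are assumptions, since the exact parsing rules are not stated here.

```typescript
interface MetricsConfig {
  enabled: boolean;
  path: string;
}

function readMetricsConfig(env: Record<string, string | undefined>): MetricsConfig {
  // Assumption: only the literal "true" enables metrics; anything else leaves them off.
  const enabled = env["METRICS_ENABLED"] === "true";
  const rawPath = env["METRICS_PATH"]?.trim();
  // Fall back to the documented default when no override is set.
  const path = rawPath ? (rawPath.startsWith("/") ? rawPath : `/${rawPath}`) : "/metrics";
  return { enabled, path };
}

const defaultMetricsConfig = readMetricsConfig({});
const customMetricsConfig = readMetricsConfig({
  METRICS_ENABLED: "true",
  METRICS_PATH: "/internal/metrics",
});
```

With no variables set, metrics stay disabled and the path resolves to /metrics; the second call enables them on /internal/metrics.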
Tracing (OpenTelemetry)
Tracing is optional and guarded by configuration (e.g. OTEL_ENABLED=true).
ZinTrust’s approach is intentionally conservative:
- Uses OpenTelemetry API patterns
- Avoids hard coupling to an SDK/exporter
- Creates request spans and (when applicable) DB spans
- Enriches spans with requestId, tenantId, and userId when available
See: tracing.
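Span enrichment without a hard SDK dependency can be sketched against a structural interface. `SpanLike` is shaped after the `setAttribute` method of the OpenTelemetry API's Span, while `enrichSpan`, `RecordingSpan`, and the `app.*` attribute keys are hypothetical names chosen for this example; ZinTrust's actual attribute names may differ.

```typescript
// Structural subset of a span: only setAttribute is needed for enrichment.
interface SpanLike {
  setAttribute(key: string, value: string): unknown;
}

interface RequestIds {
  requestId: string;
  tenantId?: string;
  userId?: string;
}

// Attribute keys are assumptions for this sketch.
function enrichSpan(span: SpanLike, ids: RequestIds): void {
  try {
    span.setAttribute("app.request_id", ids.requestId);
    if (ids.tenantId) span.setAttribute("app.tenant_id", ids.tenantId);
    if (ids.userId) span.setAttribute("app.user_id", ids.userId);
  } catch {
    // Tracing is optional: attribute errors must not reach the request.
  }
}

// In-memory span used to show the effect without an SDK or exporter.
class RecordingSpan implements SpanLike {
  readonly attributes: Record<string, string> = {};
  setAttribute(key: string, value: string): this {
    this.attributes[key] = value;
    return this;
  }
}

const demoSpan = new RecordingSpan();
enrichSpan(demoSpan, { requestId: "req-1", tenantId: "acme" });
```

Because the function depends only on the method shape, the same code works with a real span when tracing is enabled and with a stub when it is not.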
Putting it together
Recommended progression:
- Turn on structured logs + request correlation (x-request-id)
- Add health endpoints for orchestration
- Enable Prometheus metrics + dashboards/alerts
- Add OpenTelemetry tracing when you need per-request performance diagnostics
Troubleshooting
Trace IDs don’t appear
- Confirm your edge forwards traceparent
- Ensure tracing is enabled (OTEL_ENABLED=true) and logs include traceId
- Validate incoming traceparent is correctly formatted
Prometheus memory spikes
Most common cause:
- High-cardinality labels (using raw paths instead of templates)
Fix:
- Ensure the route label uses a route template (not req.path).