observability_skill

This skill helps you implement end-to-end observability with OpenTelemetry and Jaeger: trace requests across microservices, map service dependencies, and collect logs and metrics.

Shell

GitHub Stars: 2

Bundled Files: 1

Catalog Refreshed: 2 months ago

First Indexed: 4 months ago


Installation

The command previewed and copied here uses veilstrat where the catalogue uses aiagentskills.

npx veilstrat add skill pluginagentmarketplace/custom-plugin-devops --skill observability

SKILL.md (760 B)

Overview

This skill provides distributed tracing and observability patterns for microservices using Jaeger and OpenTelemetry. It helps teams capture logs, metrics, and traces to understand system behavior and dependencies. The focus is practical instrumentation, trace context propagation, and service dependency mapping to accelerate debugging and performance tuning.

How this skill works

Instrument services with OpenTelemetry libraries to emit traces, metrics, and logs. Traces are exported to Jaeger (with optional Zipkin or third-party APM integrations) and stitched across services via propagated trace context. Collected data is used to build dependency maps, visualize request flows, and identify latency or error hotspots.
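To make "propagated trace context" concrete, here is a minimal sketch of the W3C Trace Context `traceparent` header, the format OpenTelemetry propagates by default. It is not the skill's own code and uses only the Python standard library. The function names `new_traceparent` and `child_traceparent` are hypothetical. In a real service the OpenTelemetry SDK's propagators would handle this for you.

```python
import re
import secrets

# W3C Trace Context header layout: version-traceid-parentid-flags,
# written as lowercase hex (2, 32, 16, and 2 characters respectively).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming):
    """Continue the caller's trace: keep its trace ID and flags, mint a new span ID.

    Falls back to a fresh trace when the incoming header is missing or malformed.
    """
    match = _TRACEPARENT.fullmatch(incoming or "")
    if match is None or match.group(1) == "0" * 32:
        return new_traceparent()
    trace_id, _caller_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A service would read `traceparent` from the incoming request, call `child_traceparent` on it, and attach the result to every outgoing call. Because the trace ID stays the same at each hop, the backend can stitch the spans into one trace.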

When to use it

Deploying or debugging microservices with hard-to-reproduce latency or errors
Establishing end-to-end observability during CI/CD and progressive rollout
Monitoring service-to-service dependencies and request flow
Validating SLOs and investigating error budget consumption
Integrating observability into developer workflows for faster root-cause analysis

Best practices

Instrument critical paths first, then expand to broader coverage
Propagate trace context across all RPCs, queues, and async boundaries
Apply sensible sampling strategies to control telemetry volume
Correlate logs, metrics, and traces via consistent IDs and tags
Use service maps and latency histograms to prioritize optimization work
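One common way to apply the sampling advice above is to derive the keep-or-drop decision from the trace ID itself, so every service reaches the same verdict for a given trace without coordination. The sketch below illustrates that idea in plain Python. It is similar in spirit to OpenTelemetry's trace-ID ratio sampler but is not a copy of it, and `should_sample` is a hypothetical name.

```python
def should_sample(trace_id_hex, ratio):
    """Decide deterministically whether to keep a trace.

    trace_id_hex: 32-character hex trace ID (assumed to be randomly generated).
    ratio: fraction of traces to keep, between 0.0 and 1.0 inclusive.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 64 bits of the trace ID as a uniform random number
    # and keep the trace when it falls below the threshold.
    low64 = int(trace_id_hex[-16:], 16)
    return low64 < int(ratio * 2**64)
```

Because the decision depends only on the trace ID and the ratio, a trace kept by the frontend is also kept by every downstream service configured with the same ratio, which avoids partially recorded traces.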

Example use cases

Trace a user request across frontend, auth, and backend to find the slow service
Measure tail latency during canary deployments to decide rollback thresholds
Map hidden service dependencies before refactoring or scaling
Integrate traces with alerting to reduce mean time to resolution for incidents
Experiment with custom instrumentation to capture business-specific context
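As an illustration of the dependency-mapping use case, the sketch below derives service-to-service edges from a flat list of spans. The span layout (dicts with `span_id`, `parent_id`, and `service` keys) is a simplified assumption for this example, not Jaeger's actual data model, and `service_edges` is a hypothetical helper.

```python
from collections import Counter


def service_edges(spans):
    """Count caller -> callee edges between services.

    An edge is recorded whenever a span's parent belongs to a different
    service, which means the call crossed a service boundary.
    """
    service_of = {span["span_id"]: span["service"] for span in spans}
    edges = Counter()
    for span in spans:
        parent_service = service_of.get(span.get("parent_id"))
        if parent_service is not None and parent_service != span["service"]:
            edges[(parent_service, span["service"])] += 1
    return edges


# Hypothetical spans from one request: frontend calls auth and backend,
# and backend calls a database after some internal work.
example_spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "auth"},
    {"span_id": "c", "parent_id": "a", "service": "backend"},
    {"span_id": "d", "parent_id": "c", "service": "backend"},
    {"span_id": "e", "parent_id": "d", "service": "db"},
]
```

Running `service_edges(example_spans)` yields one edge each for frontend to auth, frontend to backend, and backend to db. The internal backend span adds no edge because it stays within the same service.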

FAQ

What telemetry should I collect?

Collect logs, metrics, and traces as the three observability pillars; use OpenTelemetry for consistent instrumentation.
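The three signals are most useful when they can be joined on a shared identifier. As a rough sketch of that idea, the standard-library example below stamps the active trace ID onto every log line so logs can be looked up from a trace and vice versa. The names `current_trace_id`, `TraceIdFilter`, and `build_logger` are hypothetical. OpenTelemetry's own logging integrations would normally do this for you.

```python
import contextvars
import logging

# Holds the trace ID of the request currently being handled ("-" when none).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the active trace ID to each log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


def build_logger(name, stream):
    """Return a logger whose output lines carry a trace_id field."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger
```

A request handler would call `current_trace_id.set(...)` with the trace ID taken from the incoming context before doing any work. Every log line emitted during that request then carries the same ID that appears in the tracing backend.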

Can this work with commercial APMs?

Yes. Jaeger is the primary tracing backend, but the instrumentation can also export to Zipkin, New Relic, Datadog, or Honeycomb as optional backends.