datadog_skill

This skill helps you implement end-to-end Datadog observability for production systems, enabling tracing, logs, metrics, and alerting.

Python

GitHub Stars: 13

Bundled Files: 2

Catalog Refreshed: 2 months ago

First Indexed: 4 months ago

Installation

npx veilstrat add skill bobmatnyc/claude-mpm-skills --skill datadog

metadata.json (441 B)
SKILL.md (6.5 KB)

Overview

This skill provides practical guidance for implementing full-stack observability with Datadog, covering APM, logs, metrics, synthetics, and RUM. It focuses on deploying the Datadog Agent, instrumenting applications, linking traces/logs/metrics, and practical cost-control techniques. Use it to get production monitoring, tracing, alerting, and cost optimization working quickly and safely.

How this skill works

The skill walks through installing the Datadog Agent (Docker, Kubernetes/Helm), enabling automatic instrumentation, and adding manual spans and custom metrics. It explains how Datadog correlates traces, logs, and metrics via tags and trace IDs so you can pivot between data sources in the UI. It also highlights tagging strategy, sampling, and log index management to control cardinality and cost.
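As a rough illustration of the correlation idea, the sketch below writes JSON logs that carry trace and span IDs alongside env, service, and version. It uses only the standard library: the `current_trace` context variable is a hypothetical stand-in for whatever the tracing library exposes, and the service values are made up. The `dd.trace_id` and `dd.span_id` attribute names follow Datadog's log correlation convention; everything else here is an assumption, not the skill's own code.

```python
import contextvars
import json
import logging

# Hypothetical stand-in for the tracer's active span context. In a real
# service the tracing library supplies these IDs; here they are set by hand.
current_trace = contextvars.ContextVar("current_trace", default=None)

# Assumed service identity; replace with your own values.
SERVICE_TAGS = {"dd.env": "prod", "dd.service": "checkout-api", "dd.version": "1.4.2"}


class TraceContextFilter(logging.Filter):
    """Attach service tags and trace/span IDs (0 when no trace is active)."""

    def filter(self, record):
        trace_id, span_id = current_trace.get() or (0, 0)
        record.dd = {
            **SERVICE_TAGS,
            "dd.trace_id": str(trace_id),
            "dd.span_id": str(span_id),
        }
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record):
        payload = {"message": record.getMessage(), "level": record.levelname}
        payload.update(getattr(record, "dd", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Return a logger that writes correlated JSON lines to the given stream."""
    logger = logging.getLogger("checkout")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TraceContextFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With the IDs present on every log line, a log entry and the trace it belongs to share a key, which is what makes pivoting between the two possible.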

When to use it

Setting up production monitoring across infrastructure and application stacks
Implementing distributed tracing across microservices to diagnose latency
Configuring centralized log aggregation, parsing pipelines, and retention
Creating custom metrics, dashboards, and alerting policies
Optimizing telemetry ingestion and costs via sampling and index quotas

Best practices

Start simple: install agent, enable auto-instrumentation, verify data in UI
Use consistent, low-cardinality tags (env, service, version, team) for correlation
Avoid user IDs, request IDs, and pod IDs as tags; each distinct value creates a new time series and can cause metric cardinality to explode
Progress incrementally: basic monitoring → APM → custom metrics → profiling → RUM
Set sensible sampling and log index quotas to control ingestion costs
Alert on meaningful symptoms and thresholds to reduce alert fatigue
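The tagging guidance above can be enforced in code before a metric ever leaves the process. The following is a minimal sketch, assuming a small allow-list of tag keys and an illustrative deny-list of high-cardinality keys (both lists are assumptions you would adapt). It formats a custom metric as a DogStatsD datagram (`name:value|type|#tag1,tag2`) and sends it over UDP to the Agent's default DogStatsD port, 8125.

```python
import socket

# Tag keys the guidance above treats as safe for correlation.
ALLOWED_TAG_KEYS = {"env", "service", "version", "team"}
# Keys that are unbounded in practice; illustrative, not exhaustive.
HIGH_CARDINALITY_KEYS = {"user_id", "request_id", "pod_id"}


def build_tags(tags):
    """Return sorted 'key:value' tags, rejecting risky or unknown keys."""
    result = []
    for key, value in sorted(tags.items()):
        if key in HIGH_CARDINALITY_KEYS:
            raise ValueError(f"high-cardinality tag not allowed: {key}")
        if key not in ALLOWED_TAG_KEYS:
            raise ValueError(f"unknown tag key: {key}")
        result.append(f"{key}:{value}")
    return result


def format_metric(name, value, metric_type, tags):
    """Format one DogStatsD datagram: name:value|type|#tag1,tag2."""
    return f"{name}:{value}|{metric_type}|#{','.join(build_tags(tags))}"


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send the datagram over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, `format_metric("checkout.orders", 1, "c", {"env": "prod", "service": "checkout-api"})` produces a counter increment tagged only with low-cardinality keys, while passing `user_id` raises an error instead of silently creating a new time series per user.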

Example use cases

Instrumenting a Python API with ddtrace to surface slow database calls and external latency
Deploying Datadog Agent via Helm in Kubernetes with APM and logs enabled for a cluster
Creating dashboards that correlate frontend RUM metrics with backend traces for performance regressions
Implementing log pipelines and parsers to extract structured fields and reduce query costs
Applying sampling and Metrics without Limits patterns to keep telemetry costs predictable
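To make the "extract structured fields" use case concrete, here is a small sketch of what a parser does to a raw log line. The log format and field names are hypothetical, and the code is plain Python rather than a Datadog pipeline definition; it only illustrates the transformation from free text to queryable attributes.

```python
import re

# Hypothetical format: '<timestamp> <LEVEL> <METHOD> <path> <status> <duration>ms'
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<method>[A-Z]+) "
    r"(?P<path>\S+) (?P<status>\d{3}) (?P<duration_ms>\d+)ms"
)


def parse_line(line):
    """Return structured fields for a matching line, or None otherwise."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric fields so they can be filtered and aggregated as numbers.
    fields["status"] = int(fields["status"])
    fields["duration_ms"] = int(fields["duration_ms"])
    return fields
```

Once `status` and `duration_ms` are typed attributes, queries can filter on them directly instead of running full-text searches over the raw message.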

FAQ

Which tags should I apply across telemetry?

Include env, service, version, and team consistently across telemetry for filtering, aggregation, and cost attribution.

How do I avoid runaway billing from logs and traces?

Apply log index quotas, sample high-volume traces, and remove high-cardinality tags from metrics.
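To show how rate-based sampling keeps trace volume predictable, the sketch below decides whether to keep a trace from its ID alone. It is a generic deterministic sampler written for illustration and is not presented as what the Datadog Agent or tracer does internally; the multiplier and function name are assumptions.

```python
MAX_TRACE_ID = 2**64
# Large odd multiplier used to scatter sequential IDs; the value is arbitrary.
HASH_FACTOR = 1111111111111111111


def keep_trace(trace_id, sample_rate):
    """Return True when the trace falls inside the sampled fraction."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    hashed = (trace_id * HASH_FACTOR) % MAX_TRACE_ID
    return hashed < sample_rate * MAX_TRACE_ID
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict, so sampled traces stay complete rather than losing spans at random.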