datadog_skill

This skill helps you implement end-to-end Datadog observability for production systems, enabling tracing, logs, metrics, and alerting.
  • Language: Python
  • GitHub Stars: 13
  • Bundled Files: 2
  • Catalog Refreshed: 3 weeks ago
  • First Indexed: 2 months ago

Installation

Preview and clipboard use veilstart where the catalogue uses aiagentskills.

npx veilstart add skill bobmatnyc/claude-mpm-skills --skill datadog

  • metadata.json (441 B)
  • SKILL.md (6.5 KB)

Overview

This skill provides practical guidance for implementing full-stack observability with Datadog, covering APM, logs, metrics, synthetics, and RUM. It focuses on deploying the Datadog Agent, instrumenting applications, linking traces/logs/metrics, and practical cost-control techniques. Use it to get production monitoring, tracing, alerting, and cost optimization working quickly and safely.

How this skill works

The skill walks through installing the Datadog Agent (Docker, Kubernetes/Helm), enabling automatic instrumentation, and adding manual spans and custom metrics. It explains how Datadog correlates traces, logs, and metrics via tags and trace IDs so you can pivot between data sources in the UI. It also highlights tagging strategy, sampling, and log index management to control cardinality and cost.
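The correlation described above can be sketched with the standard library alone. This is a minimal illustration, not the skill's own code: it assumes logs are shipped as JSON and uses the `dd.trace_id`, `dd.span_id`, `dd.env`, `dd.service`, and `dd.version` attribute names that Datadog reads for log and trace correlation. The class name, the `extra`-style `trace_id`/`span_id` record attributes, and the sample service values are hypothetical; in a real application the tracer would normally supply the IDs.

```python
import json
import logging


class DatadogJsonFormatter(logging.Formatter):
    """Render log records as JSON carrying Datadog correlation attributes."""

    def __init__(self, service, env, version):
        super().__init__()
        # Static tags shared by every log line from this process.
        self.static = {"dd.service": service, "dd.env": env, "dd.version": version}

    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "status": record.levelname.lower(),
            "logger": record.name,
            # Hypothetical convention: IDs arrive as record attributes
            # (for example via `extra=`); "0" means no active trace.
            "dd.trace_id": str(getattr(record, "trace_id", 0)),
            "dd.span_id": str(getattr(record, "span_id", 0)),
        }
        payload.update(self.static)
        return json.dumps(payload, sort_keys=True)


def format_example():
    """Format one record by hand and return the parsed JSON payload."""
    formatter = DatadogJsonFormatter(service="checkout-api", env="prod", version="1.4.2")
    record = logging.LogRecord(
        name="checkout", level=logging.INFO, pathname="example.py", lineno=0,
        msg="order %s placed", args=("A-17",), exc_info=None,
    )
    record.trace_id = 1234567890
    record.span_id = 42
    return json.loads(formatter.format(record))
```

Because every log line carries the same trace ID as the span that produced it, the UI can jump from a slow trace to its logs and back.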

When to use it

  • Setting up production monitoring across infrastructure and application stacks
  • Implementing distributed tracing across microservices to diagnose latency
  • Configuring centralized log aggregation, parsing pipelines, and retention
  • Creating custom metrics, dashboards, and alerting policies
  • Optimizing telemetry ingestion and costs via sampling and index quotas
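For the custom-metrics case, the sketch below builds a DogStatsD datagram by hand to show what actually travels to the Agent. It assumes the Agent listens for DogStatsD on UDP port 8125 (the usual default) and follows the `name:value|type|@rate|#tags` wire format. The function names and the metric name are hypothetical, and in practice a client library would do this for you.

```python
import socket


def dogstatsd_datagram(name, value, metric_type="c", tags=None, sample_rate=1.0):
    """Build one DogStatsD datagram: name:value|type[|@rate][|#k:v,k:v]."""
    parts = [f"{name}:{value}", metric_type]
    if sample_rate < 1.0:
        # Client-side sample rate; the Agent scales counts back up.
        parts.append(f"@{sample_rate}")
    if tags:
        # Sort tags so the output is stable and easy to compare.
        parts.append("#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items())))
    return "|".join(parts)


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to the local Agent (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, `dogstatsd_datagram("checkout.orders", 1, "c", {"env": "prod", "service": "checkout-api"})` yields `checkout.orders:1|c|#env:prod,service:checkout-api`.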

Best practices

  • Start simple: install agent, enable auto-instrumentation, verify data in UI
  • Use consistent, low-cardinality tags (env, service, version, team) for correlation
  • Avoid user IDs, request IDs, and pod IDs as metric tags; each distinct value creates a new time series, so cardinality and cost grow without bound
  • Progress incrementally: basic monitoring → APM → custom metrics → profiling → RUM
  • Set sensible sampling and log index quotas to control ingestion costs
  • Alert on meaningful symptoms and thresholds to reduce alert fatigue
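The tagging rules above can be enforced with a small pre-flight check before metrics are emitted. This is only a sketch: the required keys come from the list above, while the exact spellings of the unbounded keys (`user_id`, `request_id`, `pod_id`) and the function name are assumptions to adapt to your own conventions.

```python
# Tags every telemetry source is expected to carry (from the list above).
REQUIRED_KEYS = frozenset({"env", "service", "version", "team"})
# Hypothetical spellings of keys whose values are effectively unbounded.
UNBOUNDED_KEYS = frozenset({"user_id", "request_id", "pod_id"})


def check_tags(tags):
    """Return a sorted list of problems found in a tag dictionary."""
    present = set(tags)
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_KEYS - present)]
    problems += [f"unbounded tag: {key}" for key in sorted(UNBOUNDED_KEYS & present)]
    return problems
```

An empty list means the tag set is safe to emit; anything else is worth failing a CI check or logging a warning over.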

Example use cases

  • Instrumenting a Python API with ddtrace to surface slow database calls and external latency
  • Deploying Datadog Agent via Helm in Kubernetes with APM and logs enabled for a cluster
  • Creating dashboards that correlate frontend RUM metrics with backend traces for performance regressions
  • Implementing log pipelines and parsers to extract structured fields and reduce query costs
  • Applying sampling and Metrics without Limits patterns to keep telemetry costs predictable

FAQ

Which tags should I apply to my telemetry?

Include env, service, version, and team consistently across telemetry for filtering, aggregation, and cost attribution.

How do I avoid runaway billing from logs and traces?

Apply log index quotas, sample high-volume traces, and remove high-cardinality tags from metrics.
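To make the sampling idea concrete, here is a minimal sketch of deterministic head-based sampling: the keep-or-drop decision is derived from the trace ID, so every service that sees the same trace reaches the same verdict. It is an illustration of the technique, not Datadog's sampler, and `keep_trace` is a hypothetical name. With the ddtrace library the rate is normally configured rather than hand-coded, for example through the `DD_TRACE_SAMPLE_RATE` environment variable.

```python
import hashlib


def keep_trace(trace_id, sample_rate):
    """Decide deterministically whether to keep a trace.

    The trace ID is hashed into a number in [0, 1); the trace is kept
    when that number falls below `sample_rate`.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(str(trace_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

At a rate of 0.1, roughly one trace in ten is kept, and ingestion volume for that service drops by about the same proportion.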
